Fasta.In_channel
In_channel for FASTA records. For more general info, see the Record_in_channel module mli file.
Simplest way. May raise exceptions.
let records = Fasta.In_channel.with_file_records_exn fname
A bit more involved, but you won't get exceptions. Instead, you have to handle the Or_error.t.
let records =
match Fasta.In_channel.with_file_records name with
| Error err ->
eprintf "Problem reading records: %s\n" (Error.to_string_hum err);
exit 1
| Ok records -> records
Use the iter functions when you need to go over each record and perform some side effects with them.
Print sequence IDs and sequence lengths
let () =
Fasta.In_channel.with_file_iter_records_exn "sequences.fasta"
~f:(fun record ->
let open Fasta.Record in
printf "%s => %d\n" (id record) (seq_length record))
Print sequence index, IDs, and sequence lengths.
This is like the last example except that we also want to print the index. The index passed to f is 0-based (the first record is 0, the second is 1, and so on), so the example adds 1 to print 1-based positions.
let () =
Fasta.In_channel.with_file_iteri_records_exn "sequences.fasta"
~f:(fun index record ->
let open Fasta.Record in
printf "%d: %s => %d\n" (index + 1) (id record)
(seq_length record))
If you need to reduce all the records down to a single value, use the fold functions.
Get total length of all sequences in the file.
Watch out, as this may raise exceptions (note the _exn suffix).
let total_length =
Fasta.In_channel.with_file_fold_records_exn "sequences.fasta" ~init:0
~f:(fun length record -> length + Fasta.Record.seq_length record)
Same thing, but this won't raise exceptions. You do have to handle Or_error.t to get the final value. Note that within the fold function, you get Fasta.Record.t and not Fasta.Record.t Or_error.t.
let total_length =
match
Fasta.In_channel.with_file_fold_records name ~init:0
~f:(fun length record -> length + Fasta.Record.seq_length record)
with
| Error err ->
eprintf "Problem reading records: %s\n" (Error.to_string_hum err);
exit 1
| Ok total_length -> total_length
Sometimes you have a "pipeline" of computations that you need to do one after the other on records. In that case, you could use the sequence functions. Here's a silly example.
let () =
Fasta.In_channel.with_file_exn name ~f:(fun chan ->
Fasta.In_channel.record_sequence_exn chan
(* Add sequence index to record description *)
|> Sequence.mapi ~f:(fun i record ->
let new_desc =
match Fasta.Record.desc record with
| None -> Some (sprintf "sequence %d" i)
| Some old_desc ->
Some (sprintf "%s -- sequence %d" old_desc i)
in
Fasta.Record.with_desc new_desc record)
(* Convert all sequence chars to lowercase *)
|> Sequence.map ~f:(fun record ->
let new_seq = String.lowercase (Fasta.Record.seq record) in
Fasta.Record.with_seq new_seq record)
(* Print sequences *)
|> Sequence.iter ~f:(fun record ->
print_endline @@ Fasta.Record.serialize record))
One thing to watch out for, though: if you get an exception halfway through and you are running side-effecting code like we are here, then some of your side effects will have occurred and some will not.
There are also Or_error.t flavors of the sequence functions. Be aware that with these you have to deal with an Or_error.t for each Fasta.Record.t in the sequence.
As an alternative, you could use the record_sequence_exn function, but wrap that in the with_file function. That way you don't have to deal with the Or_error.t inside your pipeline. Instead you deal with it at the end.
let total_length =
match
Fasta.In_channel.with_file name ~f:(fun chan ->
Fasta.In_channel.record_sequence_exn chan
(* Blow up pipeline on second sequence. *)
|> Sequence.mapi ~f:(fun i record ->
if i = 1 then assert false;
record)
|> Sequence.fold ~init:0 ~f:(fun length record ->
length + String.length (Fasta.Record.seq record)))
with
| Error err ->
eprintf "Problem in parsing pipeline: %s\n"
(Error.to_string_hum err);
exit 1
| Ok total_length -> total_length
As you can see, if that FASTA file has more than one sequence, it will hit the assert false and blow up.
include Record_in_channel.S with type record := Record.t
create_exn file_name opens an input channel on the file specified by file_name.
create file_name opens an input channel on the file specified by file_name.
close t is like close_exn t except that it shouldn't raise.
with_file_exn file_name ~f executes ~f on the channel created from file_name and closes it afterwards.
with_file file_name ~f is like with_file_exn file_name ~f except that it shouldn't raise.
input_record_exn t returns Some record if there is a record to return. If there are no more records, None is returned. Exn is raised on bad input.
input_record t is like input_record_exn t except that it should not raise exceptions.
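The Some/None protocol described for input_record_exn lends itself to a simple pull loop. The following is a minimal sketch of that loop using only the OCaml standard library. It does not call Bio_io: make_reader and sequence_lengths are hypothetical names, and the reader works on plain strings rather than Fasta.Record.t values. It only mimics the shape of "return Some item until the input is exhausted, then None".

```ocaml
(* Hypothetical stand-in for a record channel: [make_reader items] returns a
   function that yields [Some item] until the list is exhausted, then [None].
   It mimics the shape of input_record_exn only; it is not Bio_io code. *)
let make_reader (items : 'a list) : unit -> 'a option =
  let remaining = ref items in
  fun () ->
    match !remaining with
    | [] -> None
    | x :: rest ->
        remaining := rest;
        Some x

(* Pull items one at a time until the reader reports [None], collecting the
   length of each sequence string in input order. *)
let sequence_lengths (next : unit -> string option) : int list =
  let rec loop acc =
    match next () with
    | None -> List.rev acc
    | Some seq -> loop (String.length seq :: acc)
  in
  loop []
```

With a real channel, the call to next () would presumably be replaced by input_record_exn t, and in most cases the fold and iter functions documented below save you from writing this loop by hand.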
fold_records_exn t ~init ~f reduces all records from a t down to a single value of type 'a.
fold_records t ~init ~f is like fold_records_exn t ~init ~f except that it should not raise exceptions. Rather than deal with exceptions inside the reducing function, you must deal with them at the end when handling the return value.
Like fold_records_exn t ~init ~f except that f is provided the 0-based record index as its first argument. See fold_records_exn.
val foldi_records :
t ->
init:'a ->
f:(Base.int -> 'a -> Record.t -> 'a) ->
'a Base.Or_error.t
Like foldi_records_exn t ~init ~f except that it shouldn't raise. See foldi_records_exn.
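The argument order of f in the foldi functions (index first, then accumulator, then record) is easy to get wrong. As an illustration, here is a small standard-library sketch with the same argument order. The foldi and longest functions below are hypothetical stand-ins that operate on a plain list of sequence strings, not on a Bio_io channel.

```ocaml
(* Hypothetical stand-in with the same argument order as foldi_records_exn:
   [f] receives the 0-based index first, then the accumulator, then the item. *)
let foldi (items : 'a list) ~(init : 'acc) ~(f : int -> 'acc -> 'a -> 'acc) :
    'acc =
  let _, result =
    List.fold_left (fun (i, acc) x -> (i + 1, f i acc x)) (0, init) items
  in
  result

(* Index and length of the longest sequence, or [None] for empty input.
   Ties keep the earliest index because a later item must be strictly longer
   to replace the current best. *)
let longest (seqs : string list) : (int * int) option =
  foldi seqs ~init:None ~f:(fun i best seq ->
      let len = String.length seq in
      match best with
      | Some (_, best_len) when best_len >= len -> best
      | _ -> Some (i, len))
```

For example, longest ["AC"; "ACGT"; "GGGG"] evaluates to Some (1, 4): the second item (index 1) is the first one with the maximum length.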
with_file_fold_records_exn file_name ~init ~f is like fold_records_exn t ~init ~f except that it is passed a file name, and it manages t automatically. See with_file.
val with_file_fold_records :
Base.string ->
init:'a ->
f:('a -> Record.t -> 'a) ->
'a Base.Or_error.t
with_file_fold_records file_name ~init ~f is like fold_records t ~init ~f except that it is passed a file name, and it manages t automatically. See with_file.
val with_file_foldi_records_exn :
Base.string ->
init:'a ->
f:(Base.int -> 'a -> Record.t -> 'a) ->
'a
with_file_foldi_records_exn file_name ~init ~f is like foldi_records_exn t ~init ~f except that it is passed a file name, and it manages t automatically. See with_file.
val with_file_foldi_records :
Base.string ->
init:'a ->
f:(Base.int -> 'a -> Record.t -> 'a) ->
'a Base.Or_error.t
with_file_foldi_records file_name ~init ~f is like foldi_records t ~init ~f except that it is passed a file name, and it manages t automatically. See with_file.
The iter functions are like the fold functions except that they do not take an init value and f returns unit rather than a value of type 'a, so the iter functions themselves also return unit.
They are mainly called for side effects.
iter_records_exn t ~f calls f on each record in t. As f returns unit this is generally used for side effects.
iter_records t ~f is like iter_records_exn t ~f except that it shouldn't raise.
iteri_records_exn t ~f is like iter_records_exn t ~f except that f is passed the 0-based record index as its first argument.
iteri_records t ~f is like iteri_records_exn t ~f except that it shouldn't raise.
with_file_iter_records_exn file_name ~f is like iter_records_exn t ~f except that it is passed a file name, and it manages t automatically. See with_file.
val with_file_iter_records :
Base.string ->
f:(Record.t -> Base.unit) ->
Base.unit Base.Or_error.t
with_file_iter_records file_name ~f is like iter_records t ~f except that it is passed a file name, and it manages t automatically. See with_file.
val with_file_iteri_records_exn :
Base.string ->
f:(Base.int -> Record.t -> Base.unit) ->
Base.unit
with_file_iteri_records_exn file_name ~f is like iteri_records_exn t ~f except that it is passed a file name, and it manages t automatically. See with_file.
val with_file_iteri_records :
Base.string ->
f:(Base.int -> Record.t -> Base.unit) ->
Base.unit Base.Or_error.t
with_file_iteri_records file_name ~f is like iteri_records t ~f except that it is passed a file name, and it manages t automatically. See with_file.
These functions return lists of records.
The sequence functions are a bit different:
* There are no with_file versions as you would have to do some fiddly things to keep the channel open, making them not so nice to use.
* Each record that is yielded is wrapped in an Or_error.t. This differs from the iter, fold, and other non-_exn functions, where the entire result is wrapped in a single Or_error.t, which lets you ignore errors inside the passed-in ~f function and deal with failure once.
record_sequence_exn t returns a Sequence.t of record. May raise exceptions.
record_sequence t is like record_sequence_exn t except that instead of raising exceptions, each item of the sequence is a record Or_error.t rather than an "unwrapped" record. This could make things annoying to deal with. If you don't want exceptions, you could instead wrap your entire sequence processing pipeline in a call to with_file and handle the Or_error.t in that way. See the pipelines usage examples for more info.
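To make the difference between the two error-handling styles concrete, here is a minimal sketch using only the OCaml standard library (Seq and result) instead of Base's Sequence and Or_error. It does not use Bio_io: the record type, the "id:SEQ" line format, and the parse_line, per_item_total, and whole_result_total functions are all hypothetical stand-ins chosen for illustration.

```ocaml
(* Stand-in for a parsed record; this is NOT Bio_io's Fasta.Record.t. *)
type record = { id : string; seq : string }

(* Hypothetical parser: a line of the form "id:SEQ" is a record, anything
   else is an error. *)
let parse_line (line : string) : (record, string) result =
  match String.index_opt line ':' with
  | Some i ->
      Ok
        { id = String.sub line 0 i;
          seq = String.sub line (i + 1) (String.length line - i - 1) }
  | None -> Error (Printf.sprintf "bad record: %S" line)

(* Style 1 (like record_sequence): every item is a result, so the consumer
   has to inspect each one. The first error is kept and later items are
   ignored. *)
let per_item_total (items : (record, string) result Seq.t) :
    (int, string) result =
  Seq.fold_left
    (fun acc item ->
      match (acc, item) with
      | Error _, _ -> acc
      | Ok _, Error e -> Error e
      | Ok total, Ok r -> Ok (total + String.length r.seq))
    (Ok 0) items

(* Style 2 (like record_sequence_exn wrapped in with_file): the pipeline
   works on plain records and may raise; the failure is converted to a
   result once, at the end. *)
let whole_result_total (lines : string Seq.t) : (int, string) result =
  try
    lines
    |> Seq.map (fun line ->
           match parse_line line with Ok r -> r | Error e -> failwith e)
    |> Seq.fold_left (fun total r -> total + String.length r.seq) 0
    |> Result.ok
  with Failure e -> Error e
```

Both functions produce the same final value for the same input. The second keeps the pipeline stages free of result handling, at the cost of using exceptions for control flow inside the pipeline.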