In_channel
for FASTA records. For more general info, see the Record_in_channel
module mli file.
let records = Fasta.In_channel.with_file_records fname
Use the iter functions when you need to go over each record and perform some side effect with it.
Print sequence IDs and sequence lengths
let () =
Fasta.In_channel.with_file_iter_records "sequences.fasta"
~f:(fun record ->
let open Fasta.Record in
printf "%s => %d\n" (id record) (seq_length record))
Print sequence index, IDs, and sequence lengths.
This is like the last example except that we also want to print the index. The index passed to f is 0-based (the first record is 0, the 2nd is 1, etc.), so the example adds 1 to print 1-based record numbers.
let () =
Fasta.In_channel.with_file_iteri_records "sequences.fasta"
~f:(fun i record ->
let open Fasta.Record in
printf "%d: %s => %d\n" (i + 1) (id record) (seq_length record))
If you need to reduce all the records down to a single value, use the fold
functions.
Get total length of all sequences in the file.
let total_length =
Fasta.In_channel.with_file_fold_records "sequences.fasta" ~init:0
~f:(fun length record -> length + Fasta.Record.seq_length record)
Sometimes you have a "pipeline" of computations that you need to do one after the other on records. In that case, you could use the sequence functions. Here's a silly example.
let () =
Fasta.In_channel.with_file name ~f:(fun chan ->
Fasta.In_channel.record_sequence chan
(* Add sequence index to record description *)
|> Sequence.mapi ~f:(fun i record ->
let new_desc =
match Fasta.Record.desc record with
| None -> Some (sprintf "sequence %d" i)
| Some old_desc ->
Some (sprintf "%s -- sequence %d" old_desc i)
in
Fasta.Record.with_desc new_desc record)
(* Convert all sequence chars to lowercase *)
|> Sequence.map ~f:(fun record ->
let new_seq = String.lowercase (Fasta.Record.seq record) in
Fasta.Record.with_seq new_seq record)
(* Print sequences *)
|> Sequence.iter ~f:(fun record ->
print_endline @@ Fasta.Record.to_string record))
One thing to watch out for though: if you get an exception halfway through and you are running side-effecting code like we are here, then part of your side effects will have occurred and part of them will not.
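For example, if the function in the pipeline reaches an assert false on the second record, then any FASTA file with more than one sequence will hit it and blow up after the first record has already been processed.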
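The failure mode above can be sketched as follows. This is an illustrative example, not necessarily the one the original text referred to. It assumes that Fasta is brought into scope by opening a top-level module named Bio_io, and print_first_then_fail is a hypothetical name. The function prints the first record and then deliberately fails on the second, so the first record's output has already been written when the exception escapes.

```ocaml
(* Assumption: [Fasta] is exposed by a top-level module named [Bio_io]. *)
open Bio_io

(* Hypothetical helper: prints the first record, then fails on the second.
   With a multi-record file, the first record is printed before the
   exception is raised. *)
let print_first_then_fail file_name =
  Fasta.In_channel.with_file file_name ~f:(fun chan ->
      Fasta.In_channel.record_sequence chan
      |> Base.Sequence.iteri ~f:(fun i record ->
             (* Deliberate failure on any record after the first. *)
             if i > 0 then assert false;
             print_endline (Fasta.Record.to_string record)))
```

If partial output is a problem, one option is to collect the results first (for example with the fold functions) and perform the side effects only after the whole file has been read successfully.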
include Record_in_channel.S with type record := Record.t
val stdin : t
stdin is a t on the standard input channel.
val create : Base.string -> t
create file_name
opens an input channel on the file specified by file_name
. You may want to use Base.Exn.protectx
with this.
val with_file : Base.string -> f:(t -> 'a) -> 'a
with_file file_name ~f
executes ~f
on the channel created from file_name
and ensures it is closed properly.
val input_record : t -> Record.t Base.option
input_record t
returns Some record
if there is a record
to return. If there are no more records, None
is returned. Raises exceptions on bad input (e.g., bad file format).
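As a rough sketch of how input_record can drive a manual loop, the following counts the records in a file by calling it until None comes back. It assumes that Fasta is in scope via a top-level module named Bio_io; count_records is a hypothetical name. In practice the fold functions do the same job with less code, so this is mainly useful when you need finer control over when records are read.

```ocaml
(* Assumption: [Fasta] is exposed by a top-level module named [Bio_io]. *)
open Bio_io

(* Hypothetical helper: counts records by calling [input_record] until it
   returns [None]. [with_file] closes the channel when the loop finishes. *)
let count_records file_name =
  Fasta.In_channel.with_file file_name ~f:(fun chan ->
      let rec loop count =
        match Fasta.In_channel.input_record chan with
        | None -> count
        | Some _record -> loop (count + 1)
      in
      loop 0)
```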
fold_records t ~init ~f
reduces all records from a t
down to a single value of type 'a
.
foldi_records t ~init ~f is like fold_records except that f is provided the 0-based record index as its first argument.
val with_file_fold_records :
Base.string ->
init:'a ->
f:('a -> Record.t -> 'a) ->
'a
with_file_fold_records file_name ~init ~f
is like fold_records t ~init ~f
except that it is passed a file name, and it manages t
automatically. See with_file
.
val with_file_foldi_records :
Base.string ->
init:'a ->
f:(Base.int -> 'a -> Record.t -> 'a) ->
'a
with_file_foldi_records file_name ~init ~f
is like foldi_records t ~init ~f
except that it is passed a file name, and it manages t
automatically. See with_file
.
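To illustrate the indexed fold, here is a small sketch that finds the longest record in a file along with its 0-based index. It assumes that Fasta is in scope via a top-level module named Bio_io; longest_record is a hypothetical name. The argument order of f (index, then accumulator, then record) follows the signature of with_file_foldi_records shown above.

```ocaml
(* Assumption: [Fasta] is exposed by a top-level module named [Bio_io]. *)
open Bio_io

(* Hypothetical helper: returns [Some (index, length)] for the longest record,
   or [None] for a file with no records. Ties keep the earliest record. *)
let longest_record file_name =
  Fasta.In_channel.with_file_foldi_records file_name ~init:None
    ~f:(fun i best record ->
      let len = Fasta.Record.seq_length record in
      match best with
      | Some (_, best_len) when best_len >= len -> best
      | _ -> Some (i, len))
```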
The iter functions are like the fold functions, except that they do not take an init value and f returns unit instead of some other value 'a, so they return unit rather than a value of type 'a.
Use them for side effects.
iter_records t ~f
calls f
on each record
in t
. As f
returns unit
this is generally used for side effects.
iteri_records t ~f is like iter_records t ~f except that f is passed the 0-based record index as its first argument.
val with_file_iter_records :
Base.string ->
f:(Record.t -> Base.unit) ->
Base.unit
with_file_iter_records file_name ~f is like iter_records t ~f except that it is passed a file name, and it manages t automatically. See with_file.
val with_file_iteri_records :
Base.string ->
f:(Base.int -> Record.t -> Base.unit) ->
Base.unit
with_file_iteri_records file_name ~f is like iteri_records t ~f except that it is passed a file name, and it manages t automatically. See with_file.
These functions return record
lists.
val with_file_records : Base.string -> Record.t Base.list
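Because with_file_records returns a plain list, ordinary list functions apply directly. The sketch below collects the IDs of records at or above a length cutoff. It assumes that Fasta is in scope via a top-level module named Bio_io; long_record_ids and min_length are hypothetical names. Note that the whole file is held in memory as a list, so for very large files the fold or iter functions may be a better fit.

```ocaml
(* Assumption: [Fasta] is exposed by a top-level module named [Bio_io]. *)
open Bio_io

(* Hypothetical helper: IDs of records whose sequence is at least [min_length]
   long, in file order. All records are read into a list first. *)
let long_record_ids file_name ~min_length =
  Fasta.In_channel.with_file_records file_name
  |> List.filter (fun record -> Fasta.Record.seq_length record >= min_length)
  |> List.map Fasta.Record.id
```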
These are a bit different:
* There are no with_file versions, as you would have to do some fiddly things to keep the channel open, making them not so nice to use.
* If an exception is raised at some point during the pipeline, it will blow up, but any processing that completed before the exception will already have happened. So be careful if you are doing side-effecting things.
val record_sequence : t -> Record.t Base.Sequence.t
record_sequence t
returns a Sequence.t
of record
.
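Since a Sequence.t is lazy, records are only read when the sequence is consumed, so the consumption has to happen while the channel is still open. A minimal sketch of that pattern follows, assuming that Fasta is in scope via a top-level module named Bio_io; first_ids is a hypothetical name. The sequence is turned into a list inside with_file, and only the first n records are requested from the sequence.

```ocaml
(* Assumption: [Fasta] is exposed by a top-level module named [Bio_io]. *)
open Bio_io

(* Hypothetical helper: IDs of the first [n] records. The sequence is forced
   with [to_list] inside [with_file], while the channel is still open. *)
let first_ids file_name n =
  Fasta.In_channel.with_file file_name ~f:(fun chan ->
      let records = Fasta.In_channel.record_sequence chan in
      Base.Sequence.take records n
      |> Base.Sequence.map ~f:Fasta.Record.id
      |> Base.Sequence.to_list)
```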