In_channel for FASTA records. For more general info, see the Record_in_channel module mli file.
Simplest way. May raise exceptions.
let records = Fasta.In_channel.with_file_records_exn fname
A bit more involved, but you won't get exceptions. Instead, you have to handle the Or_error.t.
let records =
  match Fasta.In_channel.with_file_records name with
  | Error err ->
    eprintf "Problem reading records: %s\n" (Error.to_string_hum err);
    exit 1
  | Ok records -> records
Use the iter functions when you need to go over each record and perform some side effects with them.
Print sequence IDs and sequence lengths.
let () =
  Fasta.In_channel.with_file_iter_records_exn "sequences.fasta"
    ~f:(fun record ->
      let open Fasta.Record in
      printf "%s => %d\n" (id record) (seq_length record))
Print sequence index, IDs, and sequence lengths.
This is like the last example except that we also want to print the index. The index passed to f is 0-based: the first record is 0, the second is 1, and so on. The example adds 1 so that the printed numbering starts at 1.
let () =
  Fasta.In_channel.with_file_iteri_records_exn "sequences.fasta"
    ~f:(fun index record ->
      let open Fasta.Record in
      printf "%d: %s => %d\n" (index + 1) (id record)
        (seq_length record))
If you need to reduce all the records down to a single value, use the fold functions.
Get total length of all sequences in the file.
Watch out, as this may raise exceptions (note the _exn suffix).
let total_length =
  Fasta.In_channel.with_file_fold_records_exn "sequences.fasta" ~init:0
    ~f:(fun length record -> length + Fasta.Record.seq_length record)
Same thing, but this won't raise exceptions. You do have to handle Or_error.t to get the final value. Note that within the fold function, you get Fasta.Record.t and not Fasta.Record.t Or_error.t.
let total_length =
  match
    Fasta.In_channel.with_file_fold_records name ~init:0
      ~f:(fun length record -> length + Fasta.Record.seq_length record)
  with
  | Error err ->
    eprintf "Problem reading records: %s\n" (Error.to_string_hum err);
    exit 1
  | Ok total_length -> total_length
Sometimes you have a "pipeline" of computations that you need to do one after the other on records. In that case, you could use the sequence functions. Here's a silly example.
let () =
  Fasta.In_channel.with_file_exn name ~f:(fun chan ->
    Fasta.In_channel.record_sequence_exn chan
    (* Add sequence index to record description *)
    |> Sequence.mapi ~f:(fun i record ->
      let new_desc =
        match Fasta.Record.desc record with
        | None -> Some (sprintf "sequence %d" i)
        | Some old_desc ->
          Some (sprintf "%s -- sequence %d" old_desc i)
      in
      Fasta.Record.with_desc new_desc record)
    (* Convert all sequence chars to lowercase *)
    |> Sequence.map ~f:(fun record ->
      let new_seq = String.lowercase (Fasta.Record.seq record) in
      Fasta.Record.with_seq new_seq record)
    (* Print sequences *)
    |> Sequence.iter ~f:(fun record ->
      print_endline @@ Fasta.Record.serialize record))
One thing to watch out for, though: if you get an exception halfway through while running side-effecting code, as we are here, then some of your side effects will have occurred and the rest will not.
There are also Or_error.t flavors of the sequence functions. Just watch out, because with these you actually do have to deal with an Or_error.t for each Fasta.Record.t in the sequence.
As an alternative, you could use the record_sequence_exn function, but wrap that in the with_file function. That way you don't have to deal with the Or_error.t inside your pipeline. Instead you deal with it at the end.
let total_length =
  match
    Fasta.In_channel.with_file name ~f:(fun chan ->
      Fasta.In_channel.record_sequence_exn chan
      (* Blow up pipeline on second sequence. *)
      |> Sequence.mapi ~f:(fun i record ->
        if i = 1 then assert false;
        record)
      |> Sequence.fold ~init:0 ~f:(fun length record ->
        length + String.length (Fasta.Record.seq record)))
  with
  | Error err ->
    eprintf "Problem in parsing pipeline: %s\n"
      (Error.to_string_hum err);
    exit 1
  | Ok total_length -> total_length
As you can see, if that FASTA file has more than one sequence, it will hit the assert false and blow up.
include Record_in_channel.S with type record := Record.t
val stdin : t
stdin is a t on the standard input channel.
val create_exn : Base.string -> t
create_exn file_name opens an input channel on the file specified by file_name.
val create : Base.string -> t Base.Or_error.t
create file_name opens an input channel on the file specified by file_name.
val close : t -> Base.unit Base.Or_error.t
close t is like close_exn t except that it shouldn't raise.
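As a rough sketch of managing a channel by hand with create and close, the following reads only the first record of a file using input_record (documented further down). It assumes the base and bio_io libraries are available and that Fasta is reached via open Bio_io; first_record is a hypothetical helper name, not part of the library.

```ocaml
open! Base
open Bio_io

(* Hypothetical helper: read only the first record of a FASTA file,
   opening and closing the channel manually. Errors from opening,
   reading, and closing all end up in the returned Or_error.t. *)
let first_record file_name =
  Or_error.bind (Fasta.In_channel.create file_name) ~f:(fun chan ->
      let record = Fasta.In_channel.input_record chan in
      (* Close the channel whether or not reading succeeded. *)
      let closed = Fasta.In_channel.close chan in
      Or_error.bind record ~f:(fun record ->
          Or_error.map closed ~f:(fun () -> record)))
```

In most cases with_file or with_file_exn is the simpler choice, since those close the channel for you.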
val with_file_exn : Base.string -> f:(t -> 'a) -> 'a
with_file_exn file_name ~f executes ~f on the channel created from file_name and closes it afterwards.
val with_file : Base.string -> f:(t -> 'a) -> 'a Base.Or_error.t
with_file file_name ~f is like with_file_exn file_name ~f except that it shouldn't raise.
val input_record_exn : t -> Record.t Base.option
input_record_exn t returns Some record if there is a record to return. If there are no more records, None is returned. An exception is raised on bad input.
val input_record : t -> Record.t Base.option Base.Or_error.t
input_record t is like input_record_exn t except that it should not raise exceptions.
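The Some/None contract of input_record_exn lends itself to a simple recursive loop. The sketch below counts the records in a file by pulling them one at a time until None is returned. It assumes the base and bio_io libraries and open Bio_io; count_records is a hypothetical helper name.

```ocaml
open! Base
open Bio_io

(* Hypothetical helper: count records by reading them one at a time.
   May raise, since it uses the _exn functions. *)
let count_records file_name =
  Fasta.In_channel.with_file_exn file_name ~f:(fun chan ->
      let rec loop count =
        match Fasta.In_channel.input_record_exn chan with
        | None -> count (* no more records *)
        | Some _record -> loop (count + 1)
      in
      loop 0)
```

For most tasks the fold, iter, and sequence functions are more convenient than a hand-written loop like this one.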
fold_records_exn t ~init ~f reduces all records from a t down to a single value of type 'a.
val fold_records :
t ->
init:'a ->
f:('a -> Record.t -> 'a) ->
'a Base.Or_error.t
fold_records t ~init ~f is like fold_records_exn t ~init ~f except that it should not raise exceptions. Rather than deal with exceptions inside the reducing function, you must deal with them at the end when handling the return value.
Like fold_records_exn t ~init ~f except that f is provided the 0-based record index as its first argument. See fold_records_exn.
val foldi_records :
t ->
init:'a ->
f:(Base.int -> 'a -> Record.t -> 'a) ->
'a Base.Or_error.t
Like foldi_records_exn t ~init ~f except that it shouldn't raise. See foldi_records_exn.
val with_file_fold_records_exn :
Base.string ->
init:'a ->
f:('a -> Record.t -> 'a) ->
'a
with_file_fold_records_exn file_name ~init ~f is like fold_records_exn t ~init ~f except that it is passed a file name, and it manages t automatically. See with_file.
val with_file_fold_records :
Base.string ->
init:'a ->
f:('a -> Record.t -> 'a) ->
'a Base.Or_error.t
with_file_fold_records file_name ~init ~f is like fold_records t ~init ~f except that it is passed a file name, and it manages t automatically. See with_file.
val with_file_foldi_records_exn :
Base.string ->
init:'a ->
f:(Base.int -> 'a -> Record.t -> 'a) ->
'a
with_file_foldi_records_exn file_name ~init ~f is like foldi_records_exn t ~init ~f except that it is passed a file name, and it manages t automatically. See with_file.
val with_file_foldi_records :
Base.string ->
init:'a ->
f:(Base.int -> 'a -> Record.t -> 'a) ->
'a Base.Or_error.t
with_file_foldi_records file_name ~init ~f is like foldi_records t ~init ~f except that it is passed a file name, and it manages t automatically. See with_file.
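To illustrate the indexed fold, here is a small sketch that finds the 0-based index and length of the longest sequence in a file. It assumes the base and bio_io libraries and open Bio_io; longest_record is a hypothetical helper name.

```ocaml
open! Base
open Bio_io

(* Hypothetical helper: returns Ok (Some (index, length)) for the
   longest record, Ok None for a file with no records, or Error if
   something went wrong. On ties, the earliest record wins. *)
let longest_record file_name =
  Fasta.In_channel.with_file_foldi_records file_name ~init:None
    ~f:(fun index best record ->
      let len = Fasta.Record.seq_length record in
      match best with
      | Some (_, best_len) when best_len >= len -> best
      | _ -> Some (index, len))
```

Because the whole result is wrapped in a single Or_error.t, the reducing function itself only ever sees plain records.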
The iter functions are like the fold functions except that they do not take an init value, and the f function returns unit instead of some other value 'a. As a result, the iter functions themselves return unit (or unit Or_error.t for the non-_exn versions) rather than a value 'a.
They are mainly called for side effects.
iter_records_exn t ~f calls f on each record in t. As f returns unit, this is generally used for side effects.
val iter_records : t -> f:(Record.t -> Base.unit) -> Base.unit Base.Or_error.t
iter_records t ~f is like iter_records_exn t ~f except that it shouldn't raise.
iteri_records_exn t ~f is like iter_records_exn t ~f except that f is passed the 0-based record index as its first argument.
iteri_records t ~f is like iteri_records_exn t ~f except that it shouldn't raise.
val with_file_iter_records_exn :
Base.string ->
f:(Record.t -> Base.unit) ->
Base.unit
with_file_iter_records_exn file_name ~f is like iter_records_exn t ~f except that it is passed a file name, and it manages t automatically. See with_file.
val with_file_iter_records :
Base.string ->
f:(Record.t -> Base.unit) ->
Base.unit Base.Or_error.t
with_file_iter_records file_name ~f is like iter_records t ~f except that it is passed a file name, and it manages t automatically. See with_file.
val with_file_iteri_records_exn :
Base.string ->
f:(Base.int -> Record.t -> Base.unit) ->
Base.unit
with_file_iteri_records_exn file_name ~f is like iteri_records_exn t ~f except that it is passed a file name, and it manages t automatically. See with_file.
val with_file_iteri_records :
Base.string ->
f:(Base.int -> Record.t -> Base.unit) ->
Base.unit Base.Or_error.t
with_file_iteri_records file_name ~f is like iteri_records t ~f except that it is passed a file name, and it manages t automatically. See with_file.
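Since the iter functions exist for their side effects, one possible use is filling a mutable container. The sketch below collects record IDs into a Base Queue and converts the unit Or_error.t result into a list of IDs. It assumes the base and bio_io libraries and open Bio_io; record_ids is a hypothetical helper name.

```ocaml
open! Base
open Bio_io

(* Hypothetical helper: collect all record IDs, in file order.
   The side effect inside ~f is enqueueing onto [ids]. *)
let record_ids file_name =
  let ids = Queue.create () in
  Fasta.In_channel.with_file_iter_records file_name ~f:(fun record ->
      Queue.enqueue ids (Fasta.Record.id record))
  (* Only hand back the IDs if the whole iteration succeeded. *)
  |> Or_error.map ~f:(fun () -> Queue.to_list ids)
```

A fold would work here too; the queue is only meant to show where the side effect happens.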
These functions return lists of records.
val records_exn : t -> Record.t Base.List.t
val records : t -> Record.t Base.List.t Base.Or_error.t
val with_file_records_exn : Base.string -> Record.t Base.List.t
val with_file_records : Base.string -> Record.t Base.List.t Base.Or_error.t
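When the whole file fits comfortably in memory, the list functions pair naturally with ordinary List operations. As a sketch, the following keeps only the records whose sequence is at least a given length. It assumes the base and bio_io libraries and open Bio_io; long_records and min_length are hypothetical names.

```ocaml
open! Base
open Bio_io

(* Hypothetical helper: read all records, then keep those with at
   least [min_length] sequence characters. *)
let long_records file_name ~min_length =
  Fasta.In_channel.with_file_records file_name
  |> Or_error.map
       ~f:
         (List.filter ~f:(fun record ->
              Fasta.Record.seq_length record >= min_length))
```

Note that every record is read into memory before filtering; for very large files the fold or sequence functions may be a better fit.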
The sequence functions are a bit different:
* There are no with_file versions, as you would have to do some fiddly things to keep the channel open, making them not so nice to use.
* Each record that is yielded is wrapped in an Or_error.t. This is different from the iter, fold, and other non-_exn functions, in which the entire result is wrapped in an Or_error.t, letting you ignore errors in the passed-in ~f function and deal with failure once.
val record_sequence_exn : t -> Record.t Base.Sequence.t
record_sequence_exn t returns a Sequence.t of records. May raise exceptions.
val record_sequence : t -> Record.t Base.Or_error.t Base.Sequence.t
record_sequence t is like record_sequence_exn t except that instead of raising exceptions, each item of the sequence is a record Or_error.t rather than an "unwrapped" record. This could make things annoying to deal with. If you don't want exceptions, you could instead wrap your entire sequence processing pipeline in a call to with_file and handle the Or_error.t in that way. See the pipelines usage examples for more info.
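If you do want to work with record_sequence directly, one way to handle the per-item Or_error.t is to fold with early exit. The sketch below sums sequence lengths and stops at the first Error item, using Base's Sequence.fold_result and Or_error.join. It assumes the base and bio_io libraries and open Bio_io; total_length is a hypothetical helper name.

```ocaml
open! Base
open Bio_io

(* Hypothetical helper: total sequence length over all records.
   Each item of the sequence is a Record.t Or_error.t, so the fold
   stops as soon as it meets an Error. *)
let total_length file_name =
  Fasta.In_channel.with_file file_name ~f:(fun chan ->
      Fasta.In_channel.record_sequence chan
      |> Sequence.fold_result ~init:0 ~f:(fun total record_or_error ->
             Or_error.map record_or_error ~f:(fun record ->
                 total + Fasta.Record.seq_length record)))
  (* with_file wraps the inner Or_error.t in another one; flatten. *)
  |> Or_error.join
```

The sequence is fully consumed inside ~f, so the channel is still open while records are being read.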