package delimited_parsing

Csv parses character-separated values where fields may be quoted and quotation marks within quoted fields are escaped with another quotation mark, MSExcel-style.

An applicative interface for parsing values from a csv file.

module Header = Header
type 'a t

This provides an applicative interface for constructing values from a csv file.

An 'a t describes how to build an OCaml model 'a for each row.

See lib/async_extended/example/csv_example.ml for an example of usage.
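
As a rough illustration of the idea above, the sketch below builds a row description from two columns and folds it over an in-memory document. It assumes the module documented here is reachable as Delimited.Csv (the path may differ in your version of the package); the person record and the sample data are made up for the example.

```ocaml
open Core

(* Assumption: the module documented on this page is exposed as
   [Delimited.Csv]; adjust the path for your version of the package. *)
module Csv = Delimited.Csv

(* Hypothetical model type for one row. *)
type person = { name : string; age : int }

(* Column 0 is kept as a string; column 1 is converted with [Int.of_string]. *)
let person : person Csv.t =
  Csv.map2
    (Csv.at_index 0 ~f:Fn.id)
    (Csv.at_index 1 ~f:Int.of_string)
    ~f:(fun name age -> { name; age })

(* Fold over an in-memory document, collecting the rows in input order. *)
let people : person list =
  Csv.fold_string person ~init:[] ~f:(fun acc p -> p :: acc) "alice,30\nbob,25\n"
  |> List.rev
```

The description is built once and reused for every row; conversion functions such as Int.of_string run only when a row is actually parsed.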

include Core.Applicative.S with type 'a t := 'a t
val return : 'a -> 'a t
val apply : ('a -> 'b) t -> 'a t -> 'b t
val map : 'a t -> f:('a -> 'b) -> 'b t
val map2 : 'a t -> 'b t -> f:('a -> 'b -> 'c) -> 'c t
val map3 : 'a t -> 'b t -> 'c t -> f:('a -> 'b -> 'c -> 'd) -> 'd t
val all : 'a t list -> 'a list t
val all_unit : unit t list -> unit t
val all_ignore : unit t list -> unit t
  • deprecated [since 2018-02] Use [all_unit]
val both : 'a t -> 'b t -> ('a * 'b) t
module Applicative_infix : sig ... end
include module type of Applicative_infix
val (<*>) : ('a -> 'b) t -> 'a t -> 'b t

Same as apply.

val (<*) : 'a t -> unit t -> 'a t
val (*>) : unit t -> 'a t -> 'a t
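
The infix operators allow a row description to be written in the usual applicative style. The following is a minimal sketch, again assuming the module is available as Delimited.Csv; the point parser is a hypothetical example, not part of the library.

```ocaml
open Core

(* Assumption: the module documented on this page is exposed as
   [Delimited.Csv]. *)
module Csv = Delimited.Csv

(* Build a pair of floats from columns 0 and 1. [return] lifts the
   constructor function, and each [<*>] feeds it one more column. *)
let point : (float * float) Csv.t =
  let open Csv.Applicative_infix in
  Csv.return (fun x y -> (x, y))
  <*> Csv.at_index 0 ~f:Float.of_string
  <*> Csv.at_index 1 ~f:Float.of_string
```

This is equivalent to calling map2 on the two column parsers; the infix form tends to scale better once a row has more than three columns.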
module Let_syntax : sig ... end
val at_index : int -> f:(string -> 'a) -> 'a t
val at_header : string -> f:(string -> 'a) -> 'a t
type 'a on_invalid_row

'a on_invalid_row specifies how to handle a row whose extents are known but whose contents cannot be converted to a value of type 'a. The default is to raise.

If a row's extents are unknown, the parser cannot continue and will always raise.

val fold_reader : ?strip:bool -> ?skip_lines:int -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a on_invalid_row -> 'a t -> init:'b -> f:('b -> 'a -> 'b Async.Deferred.t) -> Async.Reader.t -> 'b Async.Deferred.t

fold_reader ?strip ?skip_lines ?sep ?quote ?header ?on_invalid_row t ~init ~f r produces a value by folding over a CSV document read from r, using t to build a value from each row.

If strip is true, leading and trailing whitespace is stripped from each field. Default value is false.

If skip_lines > 0, that many lines are skipped at the start of the input. Note that this skips lines without doing any CSV parsing of the lines being skipped, so newlines within a quoted field are treated identically to newlines outside a quoted field. Default value is 0.
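
The caveat about skipped lines can be illustrated with a small standalone sketch. It uses only the OCaml standard library and is not the library's implementation; skip_raw_lines is a hypothetical helper that drops lines by looking for newline characters alone.

```ocaml
(* Drop the first [n] newline-terminated lines of [s] without any CSV
   awareness, mirroring the caveat described for [skip_lines]. *)
let rec skip_raw_lines (n : int) (s : string) : string =
  if n <= 0 then s
  else
    match String.index_opt s '\n' with
    | None -> ""
    | Some i ->
      skip_raw_lines (n - 1) (String.sub s (i + 1) (String.length s - i - 1))

(* The first row contains a quoted field with an embedded newline. *)
let doc = "\"multi\nline\",x\n1,2\n"

(* Skipping one raw line cuts the quoted field in half. *)
let rest = skip_raw_lines 1 doc
```

Here rest begins in the middle of the quoted field, so a document whose leading rows contain quoted newlines needs a larger skip count than its number of logical rows.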

sep is the character that separates fields within a row. Default value is ','.

quote defines a character to use for quoting. The default is `Using '"', which implements the MS Excel convention: either a field is unquoted, or it has leading and trailing quotes and internal escaped characters are represented as quote-char char, e.g., "a for a. In particular, a quotation mark inside a quoted field is written as two quotation marks. `No_quoting means all characters are literal.
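
To make the quoting convention concrete, here is a simplified standalone sketch of splitting one row into fields. It uses only the OCaml standard library and is an illustration, not the library's parser: split_row is a hypothetical name, and the sketch ignores strip, multi-line fields, and `No_quoting.

```ocaml
(* Split one row into fields, honouring the doubled-quote convention.
   Illustration only: this is not the library's parser. *)
let split_row ?(sep = ',') ?(quote = '"') (line : string) : string list =
  let buf = Buffer.create 16 in
  let fields = ref [] in
  let in_quotes = ref false in
  let n = String.length line in
  let i = ref 0 in
  while !i < n do
    let c = line.[!i] in
    if !in_quotes then begin
      if c = quote then begin
        if !i + 1 < n && line.[!i + 1] = quote then begin
          (* Two quote characters inside a quoted field: one literal quote. *)
          Buffer.add_char buf quote;
          incr i
        end
        else
          (* A lone quote closes the quoted field. *)
          in_quotes := false
      end
      else Buffer.add_char buf c
    end
    else if c = quote && Buffer.length buf = 0 then
      (* A quote at the start of a field opens a quoted field. *)
      in_quotes := true
    else if c = sep then begin
      fields := Buffer.contents buf :: !fields;
      Buffer.clear buf
    end
    else Buffer.add_char buf c;
    incr i
  done;
  List.rev (Buffer.contents buf :: !fields)

(* A separator inside quotes is literal; a doubled quote yields one quote. *)
let example = split_row "a,\"b, \"\"c\"\"\",d"
```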

val fold_reader' : ?strip:bool -> ?skip_lines:int -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a on_invalid_row -> 'a t -> init:'b -> f:('b -> 'a Core.Queue.t -> 'b Async.Deferred.t) -> Async.Reader.t -> 'b Async.Deferred.t

fold_reader' ?strip ?skip_lines ?sep ?quote ~init ~f r works like fold_reader, except for the f argument: fold_reader' runs f on batches of parsed rows (each batch is a Core.Queue.t of 'a values) rather than on each individual row.
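
A batch-oriented fold might look like the sketch below, which counts the rows in a file. It assumes the module is available as Delimited.Csv and that the program runs under Async; count_rows and its path argument are hypothetical.

```ocaml
open Core
open Async

(* Assumption: the module documented on this page is exposed as
   [Delimited.Csv]. *)
module Csv = Delimited.Csv

(* Any row description will do for counting; this one reads column 0. *)
let first_column : string Csv.t = Csv.at_index 0 ~f:Fn.id

(* Count the rows of the file at [path], adding one whole batch at a time. *)
let count_rows (path : string) : int Deferred.t =
  Reader.with_file path ~f:(fun reader ->
    Csv.fold_reader' first_column ~init:0
      ~f:(fun total batch -> return (total + Queue.length batch))
      reader)
```

Working on batches means f returns one deferred per batch rather than one per row, which can matter for large inputs.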

val fold_reader_without_pushback : ?strip:bool -> ?skip_lines:int -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a on_invalid_row -> 'a t -> init:'b -> f:('b -> 'a -> 'b) -> Async.Reader.t -> 'b Async.Deferred.t
val fold_reader_to_pipe : ?strip:bool -> ?skip_lines:int -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a on_invalid_row -> 'a t -> Async.Reader.t -> 'a Async.Pipe.Reader.t
val fold_string : ?strip:bool -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a on_invalid_row -> 'a t -> init:'b -> f:('b -> 'a -> 'b) -> string -> 'b

Low-level interface

module Fast_queue : sig ... end
module On_invalid_row : sig ... end
module Parse_state : sig ... end

At the lowest level, we model csv parsing as a fold over string arrays, one array per row. It is up to you to interpret the header row.
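
The "fold over string arrays" model can be sketched without the library at all. The example below uses only the OCaml standard library; the rows, column_index, and total_age are made up to show what interpreting the header row yourself could look like.

```ocaml
(* Rows as they might arrive from a low-level parser: one string array per
   row, header first. The data is made up for illustration. *)
let rows = [ [| "name"; "age" |]; [| "alice"; "30" |]; [| "bob"; "25" |] ]

(* Find the position of a named column in the header row. *)
let column_index (header : string array) (name : string) : int =
  let rec go i =
    if i >= Array.length header then failwith ("missing column: " ^ name)
    else if String.equal header.(i) name then i
    else go (i + 1)
  in
  go 0

(* The first row fixes the column position; the remaining rows are folded. *)
let total_age : int =
  match rows with
  | [] -> 0
  | header :: data ->
    let age = column_index header "age" in
    List.fold_left (fun acc row -> acc + int_of_string row.(age)) 0 data
```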

Backwards-compatible interface

module Builder : sig ... end
val create_parse_state : ?strip:bool -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?on_invalid_row:'a on_invalid_row -> header_map:int Core.String.Map.t -> 'a t -> init:'b -> f:('b -> 'a -> 'b) -> 'b Parse_state.t
module Header_parse : sig ... end
module Row : sig ... end
exception Bad_csv_formatting of string list * string