package delimited_parsing

Read CSVs & CSV-like delimited formats (following the CSV quoting behaviour).

These formats are loosely documented by RFC 4180: https://www.ietf.org/rfc/rfc4180.txt

include module type of Delimited_kernel.Read with module Streaming := Delimited_kernel.Read.Streaming
exception Bad_csv_formatting of string list * string

Carries the row up to the error, and the field containing the error up to the point of failure. Same as Expert.Parse_state.Bad_csv_formatting.

type 'a t

This provides an applicative interface for constructing values from a csv file.

An 'a t describes how to build an OCaml model 'a for each row.

Simple example:

type t =
  { foo : int
  ; bar : string
  }

(* Describes how to generate a [t] from a row of a csv file *)
let parse : t Delimited_kernel.Read.t =
  let open Delimited_kernel.Read.Let_syntax in
  let%map_open foo = at_header "foo" ~f:Int.of_string
  and bar = at_header "bar" ~f:String.of_string in
  { foo; bar }
;;

let _ =
  Delimited_kernel.Read.list_of_string ~header:`Yes parse
    "foo,bar\n2,\"hello, world\"\n"
;;
include Core.Applicative.S with type 'a t := 'a t
val return : 'a -> 'a t
val map : 'a t -> f:('a -> 'b) -> 'b t
val both : 'a t -> 'b t -> ('a * 'b) t
val (<*>) : ('a -> 'b) t -> 'a t -> 'b t

Same as apply.

val (<*) : 'a t -> unit t -> 'a t
val (*>) : unit t -> 'a t -> 'a t
val (>>|) : 'a t -> ('a -> 'b) -> 'b t
val apply : ('a -> 'b) t -> 'a t -> 'b t
val map2 : 'a t -> 'b t -> f:('a -> 'b -> 'c) -> 'c t
val map3 : 'a t -> 'b t -> 'c t -> f:('a -> 'b -> 'c -> 'd) -> 'd t
val all : 'a t list -> 'a list t
val all_unit : unit t list -> unit t
module Applicative_infix : sig ... end
module Open_on_rhs_intf : sig ... end
include Core.Applicative.Let_syntax with type 'a t := 'a t with module Open_on_rhs_intf := Open_on_rhs_intf
module Let_syntax : sig ... end
val at_index : int -> f:(string -> 'a) -> 'a t

Read a field at the given index. Use f to convert the field from string.
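The applicative interface and at_index can be pictured with a toy model in which a row parser is simply a function from the row's fields to a value. This is only a sketch using the OCaml standard library: the type definition below is an assumption made for illustration and is not how the library represents 'a t.

```ocaml
(* Toy model only: a row parser is a function over the fields of one row.
   This is NOT the library's actual representation of ['a t]. *)
type 'a t = string array -> 'a

(* Transform the result of a parser. *)
let map (p : 'a t) ~f : 'b t = fun row -> f (p row)

(* Run two parsers on the same row and pair their results. *)
let both (p : 'a t) (q : 'b t) : ('a * 'b) t = fun row -> (p row, q row)

(* Read the field at position [i] and convert it with [f]. *)
let at_index i ~f : 'a t = fun row -> f row.(i)

(* Combine two column readers into one parser for the whole row. *)
let parse_pair : (int * string) t =
  both (at_index 0 ~f:int_of_string) (at_index 1 ~f:(fun s -> s))

let result = parse_pair [| "2"; "hello, world" |]
```

In this model each parser inspects the row independently, which is why parsers compose with both and map without needing to know about each other.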

val at_header : string -> f:(string -> 'a) -> 'a t

Read a field at the given header. Use f to convert the field from string.

Note that if the given header is not provided through either the file or the ~header argument to the parsers, this will fail at runtime.

val at_header_opt : string -> f:(string option -> 'a) -> 'a t

Read a field at the given header, if it exists. f receives Some field when the header is present and None otherwise, and converts that to the result.

module Record_builder : Record_builder.S with type 'a applicative = 'a t
module Fields_O : sig ... end

The following are convenience functions that build on Record_builder.field to make it easy to define a t Delimited.Read.t for some record type t.

module On_invalid_row : sig ... end
module Header : sig ... end

Header parsing control

module Row : sig ... end

Whole-row parsing.

val fold_string : ?strip:bool -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a On_invalid_row.t -> 'a t -> init:'b -> f:('b -> 'a -> 'b) -> string -> 'b

Fold the CSV rows contained in the given string.

val list_of_string : ?strip:bool -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a On_invalid_row.t -> 'a t -> string -> 'a list

Load the CSV rows contained in the given string as a list.

val read_lines : ?strip:bool -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a On_invalid_row.t -> 'a t -> Core.In_channel.t -> 'a list

Read a CSV file from the given input channel, returning its rows as a list.

Experts only. If you really think you need a function in this module, please talk to a delimited dev first.

module Streaming : sig ... end

Async helpers for delimited parsing

val fold_reader : ?strip:bool -> ?skip_lines:int -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a On_invalid_row.t -> 'a t -> init:'b -> f:('b -> 'a -> 'b Async.Deferred.t) -> Async.Reader.t -> 'b Async.Deferred.t

fold_reader ?strip ?skip_lines ?sep ?quote ~init ~f r produces a value by folding over a csv document read from r. The reader will be closed on EOF.

If strip is true, leading and trailing whitespace is stripped from each field. Default value is false.

If skip_lines > 0, that many lines are skipped at the start of the input. Note that this skips lines without doing any CSV parsing of the lines being skipped, so newlines within a quoted field are treated identically to newlines outside a quoted field. An exception will be raised if the input has fewer than skip_lines lines. Default value is 0.
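The skip_lines caveat can be illustrated with a small standard-library sketch. The helper skip_lines_naive is hypothetical and only mimics the documented behaviour (skipping on raw newlines, raising when there are too few lines); it is not the library's implementation.

```ocaml
(* Hypothetical helper: drop the first [n] lines of [s] by scanning for '\n'
   only, with no awareness of CSV quoting. Raises if [s] has fewer than [n]
   newline-terminated lines. *)
let skip_lines_naive n s =
  let rec go n pos =
    if n = 0 then String.sub s pos (String.length s - pos)
    else
      match String.index_from_opt s pos '\n' with
      | Some i -> go (n - 1) (i + 1)
      | None -> failwith "input has fewer lines than skip_lines"
  in
  go n 0

(* The first logical CSV row spans two physical lines because its first
   field contains a quoted newline. *)
let input = "\"multi\nline\",x\nfoo,bar\n1,2\n"

(* Skipping one line cuts the quoted field in half. *)
let rest = skip_lines_naive 1 input
```

Here rest starts in the middle of the quoted field, so whatever parses it next sees an unbalanced quote. If the lines to skip may contain quoted newlines, it is safer to count physical lines rather than logical rows when choosing skip_lines.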

sep is the character that separates fields within a row. Default value is ','.

quote defines a character to use for quoting. `Using '"' implements the MS Excel convention: either a field is unquoted, or it has leading and trailing quotes and internal escaped characters are represented as quote-char char, e.g., "\n to escape a newline. `No_quoting means all characters are literal. The default is `Using '"'.
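As a rough picture of how sep and strip act on a single row when quoting is disabled, the sketch below splits a line with the standard library. The helper split_row is hypothetical, and String.trim is assumed to approximate the whitespace stripping described for strip; the library's exact notion of whitespace may differ.

```ocaml
(* Hypothetical helper modelling ?sep and ?strip for one row under
   `No_quoting: every character is literal, so the row is split on every
   occurrence of [sep]. *)
let split_row ?(strip = false) ?(sep = ',') line =
  let fields = String.split_on_char sep line in
  if strip then List.map String.trim fields else fields

(* Defaults: comma separator, whitespace preserved. *)
let plain = split_row " a , b"

(* With stripping, leading and trailing whitespace is removed per field. *)
let stripped = split_row ~strip:true " a , b"

(* A different separator: the comma is now ordinary field content. *)
let tabbed = split_row ~sep:'\t' "a\tb,c"

(* Without quoting, quote characters do not protect the separator. *)
let unquoted = split_row "\"a,b\""
```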

val fold_readeri : ?strip:bool -> ?skip_lines:int -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a On_invalid_row.t -> 'a t -> init:'b -> f:(int -> 'b -> 'a -> 'b Async.Deferred.t) -> Async.Reader.t -> 'b Async.Deferred.t

Same as fold_reader, except that it also passes the line number of the current row to f.

val fold_reader' : ?strip:bool -> ?skip_lines:int -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a On_invalid_row.t -> 'a t -> init:'b -> f:('b -> 'a Core.Queue.t -> 'b Async.Deferred.t) -> Async.Reader.t -> 'b Async.Deferred.t

fold_reader' ?strip ?skip_lines ?sep ?quote ~init ~f r works similarly to fold_reader, except for the f argument. fold_reader' runs f on batches of parsed rows (each batch is an 'a Core.Queue.t) rather than running f on each individual row.
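The shape of the batched f can be sketched synchronously with the standard library. This is a simplification under stated assumptions: Stdlib.Queue stands in for Core.Queue, there is no Async deferred, and fold_batches and batch_of_list are hypothetical helpers rather than library functions.

```ocaml
(* Hypothetical stand-in for the batched fold: [f] is called once per batch,
   not once per row. *)
let fold_batches ~init ~f batches = List.fold_left f init batches

(* Build a batch (a queue of already-parsed rows) from a list. *)
let batch_of_list xs =
  let q = Queue.create () in
  List.iter (fun x -> Queue.add x q) xs;
  q

(* Sum all rows: the per-batch function folds over the queue it receives. *)
let total =
  fold_batches ~init:0
    ~f:(fun acc batch -> Queue.fold ( + ) acc batch)
    [ batch_of_list [ 1; 2 ]; batch_of_list [ 3 ] ]
```

Processing whole batches lets f amortise per-call overhead across several rows, at the cost of having to iterate over the queue itself.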

val fold_readeri' : ?strip:bool -> ?skip_lines:int -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a On_invalid_row.t -> 'a t -> init:'b -> f:('b -> (int * 'a) Core.Queue.t -> 'b Async.Deferred.t) -> Async.Reader.t -> 'b Async.Deferred.t

Same as fold_reader', except that each element of a batch is a tuple of line number and parsed row.

val fold_reader_without_pushback : ?strip:bool -> ?skip_lines:int -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a On_invalid_row.t -> 'a t -> init:'b -> f:('b -> 'a -> 'b) -> Async.Reader.t -> 'b Async.Deferred.t

Same as fold_reader, but f returns its result directly rather than as a deferred, so the fold function does not exert pushback on the fold.

val fold_reader_without_pushbacki : ?strip:bool -> ?skip_lines:int -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a On_invalid_row.t -> 'a t -> init:'b -> f:(int -> 'b -> 'a -> 'b) -> Async.Reader.t -> 'b Async.Deferred.t

Same as fold_reader_without_pushback, except that it also passes the line number of the current row to f.

val pipe_of_reader : ?strip:bool -> ?skip_lines:int -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a On_invalid_row.t -> 'a t -> Async.Reader.t -> 'a Async.Pipe.Reader.t

pipe_of_reader t reader produces a pipe reader of parsed values.

val pipe_of_chunks : ?strip:bool -> ?start_line_number:int -> ?skip_lines:int -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a On_invalid_row.t -> 'a t -> string Async.Pipe.Reader.t -> 'a Async.Pipe.Reader.t

pipe_of_chunks is like pipe_of_reader except taking a pipe of input strings instead of a reader.

Note that the fragmentation of the strings in the input pipe is irrelevant: they aren't assumed to be separate lines, so each string can contain multiple lines or be a fragment of a line, and must contain explicit newline characters to separate each line. If you're using the output of Reader.lines here, note that it removes the newlines, so you'd need to add them back in, or see pipe_of_lines.
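The point about fragmentation can be shown with plain strings. In this standard-library sketch, concatenation stands in for what the parser effectively sees; the pipe itself is not modelled, and all names are illustrative.

```ocaml
let document = "foo,bar\n1,2\n3,4\n"

(* Arbitrary fragmentation: chunk boundaries fall mid-field and mid-line,
   yet the concatenation is the original document. *)
let chunks = [ "foo,b"; "ar\n1,"; "2\n3,4\n" ]
let rejoined = String.concat "" chunks

(* Lines whose newlines were already removed, as a line-splitting reader
   would produce. Fed in as-is, the rows run together. *)
let lines = [ "foo,bar"; "1,2"; "3,4" ]
let fused = String.concat "" lines

(* Adding the newlines back restores the row boundaries. *)
let restored = String.concat "" (List.map (fun l -> l ^ "\n") lines)
```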

skip_lines has the same caveat as for fold_reader: the lines are skipped before CSV parsing starts, so it does not treat newlines within quoted strings specially, and it raises an exception if the input doesn't contain at least that many lines.

val pipe_of_lines : ?strip:bool -> ?start_line_number:int -> ?skip_lines:int -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a On_invalid_row.t -> 'a t -> string Async.Pipe.Reader.t -> 'a Async.Pipe.Reader.t

pipe_of_lines is like pipe_of_chunks, but it assumes that each element of the input pipe is a separate line of a csv, inserting newlines in between them. The newline insertion is not "smart" in any way, so in particular:

  • input elements with unquoted newlines in them are treated as multiple lines,
  • any existing trailing newlines in the input elements would still have a newline inserted after them, resulting in blank lines,
  • input elements that start but don't close a quoted string will lead to weird results.
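The three cases listed for pipe_of_lines can be reproduced with a small standard-library sketch. It assumes the insertion amounts to joining the elements with a newline, as the description suggests; join_lines is a hypothetical helper, not the library's code.

```ocaml
(* Hypothetical model of the newline insertion: elements are joined with
   '\n', with no inspection of their contents. *)
let join_lines elements = String.concat "\n" elements

(* Well-behaved input: one element per CSV line. *)
let ok = join_lines [ "a,b"; "c,d" ]

(* An unquoted newline inside an element yields an extra line. *)
let extra_line = join_lines [ "a\nb"; "c" ]

(* A trailing newline in an element produces a blank line after it. *)
let blank_line = join_lines [ "a,b\n"; "c,d" ]

(* An unclosed quote swallows the inserted newline, so two elements end up
   inside a single quoted field. *)
let merged = join_lines [ "\"open"; "close\",x" ]
```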
val create_reader : ?strip:bool -> ?skip_lines:int -> ?sep:char -> ?quote:[ `No_quoting | `Using of char ] -> ?header:Header.t -> ?on_invalid_row:'a On_invalid_row.t -> 'a t -> string -> 'a Async.Pipe.Reader.t Async.Deferred.t

create_reader filename opens a reader for the given filename & returns a pipe of its parsed values.