package angstrom

  1. Overview
  2. Docs

Parser combinators built for speed and memory-efficiency.

Angstrom is a parser-combinator library that provides monadic and applicative interfaces for constructing parsers with unbounded lookahead. Its parsers can consume input incrementally, whether in a blocking or non-blocking environment. To achieve efficient incremental parsing, Angstrom offers both a buffered and unbuffered interface to input streams, with the Unbuffered interface enabling zero-copy IO. With these features and low-level iteration parser primitives like take_while and skip_while, Angstrom makes it easy to write efficient, expressive, and reusable parsers suitable for high-performance applications.

type +'a t

A parser for values of type 'a.

Basic parsers

val peek_char : char option t

peek_char accepts any char and returns it, or returns None if the end of input has been reached.

This parser does not advance the input. Use it for lookahead.

val peek_char_fail : char t

peek_char_fail accepts any char and returns it. If end of input has been reached, it will fail.

This parser does not advance the input. Use it for lookahead.

val peek_string : int -> string t

peek_string n accepts exactly n characters and returns them as a string. If there is not enough input, it will fail.

This parser does not advance the input. Use it for lookahead.

val char : char -> char t

char c accepts c and returns it.

val not_char : char -> char t

not_char accepts any character that is not c and returns the matched character.

val any_char : char t

any_char accepts any character and returns it.

val any_uint8 : int t

any_uint8 accepts any byte and returns it as an unsigned int8.

val any_int8 : int t

any_int8 accepts any byte and returns it as a signed int8.

val satisfy : (char -> bool) -> char t

satisfy f accepts any character for which f returns true and returns the accepted character.

val string : string -> string t

string s accepts s exactly and returns it.

val string_ci : string -> string t

string_ci s accepts s, ignoring case, and returns the matched string, preserving the case of the original input.

val skip : (char -> bool) -> unit t

skip f accepts any character for which f returns true and discards the accepted character. skip f is equivalent to satisfy f but discards the accepted character.

val skip_while : (char -> bool) -> unit t

skip_while f accepts input as long as f returns true and discards the accepted characters.

val take : int -> string t

take n accepts exactly n characters of input and returns them as a string.

val take_while : (char -> bool) -> string t

take_while f accepts input as long as f returns true and returns the accepted characters as a string.

This parser does not fail. If f returns false on the first character, it will return the empty string.

val take_while1 : (char -> bool) -> string t

take_while f accepts input as long as f returns true and returns the accepted characters as a string.

This parser requires that f return true for at least one character of input, and will fail otherwise.

val take_till : (char -> bool) -> string t

take_till f accepts input as long as f returns false and returns the accepted characters as a string.

This parser does not fail. If f returns true on the first character, it will return the empty string.

val advance : int -> unit t

advance n advances the input n characters, failing if the remaining input is less than n.

val end_of_line : unit t

end_of_input accepts either a line feed \n, or a carriage return followed by a line feed \r\n and returns unit.

val at_end_of_input : bool t

at_end_of_input returns whether the end of the end of input has been reached. This parser always succeeds.

val end_of_input : unit t

end_of_input succeeds if all the input has been consumed, and fails otherwise.

val scan : 'state -> ('state -> char -> 'state option) -> (string * 'state) t

scan init f consumes until f returns None. Returns the final state before None and the accumulated string

val scan_state : 'state -> ('state -> char -> 'state option) -> 'state t

scan init f Like scan but only returns the final state before None. Much more efficient than scan

val scan_string : 'state -> ('state -> char -> 'state option) -> string t

scan init f Like scan but discards the final state and returns the accumulated string

module BE : sig ... end

Big endian parsers

module LE : sig ... end

Little endian parsers

Combinators

val option : 'a -> 'a t -> 'a t

option v p runs p, returning the result of p if it succeeds and v if it fails.

val list : 'a t list -> 'a list t

list ps runs each p in ps in sequence, returning a list of results of each p.

val count : int -> 'a t -> 'a list t

count n p runs p n times, returning a list of the results.

val many : 'a t -> 'a list t

many p runs p zero or more times and returns a list of results from the runs of p.

val many1 : 'a t -> 'a list t

many1 p runs p one or more times and returns a list of results from the runs of p.

val many_till : 'a t -> 'b t -> 'a list t

many_till p e runs parser p zero or more times until action e succeeds and returns the list of result from the runs of p.

val sep_by : 'a t -> 'b t -> 'b list t

sep_by s p runs p zero or more times, interspersing runs of s in between.

val sep_by1 : 'a t -> 'b t -> 'b list t

sep_by1 s p runs p one or more times, interspersing runs of s in between.

val skip_many : 'a t -> unit t

skip_many p runs p zero or more times, discarding the results.

val skip_many1 : 'a t -> unit t

skip_many1 p runs p one or more times, discarding the results.

val fix : ('a t -> 'a t) -> 'a t

fix f computes the fixpoint of f and runs the resultant parser. The argument that f receives is the result of fix f, which f must use, paradoxically, to define fix f.

fix is useful when constructing parsers for inductively-defined types such as sequences, trees, etc. Consider for example the implementation of the many combinator defined in this library:

let many p =
fix (fun m ->
  (cons <$> p <*> m) <|> return [])

many p is a parser that will run p zero or more times, accumulating the result of every run into a list, returning the result. It's defined by passing fix a function. This function assumes its argument m is a parser that behaves exactly like many p. You can see this in the expression comprising the left hand side of the alternative operator <|>. This expression runs the parser p followed by the parser m, and after which the result of p is cons'd onto the list that m produces. The right-hand side of the alternative operator provides a base case for the combinator: if p fails and the parse cannot proceed, return an empty list.

Another way to illustrate the uses of fix is to construct a JSON parser. Assuming that parsers exist for the basic types such as false, true, null, strings, and numbers, the question then becomes how to define a parser for objects and arrays? Both contain values that are themselves JSON values, so it seems as though it's impossible to write a parser that will accept JSON objects and arrays before writing a parser for JSON values as a whole.

This is the exact situation that fix was made for. By defining the parsers for arrays and objects within the function that you pass to fix, you will gain access to a parser that you can use to parse JSON values, the very parser you are defining!

let json =
fix (fun json ->
  let arr = char '[' *> sep_by (char ',') json <* char ']' in
  let obj = char '{' *> ... json ... <* char '}' in
  choice [str; num; arr json, ...])

Alternatives

val (<|>) : 'a t -> 'a t -> 'a t

p <|> q runs p and returns the result if succeeds. If p fails, then the input will be reset and q will run instead.

val choice : 'a t list -> 'a t

choice ts runs each parser in ts in order until one succeeds and returns that result.

val (<?>) : 'a t -> string -> 'a t

p <?> name associates name with the parser p, which will be reported in the case of failure.

val commit : unit t

commit prevents backtracking beyond the current position of the input, allowing the manager of the input buffer to reuse the preceding bytes for other purposes.

The Unbuffered parsing interface will report directly to the caller the number of bytes committed to the when returning a Unbuffered.state.Partial state, allowing the caller to reuse those bytes for any purpose. The Buffered will keep track of the region of committed bytes in its internal buffer and reuse that region to store additional input when necessary.

Monadic/Applicative interface

val return : 'a -> 'a t

return v creates a parser that will always succeed and return v

val fail : string -> 'a t

fail msg creates a parser that will always fail with the message msg

val (>>=) : 'a t -> ('a -> 'b t) -> 'b t

p >>= f creates a parser that will run p, pass its result to f, run the parser that f produces, and return its result.

val (>>|) : 'a t -> ('a -> 'b) -> 'b t

p >>| f creates a parser that will run p, and if it succeeds with result v, will return f v

val (<*>) : ('a -> 'b) t -> 'a t -> 'b t

f <*> p is equivalent to f >>= fun f -> p >>| f.

val (<$>) : ('a -> 'b) -> 'a t -> 'b t

f <$> p is equivalent to p >>| f

val (*>) : 'a t -> 'b t -> 'b t

p *> q runs p, discards its result and then runs q, and returns its result.

val (<*) : 'a t -> 'b t -> 'a t

p <* q runs p, then runs q, discards its result, and returns the result of p.

val lift : ('a -> 'b) -> 'a t -> 'b t
val lift2 : ('a -> 'b -> 'c) -> 'a t -> 'b t -> 'c t
val lift3 : ('a -> 'b -> 'c -> 'd) -> 'a t -> 'b t -> 'c t -> 'd t
val lift4 : ('a -> 'b -> 'c -> 'd -> 'e) -> 'a t -> 'b t -> 'c t -> 'd t -> 'e t

The liftn family of functions promote functions to the parser monad. For any of these functions, the following equivalence holds:

liftn f p1 ... pn = f <$> p1 <*> ... <*> pn

These functions are more efficient than using the applicative interface directly, mostly in terms of memory allocation but also in terms of speed. Prefer them over the applicative interface, even when the arity of the function to be lifted exceeds the maximum n for which there is an implementation for liftn. In other words, if f has an arity of 5 but only lift4 is provided, do the following:

lift4 f m1 m2 m3 m4 <*> m5

Even with the partial application, it will be more efficient than the applicative implementation.

Running

type input = [
  1. | `String of string
  2. | `Bigstring of bigstring
]
val parse_only : 'a t -> input -> ('a, string) Result.result

parse_only t input runs t on input. The parser will receive an `Eof after all of input has been consumed. For use-cases requiring that the parser be fed input incrementally, see the Buffered and Unbuffered modules below.

module Buffered : sig ... end

Buffered parsing interface.

module Unbuffered : sig ... end

Unbuffered parsing interface.

OCaml

Innovation. Community. Security.