To focus the search input from anywhere on the page, press the 'S' key.
in-package search v0.1.0
Library
Module
Module type
Parameter
Class
Class type
Very Simple Parser Combinators
These combinators can be used to write very simple parsers, for example to extract data from a line-oriented file, or as a replacement to Scanf
.
A few examples
Some more advanced example(s) can be found in the /examples
directory.
Parse a tree
open CCParse;;
type tree = L of int | N of tree * tree;;
let mk_leaf x = L x
let mk_node x y = N(x,y)
let ptree = fix @@ fun self ->
skip_space *>
( (char '(' *> (pure mk_node <*> self <*> self) <* char ')')
<|>
(U.int >|= mk_leaf) )
;;
parse_string_exn ptree "(1 (2 3))" ;;
parse_string_exn ptree "((1 2) (3 (4 5)))" ;;
Parse a list of words
open Containers.Parse;;
let p = U.list ~sep:"," U.word;;
parse_string_exn p "[abc , de, hello ,world ]";;
Stress Test
This makes a list of 100_000 integers, prints it and parses it back.
let p = CCParse.(U.list ~sep:"," U.int);;
let l = CCList.(1 -- 100_000);;
let l_printed =
CCFormat.(to_string (within "[" "]" (list ~sep:(return ",@,") int))) l;;
let l' = CCParse.parse_string_exn p l_printed;;
assert (l=l');;
Stability guarantees
Some functions are marked "experimental" and are still subject to change.
module Position : sig ... end
module Error : sig ... end
'a or_error
is either Ok x
for some result x : 'a
, or an error Error.t
.
See stringify_result
and Error.to_string
to print the error message.
exception ParseError of Error.t
Input
Combinators
val return : 'a -> 'a t
Always succeeds, without consuming its input.
bind f p
results in a new parser which behaves as p
then, in case of success, applies f
to the result.
val eoi : unit t
Expect the end of input, fails otherwise.
val empty : unit t
Succeed with ()
.
val fail : string -> 'a t
fail msg
fails with the given message. It can trigger a backtrack.
parsing s p
behaves the same as p
, with the information that we are parsing s
, if p
fails. The message s
is added to the error, it does not replace it, not does the location change (the error still points to the same location as in p
).
set_error_message msg p
behaves like p
, but if p
fails, set_error_message msg p
fails with msg
instead and at the current position. The internal error message of p
is just discarded.
with_pos p
behaves like p
, but returns the (starting) position along with p
's result.
EXPERIMENTAL
val any_char : char t
any_char
parses any character. It still fails if the end of input was reached.
val any_char_n : int -> string t
any_char_n len
parses exactly len
characters from the input. Fails if the input doesn't contain at least len
chars.
val char : char -> char t
char c
parses the character c
and nothing else.
A slice of the input, as returned by some combinators such as split_1
or split_list
or take
.
The idea is that one can use some parsers to cut the input into slices, e.g. split into lines, or split a line into fields (think CSV or TSV). Then a variety of parsers can be used on each slice to extract data from it using recurse
.
Slices contain enough information to make it possible for recurse slice p
to report failures (if p
fails) using locations from the original input, not relative to the slice. Therefore, even after splitting the input into lines using, say, each_line
, a failure to parse the 500th line will be reported at line 500 and not at line 1.
EXPERIMENTAL
module Slice : sig ... end
Functions on slices.
recurse slice p
parses the slice
(most likely obtained via another combinator, such as split_1
or split_list
), using p
.
The slice contains a position which is used to relocate error messages to their position in the whole input, not just relative to the slice.
EXPERIMENTAL
set_current_slice slice
replaces the parser's state with slice
.
EXPERIMENTAL
val chars_fold :
f:
('acc ->
char ->
[ `Continue of 'acc
| `Consume_and_stop of 'acc
| `Stop of 'acc
| `Fail of string ]) ->
'acc ->
('acc * slice) t
chars_fold f acc0
folds over characters of the input. Each char c
is passed, along with the current accumulator, to f
; f
can either:
- stop, by returning
`Stop acc
. In this case the final accumulatoracc
is returned, andc
is not consumed. - consume char and stop, by returning
`Consume_and_stop acc
. - fail, by returning
`Fail msg
. In this case the parser fails with the given message. - continue, by returning
`Continue acc
. The parser continues to the next char with the new accumulator.
This is a generalization of of chars_if
that allows one to transform characters on the fly, skip some, handle escape sequences, etc. It can also be useful as a base component for a lexer.
val chars_fold_transduce :
f:
('acc ->
char ->
[ `Continue of 'acc
| `Yield of 'acc * char
| `Consume_and_stop
| `Stop
| `Fail of string ]) ->
'acc ->
('acc * string) t
Same as chars_fold
but with the following differences:
- returns a string along with the accumulator, rather than the slice of all the characters accepted by
`Continue _
. The string is built from characters returned by`Yield
. - new case
`Yield (acc, c)
addsc
to the returned string and continues parsing withacc
.
take_until_success p
accumulates characters of the input into a slice, until p
successfully parses a value x
; then it returns slice, x
.
NOTE performance wise, if p
does a lot of work at each position, this can be costly (thing naive substring search if p
is string "very long needle"
).
take len
parses exactly len
characters from the input. Fails if the input doesn't contain at least len
chars.
take_if f
takes characters as long as they satisfy the predicate f
.
take1_if f
takes characters as long as they satisfy the predicate f
. Fails if no character satisfies f
.
val char_if : ?descr:string -> (char -> bool) -> char t
char_if f
parses a character c
if f c = true
. Fails if the next char does not satisfy f
.
val chars_if : (char -> bool) -> string t
chars_if f
parses a string of chars that satisfy f
. Cannot fail.
val chars1_if : ?descr:string -> (char -> bool) -> string t
Like chars_if
, but accepts only non-empty strings. chars1_if p
fails if the string accepted by chars_if p
is empty. chars1_if p
is equivalent to take1_if p >|= Slice.to_string
.
val endline : char t
Parse '\n'.
val space : char t
Tab or space.
val white : char t
Tab or space or newline.
val skip_chars : (char -> bool) -> unit t
Skip 0 or more chars satisfying the predicate.
val skip_space : unit t
Skip ' ' and '\t'.
val skip_white : unit t
Skip ' ' and '\t' and '\n'.
suspend f
is the same as f ()
, but evaluates f ()
only when needed.
A practical use case is to implement recursive parsers manually, as described in fix
. The parser is let rec p () = …
, and suspend p
can be used in the definition to use p
.
val string : string -> string t
string s
parses exactly the string s
, and nothing else.
many p
parses p
repeatedly, until p
fails, and collects the results into a list.
optional p
tries to parse p
, and return ()
whether it succeeded or failed. Cannot fail itself. It consumes input if p
succeeded (as much as p
consumed), but consumes not input if p
failed.
try_ p
is just like p
(it used to play a role in backtracking semantics but no more).
try_opt p
tries to parse using p
, and return Some x
if p
succeeded with x
(and consumes what p
consumed). Otherwise it returns None
and consumes nothing. This cannot fail.
many_until ~until p
parses as many p
as it can until the until
parser successfully returns. If p
fails before that then many_until ~until p
fails as well. Typically until
can be a closing ')' or another termination condition, and what is consumed by until
is also consumed by many_until ~until p
.
EXPERIMENTAL
try_or p1 ~f ~else_:p2
attempts to parse x
using p1
, and then becomes f x
. If p1
fails, then it becomes p2
. This can be useful if f
is expensive but only ever works if p1
matches (e.g. after an opening parenthesis or some sort of prefix).
try_or_l ?else_ l
tries each pair (test, p)
in order. If the n-th test
succeeds, then try_or_l l
behaves like n-th p
, whether p
fails or not. If test
consumes input, the state is restored before calling p
. If they all fail, and else_
is defined, then it behaves like else_
. If all fail, and else_
is None
, then it fails as well.
This is a performance optimization compared to (<|>)
. We commit to a branch if the test succeeds, without backtracking at all. It can also provide better error messages, because failures in the parser will not be reported as failures in try_or_l
.
See lookahead_ignore
for a convenient way of writing the test conditions.
or_ p1 p2
tries to parse p1
, and if it fails, tries p2
from the same position.
both a b
parses a
, then b
, then returns the pair of their results.
many1 p
is like many p
excepts it fails if the list is empty (i.e. it needs p
to succeed at least once).
skip p
parses zero or more times p
and ignores its result. It is eager, meaning it will continue as long as p
succeeds. As soon as p
fails, skip p
stops consuming any input.
Same as sep
but stop when until
parses successfully.
lookahead p
behaves like p
, except it doesn't consume any input.
EXPERIMENTAL
lookahead_ignore p
tries to parse input with p
, and succeeds if p
succeeds. However it doesn't consume any input and returns ()
, so in effect its only use-case is to detect whether p
succeeds, e.g. in try_or_l
.
EXPERIMENTAL
Fixpoint combinator. fix (fun self -> p)
is the parser p
, in which self
refers to the parser p
itself (which is useful to parse recursive structures.
An alternative, manual implementation to let p = fix (fun self -> q)
is:
let rec p () =
let self = suspend p in
q
val line_str : string t
line_str
is line >|= Slice.to_string
. It parses the next line and turns the slice into a string. The state points to the character immediately after the '\n'
character.
split_1 ~on_char
looks for on_char
in the input, and returns a pair sl1, sl2
, where:
sl1
is the slice of the input the precedes the first occurrence ofon_char
, or the whole input ifon_char
cannot be found. It does not containon_char
.sl2
is the slice that comes afteron_char
, orNone
ifon_char
couldn't be found. It doesn't contain the first occurrence ofon_char
(if any).
The parser is now positioned at the end of the input.
EXPERIMENTAL
split_list ~on_char
splits the input on all occurrences of on_char
, returning a list of slices.
EXPERIMENTAL
split_list_at_most ~on_char n
applies split_1 ~on_char
at most n
times, to get a list of n+1
elements. The last element might contain on_char
. This is useful to limit the amount of work done by split_list
.
EXPERIMENTAL
split_2 ~on_char
splits the input into exactly 2 fields, and fails if the split yields less or more than 2 items. EXPERIMENTAL
split_list_map ~on_char p
uses split_list ~on_char
to split the input, then parses each chunk of the input thus obtained using p
.
The difference with sep ~by:(char on_char) p
is that sep
calls p
first, and only tries to find on_char
after p
returns. While it is more flexible, this technique also means p
has to be careful not to consume on_char
by error.
A useful specialization of this is each_line
, which is basically each_split ~on_char:'\n' p
.
EXPERIMENTAL
all
returns all the unconsumed input as a slice, and consumes it. Use Slice.to_string
to turn it into a string.
Note that lookahead all
can be used to peek at the rest of the input without consuming anything.
val all_str : string t
all_str
accepts all the remaining chars and extracts them into a string. Similar to all
but with a string.
EXPERIMENTAL
Memoize the parser. memo p
will behave like p
, but when called in a state (read: position in input) it has already processed, memo p
returns a result directly. The implementation uses an underlying hashtable. This can be costly in memory, but improve the run time a lot if there is a lot of backtracking involving p
.
Do not call memo
inside other functions, especially with (>>=)
, map
, etc. being so prevalent. Instead the correct way to use it is in a toplevel definition:
let my_expensive_parser = memo (foo *> bar >>= fun i -> …)
This function is not thread-safe.
Infix
module Infix : sig ... end
include module type of Infix
Alias to map
. p >|= f
parses an item x
using p
, and returns f x
.
Alias to bind
. p >>= f
results in a new parser which behaves as p
then, in case of success, applies f
to the result.
a <* b
parses a
into x
, parses b
and ignores its result, and returns x
.
a *> b
parses a
, then parses b
into x
, and returns x
. The result of a
is ignored.
Alias to or_
.
a <|> b
tries to parse a
, and if a
fails without consuming any input, backtracks and tries to parse b
, otherwise it fails as a
.
a <?> msg
behaves like a
, but if a
fails, a <?> msg
fails with msg
instead. Useful as the last choice in a series of <|>
. For example: a <|> b <|> c <?> "expected one of a, b, c"
.
Alias to both
. a ||| b
parses a
, then b
, then returns the pair of their results.
Let operators on OCaml >= 4.08.0, nothing otherwise
Parse input
Turn a Error.t
-oriented result into a more basic string result.
Version of parse_string
that returns a more detailed error.
val parse_string_exn : 'a t -> string -> 'a
parse_file p filename
parses file named filename
with p
by opening the file and reading it whole.
Version of parse_file
that returns a more detailed error.
val parse_file_exn : 'a t -> string -> 'a
Same as parse_file
, but
module U : sig ... end
module Debug_ : sig ... end
Debugging utils. EXPERIMENTAL