Parser provides functions and types to construct robust, performant and reusable parsers.
At the core is the type Reparse.t, which represents a constructed parser definition. A parser Reparse.t is defined by composing one or more parsers (values of type Reparse.t) with parser operators.
An instance of Reparse.t represents an un-evaluated parser. Use the Reparse.parse function to evaluate it.
Reparse.input represents a generalization of data input to Reparse.parse. Implement the interface to create new input types.
Parser operators, or functions, are broadly organized into categories, each documented in its own section below.
The Infix module contains infix and let-syntax support functions.
See examples of use.
Represents a parser which can parse a value of type 'a.
Use the parse functions to evaluate a parser.
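To make the idea of an un-evaluated parser concrete, here is a toy model in plain OCaml (Stdlib only). The names parser, pure, char, bind, map and parse_string below are hypothetical stand-ins that mimic the documented behaviour; this is a sketch under that assumption, not Reparse's actual representation of Reparse.t.

```ocaml
(* Toy model (NOT Reparse's internal representation): a parser is a function
   from the input string and a start offset to either the parsed value plus
   the next offset, or an error message. *)
type 'a parser = string -> int -> ('a * int, string) result

(* Always succeeds with [v], consuming nothing. *)
let pure v : 'a parser = fun _ pos -> Ok (v, pos)

(* Parses exactly the character [c]. *)
let char c : char parser =
 fun s pos ->
  if pos < String.length s && s.[pos] = c then Ok (c, pos + 1)
  else Error (Printf.sprintf "expected %C at offset %d" c pos)

(* Sequencing: run [p], then feed its value to [f] to obtain the next parser. *)
let bind (p : 'a parser) (f : 'a -> 'b parser) : 'b parser =
 fun s pos ->
  match p s pos with Ok (a, pos') -> f a s pos' | Error e -> Error e

let map f p = bind p (fun a -> pure (f a))

(* Building [hi] only composes functions; no input is read yet. *)
let hi = bind (char 'h') (fun h -> map (fun i -> (h, i)) (char 'i'))

(* Evaluation happens here, starting at offset 0. *)
let parse_string (p : 'a parser) s =
  match p s 0 with Ok (v, _) -> v | Error e -> failwith e
```

Because hi is just a function value, the same definition can be evaluated against many inputs; nothing runs until parse_string applies it.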
class type input = object ... end
Represents a generalization of data input source to a parser. Implement this interface to provide new sources of input to Reparse.parse.
Include the reparse package in utop.
Copy and paste a sample into utop and type ;; to run it.
#require "reparse";;
Evaluate a parser.
val parse_string : ?track_lnum:bool -> 'a t -> string -> 'a
parse_string ~track_lnum p s evaluates p to value v while consuming string instance s.
If track_lnum is true then the parser tracks both the line and the column numbers. It is set to false by default.
Line and column numbers both start counting from 1 if tracking is enabled, and are 0 otherwise.
Also see Reparse.lnum and Reparse.cnum.
Examples
Track line and column number
module P = Reparse
open P
;;
let s = "hello world" in
let p = P.(take next *> map2 (fun lnum cnum -> (lnum, cnum)) lnum cnum) in
let v = P.parse_string ~track_lnum:true p s in
v = (1, 12)
Default behaviour: line and column numbers are not tracked.
module P = Reparse
open P
;;
let s = "hello world" in
let p = P.(take next *> map2 (fun lnum cnum -> (lnum, cnum)) lnum cnum) in
let v = P.parse_string p s in
v = (0, 0)
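The numbers in the two examples above can be reproduced by hand. The sketch below is a hypothetical helper named position (not part of Reparse) that computes the line and column reached after consuming the first off characters, assuming '\n' terminates a line and that untracked positions report (0, 0) as documented for parse_string.

```ocaml
(* Hypothetical helper (not part of Reparse): the 1-based line and column
   reached after consuming the first [off] characters of [s].
   Assumption: '\n' is the only line terminator. *)
let position ?(track_lnum = false) (s : string) (off : int) : int * int =
  if not track_lnum then (0, 0) (* untracked: both numbers stay 0 *)
  else begin
    let lnum = ref 1 and cnum = ref 1 in
    for i = 0 to off - 1 do
      if s.[i] = '\n' then begin
        (* a newline starts the next line at column 1 *)
        incr lnum;
        cnum := 1
      end
      else incr cnum
    done;
    (!lnum, !cnum)
  end
```

For example, position ~track_lnum:true "hello world" 11 gives (1, 12): eleven characters consumed on line 1 leave the parser at column 12.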
parse is a generalised version of Reparse.parse_string over type Reparse.input.
Use this function when you have a custom implementation of Reparse.input.
Raised by parsers which are unable to parse successfully.
offset is the current index position of input at the time of failure.
line_number is the line number at the time of failure.
column_number is the column number at the time of failure.
msg contains an error description.
Create parsers from values.
val pure : 'a -> 'a t
pure v always parses value v.
Examples
module P = Reparse
;;
let input = new P.string_input "" in
let v1 = P.(parse input (pure 5)) in
let v2 = P.(parse input (pure "hello")) in
v1 = 5 && v2 = "hello"
val unit : unit t
unit is a convenience parser which always parses to value ().
unit is pure ().
val fail : string -> 'a t
fail err_msg returns a parser that always fails with err_msg.
Examples
module P = Reparse
;;
let input = new P.string_input "" in
let r =
try
let _ = P.(parse input (fail "hello error")) in
assert false
with
| e -> e
in
r
= P.Parser
{ offset = 0
; line_number = 0
; column_number = 0
; msg = "hello error"
}
Define parsers by joining two or more parsers.
module Applicative_infix : sig ... end
module Monad_infix : sig ... end
ignore_m t is map t ~f:(fun _ -> ()). ignore_m used to be called ignore, but we decided that was a bad name, because it shadowed the widely used Stdlib.ignore. Some monads still do let ignore = ignore_m for historical reasons.
Like all, but ensures that every monadic value in the list produces a unit value, all of which are discarded rather than being collected into a list.
module Infix : sig ... end
Provides functions to support infix and let syntax operators.
Open the module to use it:
open Reparse.Infix
p >|= f returns a new parser encapsulating value b, where a is the parsed value of p and b is f a. Also known as the map operation.
Examples
module P = Reparse
open P
;;
let f a = Char.code a in
let p = P.char 'h' in
let p = p >|= f in
let v = P.parse_string p "hello" in
v = 104
pf <*> q returns a new parser encapsulating value b, where pf and q are evaluated sequentially in the order given, f is the parsed value of pf, a is the parsed value of q, and b is f a. Also known as the Applicative operation.
Examples
module P = Reparse
open P
;;
let f a = a + 2 in
let pf = P.pure f in
let q = P.pure 2 in
let p = pf <*> q in
let v = P.parse_string p "hello" in
v = 4
v <$ p replaces the parsed value of p with v.
Examples
module P = Reparse
open P
;;
let v = "hello" in
let p = P.char 'h' in
let p = v <$ p in
let v2 = P.parse_string p "hello" in
v2 = "hello"
f <$> p returns a parser encapsulating value b, where a is the parsed value of p and b is f a. This is the infix version of Reparse.Infix.map.
Examples
module P = Reparse
open P
;;
let f a = a ^ " world" in
let p = P.string "hello" in
let p = f <$> p in
let v = P.parse_string p "hello" in
v = "hello world"
p *> q returns a parser encapsulating value a, where p and q are evaluated sequentially in the order given and a is the parsed value of q. The parsed value of p is discarded. Also known as discard left.
Examples
module P = Reparse
open P
;;
let p = P.string "world" in
let q = P.pure "hello" in
let p = p *> q in
let v = P.parse_string p "world" in
v = "hello"
p <* q returns a parser encapsulating value a, where p and q are evaluated sequentially in the order given and a is the parsed value of p. The parsed value of q is discarded. Also known as discard right.
Examples
module P = Reparse
open P
;;
let p = P.string "world" in
let q = P.pure "hello" in
let p = p <* q in
let v = P.parse_string p "world" in
v = "world"
p <|> q returns a parser encapsulating value a, where p and q are evaluated sequentially in the order given.
a is the parsed value of p if p is successful.
a is the parsed value of q if p is a failure and q is a success.
If both p and q fail, then the parser fails.
Examples
p fails and q succeeds, therefore we return q's parsed value 'w'.
module P = Reparse
open P
;;
let p = P.char 'h' in
let q = P.char 'w' in
let p = p <|> q in
let v = P.parse_string p "world" in
v = 'w'
p succeeds, therefore we return its parsed value 'h'.
let p = P.char 'h' in
let q = P.char 'w' in
let p = p <|> q in
let v = P.parse_string p "hello" in
v = 'h'
The parser fails if both p and q fail.
let p = P.char 'h' in
let q = P.char 'w' in
let p = p <|> q in
let v =
try
let _ = P.parse_string p "" in
false
with
| _ -> true
in
v = true
p <?> err_msg parses p to value a and returns a new parser encapsulating a. If p is a failure, then it fails with error message err_msg.
Often used as a last choice in <|>, e.g. a <|> b <|> c <?> "expected a b c".
Examples
module P = Reparse
open P
;;
let p = P.char 'h' <|> P.char 'w' in
let err_msg = "[error]" in
let p = p <?> err_msg in
let v =
try
let _ = P.parse_string p "" in
false
with
| P.Parser
{ offset = 0
; line_number = 0
; column_number = 0
; msg = "[error]"
} ->
true
| _ -> false
in
v = true
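The behaviour of <|> and <?> described above can be captured in a few lines. This is a toy model in plain OCaml, not Reparse's implementation; the parser type and char below are hypothetical stand-ins. It shows why the second branch of <|> sees the input untouched: it is run from the same offset the failed branch started at.

```ocaml
(* Toy model (not Reparse's implementation). *)
type 'a parser = string -> int -> ('a * int, string) result

let char c : char parser =
 fun s pos ->
  if pos < String.length s && s.[pos] = c then Ok (c, pos + 1)
  else Error (Printf.sprintf "expected %C" c)

(* p <|> q: try p; if it fails, run q from the SAME offset, so the failed
   branch consumes no input. *)
let ( <|> ) (p : 'a parser) (q : 'a parser) : 'a parser =
 fun s pos -> match p s pos with Ok _ as ok -> ok | Error _ -> q s pos

(* p <?> msg: on failure, replace whatever error p produced with msg. *)
let ( <?> ) (p : 'a parser) (msg : string) : 'a parser =
 fun s pos -> match p s pos with Ok _ as ok -> ok | Error _ -> Error msg
```

Both operators start with <, so OCaml gives them the same precedence and left associativity: char 'h' <|> char 'w' <?> "[error]" groups as (char 'h' <|> char 'w') <?> "[error]", which is why <?> works as a last choice.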
let* is a let syntax binding for Reparse.Infix.((>>=))
Examples
module P = Reparse
open P
;;
let p =
let* a = P.pure 5 in
let total = a + 5 in
P.pure total
in
let v = P.parse_string p "" in
v = 10
let+ is a let syntax binding for Reparse.((>|=))
Examples
module P = Reparse
open P
;;
let p =
let+ a = P.pure 5 in
let total = a + 5 in
total
in
let v = P.parse_string p "" in
v = 10
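The let* and let+ examples above rely on OCaml's binding operators (available since OCaml 4.08). The sketch below defines them over a toy parser type to show the desugaring; the type, pure, ( >>= ) and ( >|= ) here are hypothetical stand-ins, not Reparse's definitions.

```ocaml
(* Toy model (not Reparse's implementation). *)
type 'a parser = string -> int -> ('a * int) option

let pure v : 'a parser = fun _ pos -> Some (v, pos)

let ( >>= ) (p : 'a parser) (f : 'a -> 'b parser) : 'b parser =
 fun s pos -> match p s pos with Some (a, pos') -> f a s pos' | None -> None

let ( >|= ) p f = p >>= fun a -> pure (f a)

(* [let* x = p in e] means [p >>= fun x -> e]: the body must return a parser.
   [let+ x = p in e] means [p >|= fun x -> e]: the body returns a plain value. *)
let ( let* ) = ( >>= )
let ( let+ ) = ( >|= )

let with_bind = let* a = pure 5 in pure (a + 5)
let with_map = let+ a = pure 5 in a + 5
```

This mirrors the difference between the two examples: the let* body ends in P.pure total, while the let+ body ends in the plain value total.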
delay p returns a parser which lazily parses p.
Examples
module P = Reparse
open P
;;
let p = P.(delay (lazy (char 'z')) <|> delay (lazy (char 'a'))) in
let v = P.parse_string p "abc" in
v = 'a'
named name p uses name as part of an error message when constructing exception Reparse.Parser if parse of p fails.
Also see Reparse.Infix.((<?>))
Examples
module P = Reparse
open P
;;
let p = P.(char 'a' |> named "parse_c") in
let v =
try
let _ = P.parse_string p "zzd" in
assert false
with
| e -> e
in
v
= P.Parser
{ offset = 0
; line_number = 0
; column_number = 0
; msg = "[parse_c] Reparse.Parser(0, 0, 0, \"[char] expected 'a'\")"
}
One or the other.
any l parses the value of the first successful parser in list l.
Specified parsers in l are evaluated sequentially from left to right. A failed parser doesn't consume any input, i.e. offset is unaffected.
The parser fails if none of the parsers in l are evaluated successfully.
Examples
First successful parser result is returned
module P = Reparse
;;
let p = P.(any [ char 'z'; char 'x'; char 'a' ]) in
let v = P.parse_string p "zabc" in
v = 'z'
;;
let p = P.(any [ char 'z'; char 'x'; char 'a' ]) in
let v = P.parse_string p "xabc" in
v = 'x'
;;
let p = P.(any [ char 'z'; char 'x'; char 'a' ]) in
let v = P.parse_string p "abc" in
v = 'a'
Parser fails when none of the parsers in l are successful.
let p = P.(any [ char 'z'; char 'x'; char 'a' ]) in
let v =
try
let _ = P.parse_string p "yyy" in
false
with
| _ -> true
in
v = true
recur f returns a recursive parser. Function value f accepts a parser p as its argument and returns a parser q. Parser q in its definition can refer to p, and p can refer to q in its own definition.
Such a combinator is also known as a fixpoint or Y combinator.
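A small sketch of how a fixpoint combinator of this shape can work. It is a toy model in plain OCaml, not Reparse's implementation; recur, char and nesting are hypothetical names. The grammar nesting = '(' nesting ')' | empty refers to itself, and recur ties that knot while returning the nesting depth.

```ocaml
(* Toy model (not Reparse's implementation). *)
type 'a parser = string -> int -> ('a * int) option

(* Hypothetical fixpoint combinator: [recur f] builds [p] with [p = f p].
   Taking [s pos] explicitly delays the recursive call until the parser is
   run, so constructing it terminates. *)
let recur (f : 'a parser -> 'a parser) : 'a parser =
  let rec p s pos = f p s pos in
  p

let char c : char parser =
 fun s pos ->
  if pos < String.length s && s.[pos] = c then Some (c, pos + 1) else None

(* nesting = '(' nesting ')' | empty ; the parsed value is the depth. *)
let nesting : int parser =
  recur (fun self s pos ->
      match char '(' s pos with
      | None -> Some (0, pos) (* empty alternative: depth 0, nothing consumed *)
      | Some (_, pos) -> (
          match self s pos with
          | Some (d, pos) -> (
              match char ')' s pos with
              | Some (_, pos) -> Some (d + 1, pos)
              | None -> None)
          | None -> None))
```

For example, nesting "((()))" 0 returns Some (3, 6), while the unbalanced "(()" fails with None.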
Discards parsed values.
skip ~at_least ~up_to p repeatedly parses p and discards its value.
The lower and upper bounds of repetition are specified by arguments at_least and up_to respectively. The default value of at_least is 0. The default value of up_to is unspecified, i.e. there is no upper limit.
The repetition ends when one of the following occurs: p evaluates to failure, or the up_to upper bound is reached.
The parser encapsulates the count of times p was evaluated successfully.
Examples
module P = Reparse
;;
let p = P.(skip space) in
let v = P.parse_string p "     " in
v = 5
skip_while p ~while_ repeatedly parses p and discards its value while parser while_ parses to value true.
The repetition ends when one of the following occurs: p evaluates to failure, or while_ returns false.
Note while_ does not consume input.
The parser encapsulates the count of times p was evaluated successfully.
Examples
module P = Reparse
;;
let p = P.(skip_while next ~while_:(is space)) in
let v = P.parse_string p "     " in
v = 5
Collects parsed values.
take ~at_least ~up_to ~sep_by p repeatedly parses p and returns the parsed values.
The lower and upper bounds of repetition are specified by arguments at_least and up_to respectively. The default value of at_least is 0. The default value of up_to is unspecified, i.e. there is no upper limit.
If sep_by is specified then the evaluation of p must be followed by a successful evaluation of sep_by. The parsed value of sep_by is discarded.
The repetition ends when one of the following occurs: p evaluates to failure, sep_by evaluates to failure, or the up_to upper bound is reached.
The parser fails if p is repeated fewer times than the value specified by at_least.
Examples
Default behaviour.
module P = Reparse
;;
let p = P.(take (char 'a')) in
let v = P.parse_string p "aaaaa" in
v = [ 'a'; 'a'; 'a'; 'a'; 'a' ]
Specify ~sep_by.
module P = Reparse
;;
let p = P.(take ~sep_by:(char ',') (char 'a')) in
let v = P.parse_string p "a,a,a,a,a" in
v = [ 'a'; 'a'; 'a'; 'a'; 'a' ]
Specify lower bound argument at_least.
module P = Reparse
;;
let p = P.(take ~at_least:3 ~sep_by:(char ',') (char 'a')) in
let v = P.parse_string p "a,a,a,a,a" in
v = [ 'a'; 'a'; 'a'; 'a'; 'a' ]
Lower bound not met results in error.
module P = Reparse
;;
let p = P.(take ~at_least:5 ~sep_by:(char ',') (char 'a')) in
let v =
try
let _ = P.parse_string p "a,a,a,a" in
false
with
| _ -> true
in
v = true
Specify upper bound up_to.
module P = Reparse
;;
let p = P.(take ~up_to:3 ~sep_by:(char ',') (char 'a')) in
let v = P.parse_string p "a,a,a,a,a" in
v = [ 'a'; 'a'; 'a' ]
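The interplay of at_least, up_to and sep_by in the examples above can be sketched as follows. This is a toy model in plain OCaml, not Reparse's implementation, and the take below is a hypothetical stand-in that follows the documented rules: separators sit between values, the loop stops when p or sep_by fails or up_to is reached, and too few repetitions is a failure.

```ocaml
(* Toy model (not Reparse's implementation). *)
type 'a parser = string -> int -> ('a * int) option

let char c : char parser =
 fun s pos ->
  if pos < String.length s && s.[pos] = c then Some (c, pos + 1) else None

let take ?(at_least = 0) ?up_to ?sep_by (p : 'a parser) : 'a list parser =
 fun s pos ->
  let limit_reached n = match up_to with Some u -> n >= u | None -> false in
  (* [loop acc n pos]: [n] values collected so far, stored reversed in [acc] *)
  let rec loop acc n pos =
    if limit_reached n then (acc, n, pos)
    else
      (* every value after the first must be preceded by a separator *)
      let start =
        match sep_by with
        | Some sep when n > 0 -> Option.map snd (sep s pos)
        | _ -> Some pos
      in
      match start with
      | None -> (acc, n, pos) (* separator failed: stop *)
      | Some pos' -> (
          match p s pos' with
          | Some (v, pos'') -> loop (v :: acc) (n + 1) pos''
          (* p failed: stop before any separator just parsed *)
          | None -> (acc, n, pos))
  in
  let acc, n, pos' = loop [] 0 pos in
  if n < at_least then None else Some (List.rev acc, pos')
```

In this model a trailing separator with no value after it is left unconsumed; whether Reparse behaves the same way in that corner case is not stated above.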
take_while ~sep_by ~while_ p repeatedly parses p and returns its values.
p is evaluated if and only if while_ evaluates to true.
If sep_by is specified then the evaluation of p must be followed by a successful evaluation of sep_by. The parsed value of sep_by is discarded.
The repetition ends when one of the following occurs: p evaluates to failure, while_ returns false, or sep_by evaluates to failure.
Note while_ does not consume input.
Examples
Default behaviour.
module P = Reparse
;;
let p = P.(take_while ~while_:(is_not (char 'b')) (char 'a')) in
let v = P.parse_string p "aab" in
v = [ 'a'; 'a' ]
Specify sep_by.
module P = Reparse
;;
let p =
P.(take_while ~sep_by:(char ',') ~while_:(is_not (char 'b')) (char 'a'))
in
let v = P.parse_string p "a,a,ab" in
v = [ 'a'; 'a'; 'a' ]
take_between ~sep_by ~start ~end_ p parses start and then repeatedly parses p while the parsed value of p is not equal to the parsed value of end_. After the repetition ends, it parses end_. The parser returns the list of parsed values of p.
Both start and end_ parser values are discarded.
If sep_by is specified then the evaluation of p must be followed by a successful evaluation of sep_by. The parsed value of sep_by is discarded.
The repetition ends when one of the following occurs: p evaluates to failure, the parsed value of end_ matches the parsed value of p, or sep_by evaluates to failure.
Examples
module P = Reparse
;;
let p =
P.(
take_between ~sep_by:(char ',') ~start:(P.char '(') ~end_:(char ')')
next)
in
let v = P.parse_string p "(a,a,a)" in
v = [ 'a'; 'a'; 'a' ]
take_while_cb ~sep_by ~while_ ~on_take_cb p repeatedly parses p and calls the callback on_take_cb with the parsed value.
p is evaluated if and only if while_ evaluates to true.
If sep_by is specified then the evaluation of p must be followed by a successful evaluation of sep_by. The parsed value of sep_by is discarded.
on_take_cb is the callback function that is called every time p is evaluated.
p is evaluated repeatedly. The repetition ends when one of the following occurs: p evaluates to failure, while_ returns false, or sep_by evaluates to failure.
take_while_cb is the general version of Reparse.take_while. It allows you to specify how the value a is to be collected. The parser encapsulates the count of times p was evaluated successfully.
Note while_ does not consume input.
Examples
module P = Reparse
open P
;;
let buf = Buffer.create 0 in
let on_take_cb a = Buffer.add_char buf a in
let p =
P.(take_while_cb (char 'a') ~while_:(is_not (char 'b')) ~on_take_cb)
in
let v = P.parse_string p "aaab" in
let s = Buffer.contents buf in
v = 3 && s = "aaa"
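The callback style can be sketched in plain OCaml to show where on_take_cb fires and where the returned count comes from. This is a toy model, not Reparse's implementation; take_while_cb here omits sep_by, and not_b is a hypothetical stand-in for is_not (char 'b') that inspects the input without consuming it.

```ocaml
(* Toy model (not Reparse's implementation); sep_by is omitted. *)
type 'a parser = string -> int -> ('a * int) option

let char c : char parser =
 fun s pos ->
  if pos < String.length s && s.[pos] = c then Some (c, pos + 1) else None

(* While [while_] holds, parse [p] and hand each value to [on_take_cb];
   the result is the number of values taken. *)
let take_while_cb ~(while_ : bool parser) ~(on_take_cb : 'a -> unit)
    (p : 'a parser) : int parser =
 fun s pos ->
  let rec loop n pos =
    match while_ s pos with
    | Some (true, _) -> (
        (* the condition's offset is ignored: while_ consumes nothing *)
        match p s pos with
        | Some (v, pos') ->
            on_take_cb v;
            loop (n + 1) pos'
        | None -> Some (n, pos))
    | Some (false, _) | None -> Some (n, pos)
  in
  loop 0 pos

(* Hypothetical stand-in for is_not (char 'b'). *)
let not_b : bool parser =
 fun s pos -> Some (not (pos < String.length s && s.[pos] = 'b'), pos)
```

Run on "aaab" with Buffer.add_char as the callback, this returns a count of 3 and leaves "aaa" in the buffer, matching the example above.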
Don't fail when parsing is not successful.
optional p parses Some a if successful and None otherwise. a is the parsed value of p.
Examples
module P = Reparse
open P
;;
let p = P.(optional (char 'a')) in
let v = P.parse_string p "ab" in
v = Some 'a'
;;
let p = P.(optional (char 'z')) in
let v = P.parse_string p "ab" in
v = None
val is_eoi : bool t
is_eoi parses to true if the parser has reached the end of input, false otherwise.
Examples
module P = Reparse
;;
let v = P.(parse_string is_eoi "") in
v = true
;;
let v = P.(parse_string is_eoi "a") in
v = false
val eoi : unit t
eoi parses end of input. Fails if the parser is not at end of input.
Examples
module P = Reparse
;;
let v = P.(parse_string eoi "") in
v = ()
;;
let v =
try
let _ = P.(parse_string eoi "a") in
false
with
| _ -> true
in
v = true
val lnum : int t
lnum parses the current line number of input. Line numbers start from 1 when line tracking is enabled.
Examples
module P = Reparse
open P
;;
let p = P.(next *> lnum) in
let v = P.parse_string ~track_lnum:true p "bcb" in
v = 1
val cnum : int t
cnum parses the current column number. Column numbers start from 1 when line tracking is enabled.
Examples
module P = Reparse
open P
;;
let p = P.(next *> cnum) in
let v = P.parse_string ~track_lnum:true p "bcb" in
v = 2
val offset : int t
offset parses the current input offset. Offsets start from 0.
Examples
module P = Reparse
open P
;;
let p = P.(next *> offset) in
let v = P.parse_string ~track_lnum:true p "bcb" in
v = 1
true, false, is, is not.
not_ p
parses value ()
if and only if p
fails to parse, otherwise the parse fails.
Examples
module P = Reparse
;;
let p = P.(not_ (char 'a')) in
let v = P.parse_string p "bbb" in
v = ()
not_followed_by p q parses the value of p only if an immediately subsequent parse of q is a failure. Parser q doesn't consume any input.
Examples
module P = Reparse
;;
let p = P.(not_followed_by (char 'a') (char 'a')) in
let v = P.parse_string p "ab" in
v = 'a'
is_not p parses value true if p fails to parse and false otherwise. Note evaluating p doesn't consume any input.
Examples
module P = Reparse
;;
let p = P.(is_not (char 'a')) in
let v = P.parse_string p "bbb" in
v = true
is p parses true if p is successful, false otherwise. Note evaluation of p doesn't consume any input.
Examples
module P = Reparse
;;
let p = P.(is (char 'b')) in
let v = P.parse_string p "bcb" in
v = true
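The lookahead parsers above share one trick: they run a parser but report the original offset. A toy model in plain OCaml (not Reparse's implementation; the parser type and char are hypothetical stand-ins) makes that explicit.

```ocaml
(* Toy model (not Reparse's implementation). *)
type 'a parser = string -> int -> ('a * int) option

let char c : char parser =
 fun s pos ->
  if pos < String.length s && s.[pos] = c then Some (c, pos + 1) else None

(* Lookahead: run [p] but return the ORIGINAL offset, consuming nothing. *)
let is (p : 'a parser) : bool parser =
 fun s pos -> Some (Option.is_some (p s pos), pos)

let is_not (p : 'a parser) : bool parser =
 fun s pos -> Some (Option.is_none (p s pos), pos)

(* Parse [p], then succeed only if [q] fails right after it; [q]'s attempt
   consumes nothing because its resulting offset is never used. *)
let not_followed_by (p : 'a parser) (q : 'b parser) : 'a parser =
 fun s pos ->
  match p s pos with
  | Some (v, pos') when Option.is_none (q s pos') -> Some (v, pos')
  | _ -> None
```

With these definitions, is (char 'b') on "bcb" yields true while the offset stays at 0, and not_followed_by (char 'a') (char 'a') accepts "ab" but rejects "aa".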
Text parsing.
val peek_char : char t
peek_char parses the next character from input without consuming it.
Examples
module P = Reparse
;;
let p = P.peek_char in
let v = P.parse_string p "hello" in
v = 'h'
Input is not consumed.
module P = Reparse
;;
let p = P.(peek_char *> offset) in
let v = P.parse_string p "hello" in
v = 0
val peek_string : int -> string t
peek_string n parses a string of length n without consuming it.
Examples
module P = Reparse
open P
;;
let p = P.peek_string 5 in
let v = P.parse_string p "hello" in
v = "hello"
Input is not consumed.
module P = Reparse
;;
let p = P.(peek_string 5 *> offset) in
let v = P.parse_string p "hello" in
v = 0
val next : char t
next parses the next character from input. Fails if the parser has reached the end of input.
Examples
module P = Reparse
;;
let v = P.(parse_string next "hello") in
v = 'h'
val char : char -> char t
char c parses character c exactly.
Examples
module P = Reparse
;;
let p = P.char 'h' in
let v = P.parse_string p "hello" in
v = 'h'
val char_if : (char -> bool) -> char t
char_if f parses a character c if f c is true.
Examples
module P = Reparse
;;
let p =
P.char_if (function
| 'a' -> true
| _ -> false)
in
let v = P.parse_string p "abc" in
v = 'a'
val string : ?case_sensitive:bool -> string -> string t
string ~case_sensitive s parses a string s exactly.
If case_sensitive is false then the comparison ignores character case. The default value is true.
Examples
module P = Reparse
;;
let p = P.string "hello" in
let v = P.parse_string p "hello world" in
v = "hello"
val string_of_chars : char list -> string t
string_of_chars l converts char list l to a string.
Examples
module P = Reparse
;;
let p = P.(take ~sep_by:space next >>= string_of_chars) in
let v = P.parse_string p "h e l l o" in
v = "hello"
val line : [ `LF | `CRLF ] -> string t
line c parses a line of text from input.
Line delimiter c can be either `LF or `CRLF. These correspond to \n and \r\n respectively.
Examples
module P = Reparse
;;
let p = P.line `CRLF in
let v = P.parse_string p "line1\r\nline2" in
v = "line1"
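A plain-OCaml sketch of what reading one line means for each delimiter. The function line below is a hypothetical stand-in, not Reparse's implementation; in particular, returning the rest of the input when no delimiter is found is an assumption, since the behaviour at end of input is not stated above.

```ocaml
(* Hypothetical sketch (not Reparse's implementation): read one line starting
   at [pos]; return the line without its delimiter and the offset just past
   the delimiter. Assumption: with no delimiter, the rest of the input is the
   line. *)
let line (delim : [ `LF | `CRLF ]) (s : string) (pos : int) : string * int =
  let d = match delim with `LF -> "\n" | `CRLF -> "\r\n" in
  let dl = String.length d and n = String.length s in
  (* first index >= i where the delimiter starts, if any *)
  let rec find i =
    if i + dl > n then None
    else if String.sub s i dl = d then Some i
    else find (i + 1)
  in
  match find pos with
  | Some i -> (String.sub s pos (i - pos), i + dl)
  | None -> (String.sub s pos (n - pos), n)
```

On "line1\r\nline2" with `CRLF this returns ("line1", 7): the delimiter is consumed but not included in the result, so the next read starts at "line2".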
Parsers as defined in RFC 5234, Appendix B.1.
val alpha : char t
alpha parses a character in range A-Z or a-z.
Examples
module P = Reparse
open P
;;
let p = P.(take alpha) in
let v = P.parse_string p "abcdABCD" in
v = [ 'a'; 'b'; 'c'; 'd'; 'A'; 'B'; 'C'; 'D' ]
val alpha_num : char t
alpha_num parses a character in range A-Z, a-z or 0-9.
Examples
module P = Reparse
open P
;;
let p = P.(take alpha_num) in
let v = P.parse_string p "ab123ABCD" in
v = [ 'a'; 'b'; '1'; '2'; '3'; 'A'; 'B'; 'C'; 'D' ]
val lower_alpha : char t
lower_alpha parses a character in range a-z.
Examples
module P = Reparse
open P
;;
let p = P.(take lower_alpha) in
let v = P.parse_string p "abcd" in
v = [ 'a'; 'b'; 'c'; 'd' ]
val upper_alpha : char t
upper_alpha parses a character in range A-Z.
Examples
module P = Reparse
open P
;;
let p = P.(take upper_alpha) in
let v = P.parse_string p "ABCD" in
v = [ 'A'; 'B'; 'C'; 'D' ]
val bit : char t
bit parses a character which is either '0' or '1'.
Examples
module P = Reparse
;;
let p = P.(take bit) in
let v = P.parse_string p "0110 ab" in
v = [ '0'; '1'; '1'; '0' ]
val ascii_char : char t
ascii_char parses any US-ASCII character.
Examples
module P = Reparse
;;
let p = P.(take ascii_char) in
let v = P.parse_string p "0110 abc '" in
v = [ '0'; '1'; '1'; '0'; ' '; 'a'; 'b'; 'c'; ' '; '\'' ]
val cr : char t
cr parses character '\r'.
Examples
module P = Reparse
;;
let v = P.(parse_string cr "\rab") in
v = '\r'
val crlf : string t
crlf parses string "\r\n".
Examples
module P = Reparse
;;
let v = P.(parse_string crlf "\r\n abc") in
v = "\r\n"
val control : char t
control parses characters in range 0x00 - 0x1F or character 0x7F.
Examples
module P = Reparse
;;
let v = P.(parse_string control "\x00") in
v = '\x00'
val digit : char t
digit parses one of the digit characters, 0 .. 9.
Examples
module P = Reparse
;;
let p = P.(take digit) in
let v = P.parse_string p "0123456789a" in
v = [ '0'; '1'; '2'; '3'; '4'; '5'; '6'; '7'; '8'; '9' ]
val digits : string t
digits parses one or more digit characters, 0 .. 9.
Examples
module P = Reparse
;;
let v = P.(parse_string digits "1234 +") in
v = "1234"
val dquote : char t
dquote parses double quote character '"'.
Examples
module P = Reparse
;;
let v = P.(parse_string dquote "\"hello ") in
v = '"'
val hex_digit : char t
hex_digit parses any of the hexadecimal digits: 0..9, A, B, C, D, E, F.
Examples
module P = Reparse
;;
let p = P.(take hex_digit) in
let v = P.parse_string p "0ABCDEFa" in
v = [ '0'; 'A'; 'B'; 'C'; 'D'; 'E'; 'F' ]
val htab : char t
htab parses a horizontal tab character '\t'.
Examples
module P = Reparse
;;
let v = P.(parse_string htab "\t") in
v = '\t'
val lf : char t
lf parses a linefeed '\n' character.
Examples
module P = Reparse
;;
let v = P.(parse_string lf "\n") in
v = '\n'
val octet : char t
octet parses any character in the range \x00 - \xFF. Synonym for Reparse.next.
Examples
module P = Reparse
;;
let p = P.(take octet) in
let v = P.parse_string p "0110 abc '" in
v = [ '0'; '1'; '1'; '0'; ' '; 'a'; 'b'; 'c'; ' '; '\'' ]
val space : char t
space parses a space character.
Examples
module P = Reparse
;;
let v = P.(parse_string space " abc '") in
v = ' '
val spaces : char list t
spaces parses one or more spaces.
Examples
module P = Reparse
;;
let v = P.(parse_string spaces "   abc") in
v = [ ' '; ' '; ' ' ]
val vchar : char t
vchar parses any of the visible (printable) characters.
Examples
module P = Reparse
;;
let p = P.(take vchar) in
let v = P.parse_string p "0110abc\x00" in
v = [ '0'; '1'; '1'; '0'; 'a'; 'b'; 'c' ]
val whitespace : char t
whitespace parses a space ' ' or horizontal tab '\t' character.
Examples
module P = Reparse
;;
let p = P.(take whitespace) in
let v = P.parse_string p "\t \t " in
v = [ '\t'; ' '; '\t'; ' ' ]
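The RFC 5234 parsers in this section reduce to simple character predicates. The functions below are a plain-OCaml sketch of those ranges (the is_* names and prefix_len are hypothetical, not Reparse functions); with Reparse, such a predicate could be passed to char_if, e.g. P.char_if is_hex_digit. is_hex_digit follows the uppercase-only range documented for hex_digit.

```ocaml
(* Character predicates mirroring the ranges described in this section
   (RFC 5234, Appendix B.1). Plain functions, not Reparse parsers. *)
let is_alpha c = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') (* ALPHA *)
let is_digit c = c >= '0' && c <= '9' (* DIGIT *)
let is_alpha_num c = is_alpha c || is_digit c
let is_bit c = c = '0' || c = '1' (* BIT *)

(* 0-9 and A-F, as documented for hex_digit *)
let is_hex_digit c = is_digit c || (c >= 'A' && c <= 'F')
let is_control c = Char.code c <= 0x1F || Char.code c = 0x7F (* CTL *)
let is_vchar c = Char.code c >= 0x21 && Char.code c <= 0x7E (* VCHAR *)
let is_whitespace c = c = ' ' || c = '\t' (* WSP: space or horizontal tab *)

(* Length of the longest prefix of [s] whose characters all satisfy [f],
   i.e. how much a repeated char_if f would consume. *)
let prefix_len f s =
  let rec go i = if i < String.length s && f s.[i] then go (i + 1) else i in
  go 0
```

For instance, prefix_len is_hex_digit "0ABCDEFa" is 7, matching the seven characters the hex_digit example collects before stopping at 'a'.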
t >>= f returns a computation that sequences the computations represented by two monad elements. The resulting computation first does t to yield a value v, and then runs the computation returned by f v.
t >>| f is t >>= (fun a -> return (f a)).
module Let_syntax : sig ... end