It is traditional to parse in two phases (scanning, then parsing). With combinators this is not necessary in general (scannerless parsing), and this remains true with Pacomb. However, using a lexing phase makes the grammar more readable.
Moreover, lexing is often done with a longest-match rule, which is not semantically equivalent to the semantics of context-free grammars.
This module provides combinators to create the terminals that the parser will call.
type buf = Input.buffer
A position in a buffer is an Input.buffer together with an index Input.pos.
type idx = Input.idx
Type of terminal function, similar to blank, but with a returned value
type _ ast =
| Any : char ast
| Any_utf8 : Uchar.t ast
| Any_grapheme : string ast
| Eof : unit ast
| Char : char -> unit ast
| Grapheme : string -> unit ast
| String : string -> unit ast
| Nat : int ast
| Int : int ast
| Float : float ast
| CharLit : char ast
| StringLit : string ast
| Test : (char -> bool) -> char ast
| NotTest : (char -> bool) -> unit ast
| Seq : 'a t * 'b t * ('a -> 'b -> 'c) * 'c Assoc.key -> 'c ast
| Alt : 'a t * 'a t -> 'a ast
| Save : 'a t * (string -> 'a -> 'b) * 'b Assoc.key -> 'b ast
| Option : 'a * 'a t -> 'a ast
| Appl : 'a t * ('a -> 'b) * 'b Assoc.key -> 'b ast
| Star : 'a t * (unit -> 'b) * ('b -> 'a -> 'b) * 'b Assoc.key -> 'b ast
| Plus : 'a t * (unit -> 'b) * ('b -> 'a -> 'b) * 'b Assoc.key -> 'b ast
| Keyword : string * int -> unit ast
| Custom : 'a lexeme * 'a Assoc.key -> 'a ast
ast for terminals, needed for equality
and 'a terminal = {
  n : string;     (* name *)
  f : 'a lexeme;  (* the terminal itself *)
  a : 'a ast;
  c : Charset.t;  (* the set of characters accepted at the beginning of input *)
}
The previous types encapsulated in a record
and 'a t = 'a terminal
Abbreviation
Exception raised when failing. See Combinator.give_up and Combinator.parse_buffer, the latter of which will give the most advanced position.
give_up () rejects parsing from a corresponding semantic action. An error message can be provided. It can be used both in the semantics of terminals and in parsing rules.
val any : ?name:string -> unit -> char t
Accepts any character except eof.
val eof : ?name:string -> unit -> unit t
Terminal accepting only the end of a buffer. Remark: eof is automatically added at the end of a grammar by Combinator.parse_buffer. name defaults to "EOF".
val char : ?name:string -> char -> unit t
Terminal accepting a given char. Remark: char '\255' is equivalent to eof. name defaults to the given character.
val test : ?name:string -> (char -> bool) -> char t
Accepts any character for which the test returns true. name defaults to the result of Charset.show.
Accepts a character in the given charset. name defaults as in test.
val not_test : ?name:string -> (char -> bool) -> unit t
Rejects the input (raises Noparse) if the first character of the input passes the test. Does not read the character if the test fails. name defaults to "^" prepended to the result of Charset.show.
Rejects the input (raises Noparse) if the first character of the input is in the charset. Does not read the character if it is not in the charset. name defaults as in not_test.
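The difference between test, which reads a character, and not_test, which only looks at it, can be illustrated with a toy model. This is a sketch under stated assumptions, not Pacomb's implementation: the names toy_terminal, peek, toy_test and toy_not_test are hypothetical, the input is a plain string, and the end of input is modelled as '\255' following the remark on char above.

```ocaml
(* Toy model, not Pacomb's implementation: a terminal takes the input and a
   position, and returns the value read together with the new position. *)
exception Noparse

type 'a toy_terminal = string -> int -> 'a * int

(* Character at position [i], or '\255' at the end of input (assumption). *)
let peek (s : string) (i : int) : char =
  if i < String.length s then s.[i] else '\255'

(* Like [test]: read one character if the predicate holds, else fail. *)
let toy_test (p : char -> bool) : char toy_terminal =
  fun s i ->
    let c = peek s i in
    if i < String.length s && p c then (c, i + 1) else raise Noparse

(* Like [not_test]: fail if the predicate holds, else succeed without
   reading anything. *)
let toy_not_test (p : char -> bool) : unit toy_terminal =
  fun s i -> if p (peek s i) then raise Noparse else ((), i)

let is_digit c = '0' <= c && c <= '9'

let () =
  (* [toy_test] consumes the digit. *)
  assert (toy_test is_digit "7x" 0 = ('7', 1));
  (* [toy_not_test] leaves the position unchanged when the test fails. *)
  assert (toy_not_test is_digit "x7" 0 = ((), 0));
  (* [toy_not_test] rejects when the first character passes the test. *)
  assert (try ignore (toy_not_test is_digit "7x" 0); false
          with Noparse -> true)
```

In this model, a negative lookahead such as toy_not_test is typically placed after another terminal, for example to make sure a number is not immediately followed by one more digit.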
Composes two terminals in sequence. name defaults to the concatenation of the two names.
save t f saves the part of the input parsed by the terminal t and combines it with its semantics using f.
alt t1 t2 parses the input with t1 or t2. Unlike grammars, terminals do not use continuations: if t1 succeeds, no backtracking is performed to try t2. For instance,
  seq1 (alt (char 'a')
            (seq1 (char 'a') (char 'b')))
       (char 'b')
will reject "ab". If both t1 and t2 accept the input, the longest match is selected. name defaults to sprintf "(%s)|(%s)" t1.n t2.n.
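To see why the example above rejects "ab", here is a toy model of longest match without backtracking. It is only a sketch: the type toy and the functions toy_char, toy_seq and toy_alt are hypothetical stand-ins written for this illustration, not Pacomb's definitions, and terminals here return only the end position.

```ocaml
(* Toy model, not Pacomb's implementation: a terminal returns the position
   after the text it read, or None if it rejects the input. *)
type toy = string -> int -> int option

let toy_char (c : char) : toy =
  fun s i -> if i < String.length s && s.[i] = c then Some (i + 1) else None

let toy_seq (t1 : toy) (t2 : toy) : toy =
  fun s i -> match t1 s i with None -> None | Some j -> t2 s j

(* Longest match: run both alternatives and commit to the one that read the
   most input. The shorter result is forgotten, so nothing can resume it. *)
let toy_alt (t1 : toy) (t2 : toy) : toy =
  fun s i ->
    match t1 s i, t2 s i with
    | None, r | r, None -> r
    | Some j1, Some j2 -> Some (max j1 j2)

let a = toy_char 'a'
let b = toy_char 'b'

(* Same shape as the example above: ("a" | "ab") followed by "b". *)
let t = toy_seq (toy_alt a (toy_seq a b)) b

let () =
  (* The alternative commits to "ab", so the trailing "b" is missing. *)
  assert (t "ab" 0 = None);
  (* With a second "b" available, the whole input is read. *)
  assert (t "abb" 0 = Some 3)
```

A grammar with continuations would instead retry the shorter alternative "a" and accept "ab"; this is the semantic difference between longest-match lexing and context-free alternatives.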
option x t parses the given terminal zero or one time. x is returned if it is parsed zero times. name defaults to sprintf "(%s)?" t.n.
Applies a function to the result of the given terminal. name defaults to the terminal name.
star t a f repeats the given terminal zero or more times. The type of the functions that compose the action allows 'b = Buffer.t for efficiency. The returned value is f ( ... (f (f (a ()) x_1) x_2) ...) x_n if t returns x_1 ... x_n. name defaults to sprintf "(%s)*" t.n.
Same as above but parses at least once.
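The value f ( ... (f (f (a ()) x_1) x_2) ...) x_n described for star is a left fold. The sketch below shows that fold on a plain list, using 'b = Buffer.t as the accumulator. The helpers star_result and collect are hypothetical and only model how results are combined. They do not parse anything.

```ocaml
(* Sketch of how [star t a f] combines results, assuming [t] returned the
   values in [xs] (hypothetical helper, not part of the library). *)
let star_result (a : unit -> 'b) (f : 'b -> 'a -> 'b) (xs : 'a list) : 'b =
  List.fold_left f (a ()) xs

(* With 'b = Buffer.t, each step appends in place and returns the buffer. *)
let collect (cs : char list) : string =
  let buf =
    star_result
      (fun () -> Buffer.create 16)
      (fun b c -> Buffer.add_char b c; b)
      cs
  in
  Buffer.contents buf

let () =
  assert (collect ['a'; 'b'; 'c'] = "abc");
  (* Zero repetitions: the result is just [a ()]. *)
  assert (collect [] = "")
```

Taking the initial value as a function unit -> 'b rather than a plain 'b means a fresh accumulator can be created for every run, which matters when the accumulator is mutable like Buffer.t.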
val string : ?name:string -> string -> unit t
string s accepts only the given string. Raises Invalid_argument if s = "". name defaults to sprintf "%S" s.
val nat : ?name:string -> unit -> int t
Parses a natural number in base 10. "+42" and "-42" are not accepted. name defaults to "NAT".
val int : ?name:string -> unit -> int t
Parses an integer in base 10. "+42" is accepted. name defaults to "INT".
val float : ?name:string -> unit -> float t
Parses a float in base 10. ".1" is accepted as "0.1". name defaults to "FLOAT".
val char_lit : ?name:string -> unit -> char t
Parses a char literal 'c' using the OCaml escaping convention. name defaults to "CHARLIT".
val string_lit : ?name:string -> unit -> string t
Parses a string literal "cccc" using the OCaml escaping convention. name defaults to "STRINGLIT".
utf8 c parses a specific Unicode character and returns (). name defaults to the string representing the character.
val any_grapheme : ?name:string -> unit -> string t
Parses any UTF-8 grapheme. name defaults to "GRAPHEME".
val grapheme : ?name:string -> string -> unit t
grapheme s parses the given UTF-8 grapheme and returns (). The difference with string s is that if the input starts with a grapheme s' such that s is a strict prefix of s', parsing will fail. name defaults to "GRAPHEME("^s^")".
val accept_empty : 'a t -> bool
Tests whether a terminal accepts the empty string. Such terminals are illegal in a grammar, but may be used in the combinators below to create terminals.
Test constructor for the test constructor in Grammar
If you build a custom lexeme, you need to use this to fill the a field of the record.