package pacomb

  1. Overview
  2. Docs

Module Pacomb.GrammarSource

Main module of Pacomb

Type

Sourcetype 'a grammar

type of a grammar with semantical action of type 'a .

Sourcetype 'a t = 'a grammar

An abbreviation

Sourcetype name_kind =
  1. | Created
  2. | Inherited
  3. | Given
Sourcetype name = string * name_kind

Grammar contructors

All construœqctors can be given an optional name argument that is used when printing the grammar.

Sourceval print_grammar : ?no_other:bool -> ?def:bool -> out_channel -> 'a grammar -> unit

print_grammar ch g prints the grammar g of the given output channel. if def=false (the default is true) it will print the transformed grammar prior to compilation.

Sourceval fail : ?name:string -> unit -> 'a grammar

fail () is a grammar that parses nothing (always fails)

Sourceval error : ?name:string -> string list -> 'a grammar

fails reporting an error

Sourceval empty : ?name:string -> 'a -> 'a grammar

empty a accepts the empty input and returns a

Sourceval cond : ?name:string -> bool -> unit grammar

cond b is if b then empty () else fail (). Very usefull in grammar family at the beginning of a rule. The test is done at grammar construction, not at parsing time (except if it is used in a dependant grammar).

Sourceval term : ?name:string -> 'a Lex.terminal -> 'a grammar

term t accepts the terminal t and returns its semantics. See module Lex

Sourceval appl : ?name:string -> 'a grammar -> ('a -> 'b) -> 'b grammar

appl g f parses with g and apply f to the resulting semantics

Sourceval unmerge : ?name:string -> 'a list grammar -> 'a grammar

unmerge g introduce multiple parse branch from a list of semantics. Allows to create ambiguous terminals and allows fr unmerge to continue a dependent parsing

Sourceval lazy_ : ?name:string -> 'a grammar -> 'a lazy_t grammar
Sourceval force : ?name:string -> 'a lazy_t grammar -> 'a grammar
Sourceval alt : ?name:string -> 'a grammar list -> 'a grammar

alt [g1;g2;...;gn] parses with g1 and if it fails then g2 and so on

Sourceval seq : ?name:string -> 'a grammar -> ('a -> 'b) grammar -> 'b grammar

seq g1 g2 parses with g1 and then with g2 for the rest of the input, combine both semantics by apply the semantics of g2 to g1

Sourceval iseq : ?name:string -> 'a grammar -> 'b grammar -> 'b grammar

variation of the abover when we do not use all semantics, it allows cost less right recursion with no semantics

Sourceval dseq : ?name:string -> ('a * 'b) grammar -> ('a -> ('b -> 'c) grammar) -> 'c grammar

dseq g1 g2) is a dependant sequence, the grammar g2 used after g1 may depend upon the semantics of g1. This is not very efficient as the grammar g2 must be compiled at parsing time. g2 is memoized by default to partially overcome this fact.

Sourceval diseq : ?name:string -> 'a grammar -> ('a -> 'b grammar) -> 'b grammar

variation of the abover when we do not use all semantics, it allows cost less right recursion with no semantics

Sourceval lpos : ?name:string -> (Pos.spos -> 'a) grammar -> 'a grammar

lpos g is identical to g but passes the position just before parsing with g to the semantical action of g

Sourceval rpos : ?name:string -> (Pos.spos -> 'a) grammar -> 'a grammar

rpos g is identical to g but passes the position just after parsing with g to the semantical action of g

Sourceval mk_pos : ?name:string -> (Pos.pos -> 'a) grammar -> 'a grammar
Sourceval seq_pos : ?name:string -> 'a grammar -> ((Pos.pos * 'a) -> 'b) grammar -> 'b grammar

variants of seq with the position of the first iterm

Sourceval dseq_pos : ?name:string -> ('a * 'b) grammar -> ('a -> ((Pos.pos * 'b) -> 'c) grammar) -> 'c grammar

variants of dseq with the position of the first iterm

Sourceval cache : ?name:string -> ?merge:('a -> 'a -> 'a) -> 'a grammar -> 'a grammar

cache g avoids to parse twice the same input with g by memoizing the result of the first parsing. The optional merge parameter is applied to group semantics corresponding to the same part of the input. Using cache with merge allows to recover a polynomial time complexity (cubic at worst) and a quadratic space (in the size of the input)

Sourceval test_before : ?name:string -> (Lex.buf -> Lex.idx -> Lex.buf -> Lex.idx -> bool) -> 'a grammar -> 'a grammar

allows to perform a test, the test function receive the position before and after the blanks

Sourceval test_after : ?name:string -> ('a -> Lex.buf -> Lex.idx -> Lex.buf -> Lex.idx -> bool) -> 'a grammar -> 'a grammar
Sourceval no_blank_before : ?name:string -> 'a grammar -> 'a grammar

particular cases of the above testing the absence of blanks.

Sourceval no_blank_after : ?name:string -> 'a grammar -> 'a grammar
Sourceval layout : ?name:string -> ?config:Blank.layout_config -> Blank.t -> 'a grammar -> 'a grammar

layout b g changes the blank function to parse the input with the grammar g. The optional parameters allow to control which blanks are used at the boundary. Both can be used in which case the new blanks are used second before parsing with g and first after.

Sourceval option : ?name:string -> 'a grammar -> 'a option grammar

usual option/star/plus combinator

Sourceval default_option : ?name:string -> 'a -> 'a grammar -> 'a grammar
Sourceval star : ?name:string -> 'a grammar -> 'a list grammar
Sourceval plus : ?name:string -> 'a grammar -> 'a list grammar
Sourceval star_sep : ?name:string -> 'b grammar -> 'a grammar -> 'a list grammar
Sourceval plus_sep : ?name:string -> 'b grammar -> 'a grammar -> 'a list grammar

Definition of recursive grammars

Sourceval declare_grammar : string -> 'a grammar

to define recursive grammars, one may declare the grammar first and then gives its value. declare_grammar name creates an undefined grammar with the given name

Sourceval set_grammar : 'a grammar -> 'a grammar -> unit

set_grammar g1 g2 set the value of g1 declared with declare_grammar. will raise Invalid_argument if g1 was not defined using declare_grammar or if it was already set.

Sourceval fixpoint : ?name:string -> ('a grammar -> 'a grammar) -> 'a grammar

fixpoint g compute the fixpoint of g, that is a grammar g0 such that g0 = g g0

Sourceval grammar_family : ?param_to_string:('a -> string) -> string -> ('a -> 'b grammar) * (('a -> 'b grammar) -> unit)

grammar_family to_str name returns a pair (gs, set_gs), where gs is a finite family of grammars parametrized by a value of type 'a. A name name is to be provided for the family, and an optional function to_str can be provided to print the parameter and display better error messages.

(* Declare the grammar family *)
let (gr, set_gr) = grammar_family to_str name in

... code using grammars of gr to define mutually recursive grammars ...

(* Define the grammar family *)
let _ = set_gr the_grammars

... now the new family can be used ...

Compilation of a grammar and various

Sourceval compile : 'a grammar -> 'a Pacomb.Comb.t

compile g produces a combinator that can be used to actually do the parsing see the Comb module

Sourceval grammar_name : 'a grammar -> string

gives the grammar name

Sourceval give_name : string -> 'a grammar -> 'a grammar

allows to rename a grammar

Sourceval parse_buffer : 'a grammar -> Blank.t -> ?offset:Lex.idx -> Lex.buf -> 'a

Parse a whole input buffer. the eof combinator is added at the end of the given combinator

Sourceval partial_parse_buffer : 'a grammar -> Blank.t -> ?blank_after:bool -> ?offset:Lex.idx -> Lex.buf -> 'a * Lex.buf * Lex.idx

Partial parsing. Beware, the returned position is not the maximum position that can be reached by the grammar if the grammar is ambiguous. In this case, a message is printed on stderr. The charset is the character accepted at the end of input. Mainly useful with 'eof' when blank_after is true.

Sourceval parse_all_buffer : 'a grammar -> Blank.t -> ?offset:Lex.idx -> Lex.buf -> 'a list

Returns all possible parse trees. Usefull for natural languages but also to debug ambiguity in a supposed non ambiguous grammar.

Sourceval parse_string : ?utf8:Utf8.context -> 'a grammar -> Blank.t -> string -> 'a

Parse a whole string, reporting position according to utf8 if optional argument utf8 is given and Utf8.UTF8 or Utf8.CJK_UTF8

Sourceval parse_channel : ?utf8:Utf8.context -> ?filename:string -> 'a grammar -> Blank.t -> in_channel -> 'a

Parse a whole input channel, reporting postiion according to utf8. After closing the file position reporting by parsing cannot be transformed bash to line/column number.

Sourceval parse_fd : ?utf8:Utf8.context -> ?filename:string -> 'a grammar -> Blank.t -> Unix.file_descr -> 'a

Parse a whole Unix.file_desc, reporting postiion according to utf8. After closing the file position reporting by parsing cannot be transformed bash to line/column number.

Sourceval parse_file : ?utf8:Utf8.context -> 'a grammar -> Blank.t -> string -> 'a

Parse a whole file, reporting postiion according to utf8. File is reopen to read position. So the file should not change on disk

OCaml

Innovation. Community. Security.