package fmlib_parse
Module Ucharacter.Make_utf8
Parse an input stream consisting of unicode characters encoded in utf-8.
- State: User state.
- Final: Final result type of the parser.
- Semantic: Semantic error message (triggered by fail error).
Parameters
module State : Fmlib_std.Interfaces.ANY
module Final : Fmlib_std.Interfaces.ANY
module Semantic : Fmlib_std.Interfaces.ANY
Signature
Final Parser
module Parser : sig ... end
Generic Combinators
Basic Combinators
p >>= f
Parse first the input according to the combinator p. In case of success, feed the returned value of p into the function f to get the combinator to parse next.
let* x = p in f x is equivalent to p >>= f
The let* combinator lets us express parsing sequences conveniently. Example:
    let* x = p in       (* parse [p], result [x] in case of success. *)
    let* y = q x in     (* parse [q x], result [y] ... *)
    let* z = r x y in   (* ... *)
    ...
    return f x y z ...
The wildcard let* _ = ... can be used to ignore results of intermediate parsing steps.
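To make the equivalence between let* and >>= concrete, here is a minimal, self-contained toy model. It is not the library's implementation: the types reply and t, the position-based string input, and the parsers digit, char and sum are assumptions made up for illustration. Only the names >>=, let* and return mirror the combinators described above.

```ocaml
(* Toy model: a parser maps an input string and a start position
   to either a value with the new position, or a failure position. *)
type 'a reply = Done of 'a * int | Fail of int
type 'a t = string -> int -> 'a reply

let return (a : 'a) : 'a t = fun _ pos -> Done (a, pos)

(* Run [p]; on success feed its value into [f] and continue there. *)
let ( >>= ) (p : 'a t) (f : 'a -> 'b t) : 'b t =
  fun s pos ->
    match p s pos with
    | Done (a, pos1) -> f a s pos1
    | Fail pos1 -> Fail pos1

(* [let* x = p in f x] is just another spelling of [p >>= f]. *)
let ( let* ) = ( >>= )

let digit : int t =
  fun s pos ->
    if pos < String.length s && s.[pos] >= '0' && s.[pos] <= '9' then
      Done (Char.code s.[pos] - Char.code '0', pos + 1)
    else Fail pos

let char (c : char) : char t =
  fun s pos ->
    if pos < String.length s && s.[pos] = c then Done (c, pos + 1)
    else Fail pos

(* Parse "<digit>+<digit>" and return the sum; the '+' is ignored
   with the wildcard binding. *)
let sum : int t =
  let* x = digit in
  let* _ = char '+' in
  let* y = digit in
  return (x + y)
```

In this model, running sum on "3+4" from position 0 succeeds with the value 7, while "3-4" fails at the position of the unexpected '-'.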
map f p
Try combinator p. In case of success, map the returned value x to f x. In case of failure, do nothing.
map f p is equivalent to let* x = p in return (f x).
map_and_update f p
Try combinator p. In case of success, map the returned state state and value a to f state a. In case of failure, do nothing.
val succeed : 'a -> 'a t
succeed a
Succeed immediately without consuming token. Return object a as result.
val return : 'a -> 'a t
return a is equivalent to succeed a.
val unexpected : string -> 'a t
unexpected expect triggers a syntax error signalling the expectation expect.
val clear_last_expectation : 'a -> 'a t
clear_last_expectation p
Clear the last failed expectation.
This is useful e.g. after stripping whitespace. Since stripping whitespace means skip_one_or_more ws or skip_zero_or_more ws, after skipping whitespace the parser can still expect more whitespace. Therefore there is a failed expectation *whitespace* on the stack. However you rarely want this expectation to be reported.
val fail : Semantic.t -> 'a t
fail error triggers a semantic error.
p </> q
Try first combinator p. In case of success or failure with consumed token, p </> q is equivalent to p.
If p fails without consuming token, then p </> q is equivalent to q.
choices p [q; r; t; ...] is equivalent to p </> q </> r </> t </> ....
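The role of consumed tokens in a choice can be sketched with a toy model. This is an illustration under stated assumptions, not the library's code: the types reply and t and the parsers keyword and clash are hypothetical, and the toy string and </> only imitate the behaviour described above. A failure counts as "consumed" when the failure position lies beyond the start position.

```ocaml
type 'a reply = Done of 'a * int | Fail of int
type 'a t = string -> int -> 'a reply

(* Try [p]; fall back to [q] only if [p] failed at the start position,
   i.e. without consuming anything. *)
let ( </> ) (p : 'a t) (q : 'a t) : 'a t =
  fun s pos ->
    match p s pos with
    | Fail pos1 when pos1 = pos -> q s pos
    | r -> r  (* success, or failure after consumption *)

(* Match [str] character by character; fail at the first mismatch. *)
let string (str : string) : string t =
  fun s pos ->
    let n = String.length str in
    let rec go i =
      if i = n then Done (str, pos + n)
      else if pos + i < String.length s && s.[pos + i] = str.[i] then
        go (i + 1)
      else Fail (pos + i)
    in
    go 0

(* No common prefix: the second alternative is reachable. *)
let keyword : string t = string "let" </> string "in"

(* Common prefix "l": once 'l' is consumed, "lambda" is never tried. *)
let clash : string t = string "let" </> string "lambda"
```

In this model keyword accepts "in", but clash rejects "lambda" because the first alternative fails after having consumed the shared 'l'. Alternatives with a common prefix therefore need backtracking or left factoring.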
p <?> expect
Try combinator p. In case of success or failure with consumed token, p <?> expect is equivalent to p.
If p fails without consuming token, then the failed expectations are replaced with the failed expectation expect.
Usually p is a combinator implementing a choice between various alternatives of a grammar construct. The <?> combinator makes it possible to replace the set of failed grammar alternatives with a higher abstraction of the failed expectation. E.g. instead of getting the failed expectations identifier, '(', -, ... we can get the failed expectation expression.
no_expectations p
Parse the combinator p.
- p fails: no_expectations p fails with the same error.
- p succeeds without consuming tokens: no_expectations p succeeds without any added expectations.
- p succeeds and consumes some token: no_expectations p succeeds without any expectations.
Many combinators can succeed with expectations. E.g. the combinator optional p expects a p and succeeds if it does not encounter a construct described by p. All repetitive combinators like one_or_more try to consume as many items as possible. At the end they are still expecting an item.
This combinator makes it possible to clear such unneeded expectations. It is particularly useful when removing whitespace. The expectation of whitespace is not a meaningful error message to the user.
State Combinators
get_and_update f
Get the current user state and then update it with f. The returned value is the old state.
state_around before p after
If s0 is the initial state, then execute p with the start state before s0 and update the final state s1 by after s0 a s1, where a is the value returned by p in case of success and s1 is the final state after executing p.
Optional Elements
optional p
Try combinator p.
- Success: Return Some a where a is the returned value.
- Failure without consuming token: Return None
- Failure with consuming token: Remain in the error state.
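The three outcomes listed above can be sketched in a toy model. This is a hedged illustration, not the library's implementation: the types reply and t and the toy string parser are assumptions, and "consumed" is modelled as a failure position beyond the start position.

```ocaml
type 'a reply = Done of 'a * int | Fail of int
type 'a t = string -> int -> 'a reply

(* Toy parser matching [str] exactly; fails at the first mismatch. *)
let string (str : string) : string t =
  fun s pos ->
    let n = String.length str in
    let rec go i =
      if i = n then Done (str, pos + n)
      else if pos + i < String.length s && s.[pos + i] = str.[i] then
        go (i + 1)
      else Fail (pos + i)
    in
    go 0

let optional (p : 'a t) : 'a option t =
  fun s pos ->
    match p s pos with
    | Done (a, pos1) -> Done (Some a, pos1)          (* success *)
    | Fail pos1 when pos1 = pos -> Done (None, pos)  (* nothing consumed *)
    | Fail pos1 -> Fail pos1                         (* consumed: stay failed *)
```

With optional (string "ab"), the input "abc" yields Some "ab", the input "xyz" yields None, and the input "ax" stays in the error state because the 'a' has already been consumed.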
Repetition
zero_or_more_fold_left start f p
Try the combinator p as often as possible. Accumulate the results to the start value start using the folding function f.
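As an illustration of the folding idea, the following toy model accumulates decimal digits into a number. It is a sketch under assumptions: the types reply and t, the digit parser and number are hypothetical, and the folding function here is a plain function returning a value, which may differ from the signature the library actually uses. The guard against a non-consuming p is a choice of this toy model only.

```ocaml
type 'a reply = Done of 'a * int | Fail of int
type 'a t = string -> int -> 'a reply

let digit : int t =
  fun s pos ->
    if pos < String.length s && s.[pos] >= '0' && s.[pos] <= '9' then
      Done (Char.code s.[pos] - Char.code '0', pos + 1)
    else Fail pos

(* Apply [p] as often as possible, folding each result into [start]. *)
let rec zero_or_more_fold_left
    (start : 'r) (f : 'r -> 'a -> 'r) (p : 'a t) : 'r t =
  fun s pos ->
    match p s pos with
    | Done (a, pos1) when pos1 > pos ->
        zero_or_more_fold_left (f start a) f p s pos1
    | Done _ -> Done (start, pos)  (* toy guard: stop if [p] consumed nothing *)
    | Fail pos1 when pos1 = pos -> Done (start, pos)  (* no more items *)
    | Fail pos1 -> Fail pos1       (* an item failed after consumption *)

(* Fold the digits 1, 2, 3 into the number 123. *)
let number : int t =
  zero_or_more_fold_left 0 (fun acc d -> 10 * acc + d) digit
```

Because zero occurrences are allowed, number succeeds with the start value 0 on an input that does not begin with a digit.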
one_or_more_fold_left first f p
Try the combinator p at least once and then as often as possible. Feed the first value returned by p into the function first to obtain the start value, then accumulate the subsequent values into it using the folding function f.
zero_or_more p
Parse zero or more occurrences of p and return the collected results in a list.
one_or_more p
Parse one or more occurrences of p and return the collected results as a pair of the first value and a list of the remaining values.
skip_zero_or_more p
Parse zero or more occurrences of p, ignore the result and return the number of occurrences.
skip_one_or_more p
Parse one or more occurrences of p, ignore the result and return the number of occurrences.
val one_or_more_separated : 
  ('item -> 'r t) ->
  ('r -> 'sep -> 'item -> 'r t) ->
  'item t ->
  'sep t ->
  'r t
one_or_more_separated first next p sep
Parse one or more occurrences of p separated by sep. Use first to convert the first occurrence of p into the result and use next to accumulate the results.
counted min max start next p
Collect between min and max elements recognized by the combinator p and accumulate them with the folding function next into the start value start.
Parenthesized expressions
val parenthesized : 
  ('lpar -> 'a -> 'rpar -> 'b t) ->
  'lpar t ->
  (unit -> 'a t) ->
  ('lpar -> 'rpar t) ->
  'b t
parenthesized make lpar p rpar
Parse an expression recognized by the combinator p enclosed within parentheses. lpar recognizes the left parenthesis and rpar recognizes the right parenthesis. The value returned by lpar is given to rpar. With that mechanism it is possible to recognize matching parentheses of different kinds.
After successful parsing the function make is called with the final value (and the parentheses).
The combinator p is entered as a thunk in order to be able to call it recursively. In the combinator parenthesized the combinator p is called only guardedly. Therefore the combinator p can contain nested parenthesized expressions.
Precondition: The combinator lpar has to consume at least one token in case of success.
Operator expressions
val operator_expression : 
  'exp t ->
  'op t option ->
  'op t ->
  ('op -> 'op -> bool t) ->
  ('op -> 'exp -> 'exp t) ->
  ('exp -> 'op -> 'exp -> 'exp t) ->
  'exp t
    operator_expression
        primary         (* Parse a primary expression *)
        unary_operator  (* Parse a unary operator *)
        binary_operator (* Parse a binary operator *)
        is_left         (* Is the left operator binding stronger? *)
        make_unary      (* Make a unary expression from the operator and
                           its operand *)
        make_binary     (* Make a binary expression from the operator
                           and its operands *)
Parse an operator expression by using the following combinators:
- is_left o1 o2 decides if the operator o1 on the left has more binding power than the operator o2. I.e. if the unary operator u has more binding power than the binary operator o, then u a o b is parsed as (u a) o b. If the binary operator o1 has more binding power than the binary operator o2, then a o1 b o2 c is parsed as (a o1 b) o2 c.
- make_unary u a makes the unary expression (u a).
- make_binary a o b makes the binary expression (a o b).
- primary parses a primary expression.
- unary_operator parses a unary operator.
- binary_operator parses a binary operator.
Precondition: primary, unary_operator and binary_operator have to consume at least one token in case of success. Otherwise infinite recursion can happen.
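One common way to write the decision function is a precedence table plus an associativity rule. The sketch below is plain OCaml and does not use the library: the operator type op, the precedence levels and is_right_assoc are hypothetical choices for illustration. According to the signature above the library expects a function returning bool t, so a result like this would still have to be wrapped, presumably with return.

```ocaml
(* Hypothetical operator set; Neg stands for a unary minus. *)
type op = Plus | Minus | Times | Power | Neg

(* Higher number = more binding power (an assumption of this sketch). *)
let precedence = function
  | Plus | Minus -> 1
  | Times -> 2
  | Neg -> 3
  | Power -> 4

let is_right_assoc = function
  | Power -> true
  | _ -> false

(* The left operator [o1] wins if it binds strictly stronger, or if both
   bind equally and [o1] is left associative. *)
let is_left (o1 : op) (o2 : op) : bool =
  let p1 = precedence o1 and p2 = precedence o2 in
  p1 > p2 || (p1 = p2 && not (is_right_assoc o1))
```

With this table, a + b - c groups as (a + b) - c, a + b * c groups as a + (b * c), a ^ b ^ c groups as a ^ (b ^ c), and -a ^ b groups as -(a ^ b).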
Backtracking
backtrack p expect
Try the combinator p. In case of failure with consumed tokens, push the consumed tokens back to the lookahead and let it fail without consuming tokens. Use expect to record the failed expectation.
Backtracking reduces performance, because the tokens pushed back to the lookahead have to be parsed again. Try to avoid backtracking whenever possible.
followed_by p expect
Parses p and backtracks (i.e. all tokens of p will be pushed back to the lookahead). In case p succeeds, the followed_by parser succeeds without consuming token. Otherwise it fails without consuming tokens.
not_followed_by p expect
Parses p and backtracks (i.e. all tokens of p will be pushed back to the lookahead). In case p succeeds, the not_followed_by parser fails without consuming token. Otherwise it succeeds without consuming tokens.
followed_by and not_followed_by can be used to peek into the token stream without consuming token.
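The peeking behaviour can be sketched in a toy model. This is an illustration only: the types reply and t and the char parser are assumptions, the expect argument of the real combinators is omitted, and both toy functions return unit, which may differ from the library's result types.

```ocaml
type 'a reply = Done of 'a * int | Fail of int
type 'a t = string -> int -> 'a reply

let char (c : char) : char t =
  fun s pos ->
    if pos < String.length s && s.[pos] = c then Done (c, pos + 1)
    else Fail pos

(* Succeed iff [p] succeeds, but always stay at the start position. *)
let followed_by (p : 'a t) : unit t =
  fun s pos ->
    match p s pos with
    | Done _ -> Done ((), pos)
    | Fail _ -> Fail pos

(* Succeed iff [p] fails, again without moving the position. *)
let not_followed_by (p : 'a t) : unit t =
  fun s pos ->
    match p s pos with
    | Done _ -> Fail pos
    | Fail _ -> Done ((), pos)
```

In both cases the reported position equals the start position, which is what "without consuming token" means in this model.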
Location Combinators
located p
Parse p and return its result with its start and end position.
Note: If p removes whitespace at the end, the returned end position is at the end of the whitespace. This is not what you usually want. Therefore first parse the essential part with located and then remove the whitespace.
val position : Position.t t
The current position in the file.
Indentation Combinators
The indentation of a normal construct is the indentation of its leftmost token. The indentation of a vertically aligned construct is the indentation of its first token.
indent i p
Indent p by i columns relative to its parent.
Precondition: 0 <= i
The indentation of p is defined by the indentation of its first token. The first token has to be indented at least i columns relative to the parent of p. After the first token of p has been parsed successfully, all subsequent tokens must have at least the same indentation.
Note: Indentation of p relative to its parent only makes sense, if the first token of p is not the first token of its parent! I.e. the parent of p should have consumed at least one token before the parsing of p starts.
CAUTION WITH ALIGNMENT !!
If you want to align a certain number of constructs vertically it is mandatory to indent the whole block of constructs. Do not indent the individual items to be aligned. Indent the whole block.
Reason: The parent of the block usually has already consumed some tokens and the indentation of a construct is the position of the leftmost token. If you don't indent the aligned block, then it will be aligned with the leftmost token of the parent construct. This is usually not intended and a common pitfall. Any indentation, e.g. zero indentation, is ok.
align p
Use the start position of the first token of p to align it with other constructs. If p does not consume any token, then align p has no effect.
Alignment makes sense if there are at least two combinators which are aligned and indented. E.g. suppose there are two combinators p and q. Then we can form
indent 1 (
        let* a = align p in
        let* b = align q in
        return (a,b)
)
This combinator parses p whose first token has to be indented at least one column relative to its parent. And then it parses q whose first token must be aligned with the first token of p.
The indentation decouples the alignment of p and q from other aligned siblings or parents. indent 0 ... can be used to make the indentation optional.
left_align p
Align a construct described by p at its leftmost possible column. If a whole block of constructs has to be vertically left aligned, then it is important that at least the first construct is left aligned. The subsequent constructs will be aligned exactly vertically. For the subsequent constructs left_align has the same effect as align.
detach p
Parse p without any indentation and alignment restrictions.
Detachment is needed to parse whitespace. The whitespace at the beginning of a line never satisfies any nontrivial indentation or alignment requirements.
End of Input
val expect_end : 'a -> 'a t
expect_end a
Expect the end of the token stream.
In case of success return a.
In case of failure return the syntax error with the expectation "end of input".
CAUTION: There is usually no need to use this combinator! This combinator is needed only for partial parsers.
Never ever backtrack over this combinator.
Lexer Support
val lexer : 'a t -> 'tok -> 'tok t -> (Position.range * 'tok) t
lexer whitespace end_token tok
A lexer combinator.
- The whitespace combinator recognizes a possibly empty sequence of whitespace (usually blanks, tabs, newlines, comments, ...).
- end_token is a token which the lexer returns when it has successfully consumed the end of input.
- tok is a combinator recognizing tokens (usually tok1 </> tok2 </> ... </> tokn).
The lexer combinator recognizes tokens in an input stream of the form
WS Token WS Token .... WS EOF
Note: If a combinator fails to recognize a token after having consumed some input, then the subsequent combinators are not tried as alternatives anymore. Therefore if there are tokens which can begin with the same prefix, then it is necessary to make the recognition of the common prefixes backtrackable in all but the last combinator recognizing a token with the same prefix. The same applies to whitespace if part of the whitespace can begin like a token.
Examples:
- comment: "// ...."
- division operator: "/"
In this case the recognition at least of the first slash of the comment has to be backtrackable.
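The comment versus division case can be reproduced in a toy model. This is a sketch under assumptions and not the library's code: the types reply and t are hypothetical, the toy backtrack omits the expect argument, and slash_bad and slash_good are names invented for the example.

```ocaml
type 'a reply = Done of 'a * int | Fail of int
type 'a t = string -> int -> 'a reply

(* Try [q] only if [p] failed without consuming anything. *)
let ( </> ) (p : 'a t) (q : 'a t) : 'a t =
  fun s pos ->
    match p s pos with
    | Fail pos1 when pos1 = pos -> q s pos
    | r -> r

let string (str : string) : string t =
  fun s pos ->
    let n = String.length str in
    let rec go i =
      if i = n then Done (str, pos + n)
      else if pos + i < String.length s && s.[pos + i] = str.[i] then
        go (i + 1)
      else Fail (pos + i)
    in
    go 0

(* On failure, report the failure at the start position, as if nothing
   had been consumed. *)
let backtrack (p : 'a t) : 'a t =
  fun s pos ->
    match p s pos with
    | Fail _ -> Fail pos
    | r -> r

(* Without backtracking, the first '/' of "//" is consumed and the
   division operator is never tried. *)
let slash_bad : string t = string "//" </> string "/"

(* Making the comment start backtrackable lets "/" match. *)
let slash_good : string t = backtrack (string "//") </> string "/"
```

On the input "/2" the first variant fails after the shared slash, while the second variant falls through to the division operator. Only the prefix recognition needs to be backtrackable, not the whole comment.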
Character Combinators
val charp : (char -> bool) -> string -> char t
charp p expect
Parse a character which satisfies the predicate p.
In case of failure, report the failed expectation expect.
val range : char -> char -> char t
range c1 c2
Parse a character in the range between c1 and c2, i.e. a character c which satisfies c1 <= c && c <= c2.
val char : char -> char t
char c
Parse the character c.
val one_of_chars : string -> string -> char t
one_of_chars str expect
Parse one of the characters in the string str. In case of failure, report the failed expectation expect.
val string : string -> string t
string str
Parse the string str.
val uppercase_letter : char t
Parse an uppercase letter.
val lowercase_letter : char t
Parse a lowercase letter.
val letter : char t
Parse a letter.
val digit_char : char t
Parse a digit 0..9 and return it as character.
val digit : int t
Parse a digit and return it as number.
val word : (char -> bool) -> (char -> bool) -> string -> string t
word first inner error
Parse a word which starts with a character satisfying the predicate first followed by zero or more characters satisfying the predicate inner. In case of failure add the expectation error.
val hex_uppercase : int t
Equivalent to range 'A' 'F', with the result converted to the corresponding number between 10 and 15.
val hex_lowercase : int t
Equivalent to range 'a' 'f', with the result converted to the corresponding number between 10 and 15.
val hex_digit : int t
Parse a hexadecimal digit and return the corresponding number between 0 and 15.
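The character-to-number conversion described for the hexadecimal combinators can be written as a plain function. The name hex_value and the option result are assumptions of this sketch, which only shows the arithmetic and does not reproduce how the library implements it.

```ocaml
(* Map a hexadecimal digit character to its value 0..15, or None if the
   character is not a hexadecimal digit. *)
let hex_value (c : char) : int option =
  match c with
  | '0' .. '9' -> Some (Char.code c - Char.code '0')
  | 'A' .. 'F' -> Some (Char.code c - Char.code 'A' + 10)
  | 'a' .. 'f' -> Some (Char.code c - Char.code 'a' + 10)
  | _ -> None
```

The three matching branches correspond to digit, hex_uppercase and hex_lowercase respectively.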
Unicode Combinators
ucharp p error
Parse a unicode character which satisfies the predicate p. If the next character does not satisfy the predicate, then use the string error to express the failed expectation.
urange uc1 uc2
Parse a unicode character whose scalar value is in the range between the scalar values of uc1 and uc2 including the boundaries.
uword first inner error
Parse a sequence of unicode characters whose first character satisfies the predicate first and all subsequent characters satisfy the predicate inner. If no such word is encountered then use the string error to express the expectation.
Make the Final Parser
make state c
Make a parser which starts in state state and parses a construct defined by the combinator c. The token stream must be ended by put_end, otherwise the parse won't succeed.
CAUTION: c must not be a combinator containing expect_end. Moreover it must not have been constructed by lexer.
val make_partial : Position.t -> State.t -> Final.t t -> Parser.t
make_partial pos state c
Make a parser which analyzes a part of the input stream. The parser starts at position pos in state state and parses a construct defined by the combinator c. The parser can succeed even if no end token has been pushed into the parser.