package fmlib_parse


Generate a parser with a UTF-8 lexer and a token parser.

The generated parser parses a stream of Unicode characters encoded in UTF-8. The lexer converts the stream of characters into a stream of tokens of type Position.range * Token.t, which are fed into the token parser.
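To picture the split between lexer and token parser, here is a small batch sketch that does not use fmlib_parse at all. The token type, the functions lex and sum_tokens, and the use of plain byte offsets instead of Position.range are hypothetical simplifications; the real parser works incrementally on a stream rather than on a whole string.

```ocaml
(* Hypothetical token type: integer literals and plus signs. *)
type token = Int of int | Plus

(* Simplified range: start and end byte offsets (end exclusive),
   standing in for Position.range. *)
type range = int * int

(* Toy lexer: skips blanks and groups digits into Int tokens.
   It turns characters into (range, token) pairs. *)
let lex (s : string) : (range * token) list =
  let n = String.length s in
  let is_digit c = c >= '0' && c <= '9' in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | ' ' | '\t' | '\n' -> go (i + 1) acc
      | '+' -> go (i + 1) (((i, i + 1), Plus) :: acc)
      | c when is_digit c ->
          let j = ref i in
          while !j < n && is_digit s.[!j] do incr j done;
          let v = int_of_string (String.sub s i (!j - i)) in
          go !j (((i, !j), Int v) :: acc)
      | c -> failwith (Printf.sprintf "unexpected character %C at %d" c i)
  in
  go 0 []

(* Toy token parser: accepts Int (Plus Int)* and returns the sum.
   It only sees (range, token) pairs, never raw characters. *)
let sum_tokens (toks : (range * token) list) : int =
  let rec rest acc = function
    | [] -> acc
    | (_, Plus) :: (_, Int v) :: tl -> rest (acc + v) tl
    | ((a, _), _) :: _ -> failwith (Printf.sprintf "syntax error at %d" a)
  in
  match toks with
  | (_, Int v) :: tl -> rest v tl
  | ((a, _), _) :: _ -> failwith (Printf.sprintf "syntax error at %d" a)
  | [] -> failwith "empty input"
```

For example, sum_tokens (lex "1 + 23 + 4") evaluates to 28.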

The tokens of the generated parser are UTF-8 decoded Unicode characters.

type token = Utf8.Decoder.t

Type of syntax expectations:

type expect = string * Indent.expectation option

Parameters

module State : ANY
module Token : ANY
module Final : ANY
module Semantic : ANY
module Parse : Interfaces.FULL_PARSER with type state = State.t and type token = Position.range * Token.t and type expect = string * Indent.expectation option and type final = Final.t and type semantic = Semantic.t

Signature

A parser p is a sink of tokens. As long as needs_more p holds, more tokens can be pushed into the parser via put token p, or the input stream can be ended via put_end p.

has_ended p is equivalent to not (needs_more p). has_ended p signals that the parser has either succeeded or failed.

If it has succeeded, the final value is available via final p.
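A minimal driver for this protocol might look as follows. The module type SINK is written out locally from the functions documented on this page (it is an assumption that it matches Fmlib_std.Interfaces.SINK exactly), and Take3 is a hypothetical stand-in parser that needs three items.

```ocaml
(* Local module type with the sink functions documented on this page. *)
module type SINK = sig
  type t
  type item
  val needs_more : t -> bool
  val put : item -> t -> t
  val put_end : t -> t
end

module Feed (S : SINK) = struct
  (* Push items while the parser asks for more; once the items are
     exhausted, signal the end of the stream with put_end. *)
  let rec feed_list (items : S.item list) (p : S.t) : S.t =
    if not (S.needs_more p) then p
    else
      match items with
      | [] -> S.put_end p
      | x :: rest -> feed_list rest (S.put x p)
end

(* Hypothetical parser: ends after three items or at end of input. *)
module Take3 = struct
  type t = { seen : int list; ended : bool }   (* seen: newest first *)
  type item = int
  let needs_more p = (not p.ended) && List.length p.seen < 3
  let put x p = if needs_more p then { p with seen = x :: p.seen } else p
  let put_end p = { p with ended = true }
end

module F = Feed (Take3)
```

With five items, F.feed_list stops after the third one without calling put_end; with a single item, it runs out of input and ends the stream.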

type t

Type of the parser.

Feeding Tokens

type token = Utf8.Decoder.t

Type of the tokens.

type item = token

Alias of token, needed to conform to the interface Fmlib_std.Interfaces.SINK.

val needs_more : t -> bool

needs_more p Does the parser p need more tokens?

val put : token -> t -> t

put tok p Push token tok into the parser p.

Even if the parser has ended, more tokens can be pushed into the parser. The parser stores each such token as a lookahead token.

If the parser has already received the end of the token stream via put_end, all subsequent tokens are ignored.

val put_end : t -> t

put_end p Push an end token into the parser p.
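The rules for put and put_end can be modelled with a toy record. This is not the library's implementation: the toy type and its fields are hypothetical, and the toy parser simply ends after consuming a single token.

```ocaml
(* Toy model of the documented put/put_end rules. *)
type 'tok toy = {
  consumed : 'tok option;    (* the single token the toy parser accepts *)
  lookahead : 'tok list;     (* tokens pushed after the parser ended, oldest first *)
  end_received : bool;       (* has put_end been called? *)
}

let init = { consumed = None; lookahead = []; end_received = false }

let needs_more p = p.consumed = None && not p.end_received

let put tok p =
  if p.end_received then p                          (* after put_end: ignored *)
  else if needs_more p then { p with consumed = Some tok }
  else { p with lookahead = p.lookahead @ [tok] }   (* ended: kept as lookahead *)

let put_end p = { p with end_received = true }
```

Pushing 'a', 'b', 'c' consumes 'a' and keeps 'b' and 'c' as lookahead; a token pushed after put_end leaves the record unchanged.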

Success

type final = Final.t

Type of the final result.

val has_succeeded : t -> bool

has_succeeded p Has the parser p succeeded?

val has_ended : t -> bool

has_ended p Has the parser p ended parsing and either succeeded or failed?

has_ended p is the same as not (needs_more p).

val has_consumed_end : t -> bool

Has the parser consumed the end of input?

val final : t -> final

final p The final object constructed by the parser p in case of success.

Precondition: has_succeeded p

Syntax Errors

type expect = string * Indent.expectation option

Type of expectations.

val has_failed_syntax : t -> bool

has_failed_syntax p Has the parser p failed with a syntax error?

val failed_expectations : t -> expect list

failed_expectations p The failed expectations due to a syntax error.

Precondition: has_failed_syntax p

Semantic Errors

type semantic = Semantic.t

Type of semantic errors.

val has_failed_semantic : t -> bool

Has the parser failed because of a semantic error?

val failed_semantic : t -> semantic

The semantic error encountered.

Precondition: has_failed_semantic p
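A caller typically inspects an ended parser through the functions above, calling each accessor only when its precondition holds. The sketch below collects them into a local module type OUTCOME and a hypothetical variant; Toy is a stand-in parser used only to exercise classify (its expect type is a plain string, unlike the real string * Indent.expectation option).

```ocaml
(* Outcome-related functions documented above, in a local module type. *)
module type OUTCOME = sig
  type t
  type final
  type expect
  type semantic
  val has_ended : t -> bool
  val has_succeeded : t -> bool
  val final : t -> final
  val has_failed_syntax : t -> bool
  val failed_expectations : t -> expect list
  val has_failed_semantic : t -> bool
  val failed_semantic : t -> semantic
end

module Outcome (P : OUTCOME) = struct
  type t =
    | Running                           (* still needs more input *)
    | Success of P.final
    | Syntax_error of P.expect list
    | Semantic_error of P.semantic

  (* Each accessor is guarded by the matching predicate. *)
  let classify (p : P.t) : t =
    if not (P.has_ended p) then Running
    else if P.has_succeeded p then Success (P.final p)
    else if P.has_failed_syntax p then Syntax_error (P.failed_expectations p)
    else if P.has_failed_semantic p then Semantic_error (P.failed_semantic p)
    else failwith "parser ended without success or failure"
end

(* Hypothetical stand-in parser states. *)
module Toy = struct
  type t = Need | Done of int | Syn of string list | Sem of string
  type final = int
  type expect = string
  type semantic = string
  let has_ended p = p <> Need
  let has_succeeded = function Done _ -> true | _ -> false
  let final = function Done v -> v | _ -> invalid_arg "final"
  let has_failed_syntax = function Syn _ -> true | _ -> false
  let failed_expectations = function Syn e -> e | _ -> invalid_arg "failed_expectations"
  let has_failed_semantic = function Sem _ -> true | _ -> false
  let failed_semantic = function Sem s -> s | _ -> invalid_arg "failed_semantic"
end

module O = Outcome (Toy)
```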

State

type state = State.t

Type of the state of the parser (in many cases unit).

val state : t -> state

The state of the parser.

Lexer and Parser

val make : Lex.t -> Parse.t -> t

make lex parse Make the parser from a lexer and a token parser.

val lex : t -> Lex.t

The lexer part of the parser.

val parse : t -> Parse.t

The token parser part of the parser.

Partial Parser

If the input stream is to be parsed in parts, a parser with a lexer can be used for partial parsing as well.

Note that the lexer is necessarily partial: it succeeds after parsing one lexical token from the input stream and is restarted afterwards. Restarting the lexer transfers the lookahead from the previous lexer to the next one.

A parser with a lexer becomes partial if the token parser is partial. As a user of this module, you only have to transfer the lookahead buffer from the old token parser to the next token parser.

If the old and the new token parser have the same type, then the function make_next can be used to transfer the lookahead buffer.

If the old and the new token parser have different types, the following code does the job. Assume that TP1.t and TP2.t are the types of the old and the new token parser, P1.t and P2.t are the types of the corresponding parsers with lexers, and tp2: TP2.t is the new token parser:

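(* p1 must have succeeded without consuming the end of input. *)
assert (P1.has_succeeded p1);
assert (not (P1.has_consumed_end p1));
let lex = P1.lex p1
and tp1 = P1.parse p1
in
(* Replay the lookahead of the old token parser tp1 into tp2. *)
let tp2 = TP1.fold_lookahead tp2 TP2.put TP2.put_end tp1 in
let p2  = P2.make lex tp2 in
...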

Note that, as described in the chapter Partial Parsing, the parser p2 might already have used the lookaheads of p1 to either succeed or fail. You can continue parsing the input stream with p2 only if this is not yet the case. Otherwise you might need a new subsequent token parser to parse the remaining input stream.
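The lookahead transfer can be tried out with two toy token parsers of different types. TP1 and TP2 here are hypothetical; TP1.fold_lookahead is modelled on the argument order of the transfer call (start value, token step, end step, parser). Because it walks the lookahead of the old parser, the fold belongs to TP1, while the step functions come from TP2.

```ocaml
(* Old token parser: only the state needed for the transfer is modelled. *)
module TP1 = struct
  type token = int
  (* Unconsumed lookahead tokens (oldest first) and whether an end of
     input arrived after the parser had already finished. *)
  type t = { lookahead : token list; end_pending : bool }

  (* Fold the lookahead tokens with ftok, then apply fend if an end of
     input is pending. *)
  let fold_lookahead (a : 'a) (ftok : token -> 'a -> 'a) (fend : 'a -> 'a) (p : t) : 'a =
    let a = List.fold_left (fun acc tok -> ftok tok acc) a p.lookahead in
    if p.end_pending then fend a else a
end

(* New token parser with a different type. *)
module TP2 = struct
  type token = int
  type t = { received : token list; got_end : bool }   (* received: newest first *)
  let init = { received = []; got_end = false }
  let put tok p = { p with received = tok :: p.received }
  let put_end p = { p with got_end = true }
end

let tp1 = { TP1.lookahead = [4; 5]; end_pending = true }
let tp2 = TP1.fold_lookahead TP2.init TP2.put TP2.put_end tp1
```

After the fold, tp2 has received 4 and 5 in their original order followed by the pending end of input.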

val make_next : t -> Parse.t -> t

make_next p tp

This function assumes that p has been made with a partial token parser, that it has already successfully consumed a part of the input stream, and that tp is the token parser which shall be used to parse the next part of the input stream.

Since the token parser contained in p might have unconsumed lookahead tokens, these tokens must be transferred to the new token parser tp.

The call make_next p tp makes a new parser with lexer using the old lexer and the new token parser tp with all the lookaheads transferred to it.

Position

val position : t -> Position.t

The current position in the input.

val range : t -> Position.range

The current range in the input; usually the range of the first lookahead token. In case of a syntax error, this is the range of the unexpected token, i.e. the token which caused the syntax error.

Run the Parser

val run_on_string : string -> t -> t

run_on_string str p Run the parser p on the string str.

val run_on_string_at : int -> string -> t -> int * t

run_on_string_at start str p Run the parser p on the string str starting at index start. Return the index next to be pushed in, together with the parser.
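The resume contract of run_on_string_at can be illustrated at the byte level. This sketch is a simplification resting on assumptions: the library decodes UTF-8 instead of pushing raw bytes, and how it handles reaching the end of the string is not modelled here. CHAR_SINK, Run and Three are hypothetical names.

```ocaml
(* Minimal sink over bytes; a hypothetical simplification. *)
module type CHAR_SINK = sig
  type t
  val needs_more : t -> bool
  val put : char -> t -> t
end

module Run (P : CHAR_SINK) = struct
  (* Push characters from index start while the parser needs more, and
     return the index of the first character not pushed. *)
  let run_on_string_at (start : int) (str : string) (p : P.t) : int * P.t =
    let len = String.length str in
    let rec go i p =
      if i < len && P.needs_more p then go (i + 1) (P.put str.[i] p)
      else (i, p)
    in
    go start p
end

(* Hypothetical parser that needs exactly three characters. *)
module Three = struct
  type t = char list   (* newest first *)
  let needs_more p = List.length p < 3
  let put c p = c :: p
end

module R = Run (Three)
```

Calling it again with the returned index continues where the previous run stopped: on "abcdef", a run from 0 stops at 3, and a run from 3 stops at 6.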

val run_on_channel : in_channel -> t -> t

run_on_channel ch p Run the parser p on the channel ch.
