package fmlib_parse


A lexer analyses a stream of characters and groups them into tokens, usually stripping off whitespace. I.e. a lexer expects a stream of characters of the form

            WS Token WS Token ... WS Token WS EOS

WS is a possibly empty sequence of whitespace characters like blanks, tabs, newlines, and comments. Token represents a legal token. EOS represents the end of the stream.

A lexer is in one of three states:

  • needs_more: The lexer needs more characters from the stream of characters in order to decide the next correct token or the end of input. The lexer is ready to receive more characters via put or to receive the end of input via put_end.
  • has_succeeded: The lexer has found a correct token or detected the end of input. In this state (except at the end of input) the lexer can be restarted to find the next token.
  • has_failed_syntax: The lexer has detected a character (or the end of input) which cannot be part of a legal token.

In the state has_succeeded the lexer signals via has_consumed_end that the end of input has been reached.
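The three states can be inspected with the predicates documented below. The following sketch (not part of the library) assumes a module Lexer conforming to this interface and merely dispatches on its state:

```ocaml
(* Describe the state of a lexer [lx], using only the predicates
   documented in this interface. [Lexer] is an assumed module
   conforming to the LEXER module type. *)
let describe (lx : Lexer.t) : string =
  if Lexer.needs_more lx then
    "needs_more: feed characters via put, or end the input via put_end"
  else if Lexer.has_succeeded lx then
    if Lexer.has_consumed_end lx then
      "has_succeeded: end of input reached"
    else
      "has_succeeded: token available via final; use restart for the next one"
  else
    "has_failed_syntax: inspect failed_expectations"
```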

A module conforming to the module type LEXER can be used in the module Parse_with_lexer to create a two stage parser where the lexer handles tokens and a combinator parser handles the higher level constructs.

A parser p is a sink of tokens. As long as it signals needs_more p, more tokens can be pushed into the parser via put token p, or the input stream can be ended via put_end p.

has_ended p is equivalent to not (needs_more p). has_ended p signals that the parser has either succeeded or failed.

If it has succeeded the final value is available via final p.
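Putting needs_more, put and put_end together gives the canonical driving loop. This is an illustrative sketch, not library code; P is an assumed module conforming to this interface and the token list stands in for any token source:

```ocaml
(* Feed a list of tokens [toks] into a parser [p], ending the stream
   when the list is exhausted. [P] is an assumed module conforming to
   the interface documented here. *)
let rec feed (p : P.t) (toks : P.token list) : P.t =
  if P.needs_more p then
    match toks with
    | tok :: rest -> feed (P.put tok p) rest   (* push one token *)
    | []          -> P.put_end p               (* signal end of input *)
  else
    p                                          (* has_ended: succeeded or failed *)
```

After the loop returns, has_succeeded decides whether final may be called.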

type t

Type of the parser.

Feeding Tokens

type token = Utf8.Decoder.t

Type of the tokens.

type item = token

In order to conform to the interface Fmlib_std.Interfaces.SINK.

val needs_more : t -> bool

needs_more p Does the parser p need more tokens?

val put : token -> t -> t

put tok p Push token tok into the parser p.

Even if the parser has ended, more tokens can be pushed into the parser. The parser stores the token as lookahead token.

If the parser has already received the end of the token stream via put_end, then all subsequent tokens are ignored.

val put_end : t -> t

put_end p Push an end token into the parser p.

Success

type final = Position.range * Token.t

Type of the final result.

val has_succeeded : t -> bool

has_succeeded p Has the parser p succeeded?

val has_ended : t -> bool

has_ended p Has the parser p ended parsing and either succeeded or failed?

has_ended p is the same as not (needs_more p)

val final : t -> final

final p The final object constructed by the parser p in case of success.

Precondition: has_succeeded p

Syntax Errors

type expect = string * Indent.expectation option

Type of expectations.

val has_failed_syntax : t -> bool

has_failed_syntax p Has the parser p failed with a syntax error?

val failed_expectations : t -> expect list

failed_expectations p The failed expectations due to a syntax error.

Precondition: has_failed_syntax p

Lookahead

val has_consumed_end : t -> bool

Has the lexer consumed the end of input?

Position

val position : t -> Position.t

Line and column number of the current position of the lexer.

Start

val start : t

The lexer for the first token.

Restart

A lexer does not consume the entire input stream. It just consumes characters until a token has been recognized. In case of the successful recognition of a token, it returns the token (see final). Then it can be restarted to recognize the next token.

val restart : t -> t

restart p

Next lexer, ready to recognize the next token of the input stream.

All lookaheads from the previous lexer are pushed onto the new lexer, which starts at the position where the previous lexer finished.

Preconditions:

  • has_succeeded p
  • not (has_consumed_end p)
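A complete tokenization run chains start, restart, final and has_consumed_end. The sketch below is hypothetical: Lexer is an assumed module conforming to this interface, and the feed argument stands for any function that pushes input characters until the lexer leaves the needs_more state:

```ocaml
(* Collect one [final] result per recognized token until the end of
   input or a syntax error. [Lexer] is an assumed conforming module;
   [feed] is a hypothetical driver that runs a lexer until it has
   ended (i.e. until [needs_more] is false). *)
let tokenize (feed : Lexer.t -> Lexer.t) : Lexer.final list =
  let rec loop lx acc =
    let lx = feed lx in                        (* run until has_ended *)
    if Lexer.has_succeeded lx then
      if Lexer.has_consumed_end lx then
        List.rev acc                           (* end of input: done *)
      else
        (* token recognized: record it and restart for the next one *)
        loop (Lexer.restart lx) (Lexer.final lx :: acc)
    else
      List.rev acc                             (* syntax error: stop *)
  in
  loop Lexer.start []
```

Note that restart is only called when both preconditions hold: the lexer has succeeded and has not yet consumed the end of input.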