A lexer analyses a stream of characters and groups them into tokens, usually stripping off whitespace. I.e. a lexer expects a stream of characters of the form

WS Token WS Token ... WS Token WS EOS

where WS is a possibly empty sequence of whitespace characters (blanks, tabs, newlines) and comments, Token represents a legal token, and EOS represents the end of the stream.
A lexer is in one of three states:

needs_more: The lexer needs more characters from the stream of characters in order to decide the next correct token or the end of input. The lexer is ready to receive more characters via put or to receive the end of input via put_end.

has_succeeded: The lexer has found a correct token or detected the end of input. In this state (except at the end of input) the lexer can be restarted to find the next token.

has_failed_syntax: The lexer has detected a character (or the end of input) which cannot be part of a legal token.

In the state has_succeeded the lexer signals via has_consumed_end that the end of input has been reached.
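As an illustration of the three states, here is a toy lexer. The concrete representation below is this sketch's own invention, not the library's: it recognizes unsigned integers separated by blanks.

```ocaml
(* Toy lexer illustrating the three states described above. *)
type state =
  | Needs_more of string          (* digits collected so far *)
  | Has_succeeded of int option   (* Some token, or None at end of input *)
  | Has_failed_syntax of char     (* offending character *)

let start = Needs_more ""

let put c st =
  match st with
  | Needs_more "" when c = ' ' ->
      Needs_more ""                               (* strip leading blanks *)
  | Needs_more ds when c = ' ' ->
      Has_succeeded (Some (int_of_string ds))     (* blank ends the token *)
  | Needs_more ds when '0' <= c && c <= '9' ->
      Needs_more (ds ^ String.make 1 c)           (* needs more characters *)
  | Needs_more _ ->
      Has_failed_syntax c                         (* not part of any token *)
  | st -> st                                      (* succeeded/failed: stay *)

let put_end st =
  match st with
  | Needs_more "" -> Has_succeeded None           (* end of input reached *)
  | Needs_more ds -> Has_succeeded (Some (int_of_string ds))
  | st -> st
```

With this toy, pushing the characters of "42 " starting from start yields Has_succeeded (Some 42); using a fresh start value then plays the role of restarting the lexer for the next token.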
A module conforming to the module type LEXER can be used in the module Parse_with_lexer to create a two-stage parser in which the lexer handles tokens and a combinator parser handles the higher level constructs.
A parser p is a sink of tokens. As long as it signals needs_more p, more tokens can be pushed into the parser via put token p, or the input stream can be ended via put_end p.

has_ended p is equivalent to not (needs_more p). has_ended p signals that the parser has either succeeded or failed. If it has succeeded, the final value is available via final p.
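The push protocol just described can be sketched as a driver loop. The module type below merely restates the operations named on this page; the Drive functor and its run function are invented helpers, not part of the library.

```ocaml
(* Hypothetical driver loop over the sink protocol described above. *)
module type SINK = sig
  type t
  type item
  type final
  val needs_more : t -> bool
  val put : item -> t -> t
  val put_end : t -> t
  val has_succeeded : t -> bool
  val final : t -> final
end

module Drive (S : SINK) = struct
  (* Push items while the sink asks for more; end the input when the
     item list is exhausted. *)
  let run (items : S.item list) (sink : S.t) : S.t =
    let rec go items sink =
      if not (S.needs_more sink) then sink
      else
        match items with
        | [] -> S.put_end sink
        | x :: rest -> go rest (S.put x sink)
    in
    go items sink
end
```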
type item = token
In order to conform to the interface Fmlib_std.Interfaces.SINK
.
val needs_more : t -> bool
needs_more p
Does the parser p
need more tokens?
put tok p
Push token tok
into the parser p
.
Even if the parser has ended, more tokens can be pushed into the parser. The parser stores the token as a lookahead token.
If the parser has already received the end of the token stream via put_end
, then all subsequent tokens are ignored.
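The lookahead rule can be sketched with a toy sink: a token pushed after the end of input has been received is ignored. The record type and the names below are this sketch's own invention, not the library's state.

```ocaml
(* Toy sink illustrating the lookahead rule above. *)
type 'tok sink = { lookahead : 'tok list; ended : bool }

let start_sink = { lookahead = []; ended = false }

let put tok s =
  if s.ended then s                              (* end received: ignore *)
  else { s with lookahead = tok :: s.lookahead } (* store as lookahead *)

let put_end s = { s with ended = true }
```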
type final = Position.range * Token.t
Type of the final result.
val has_succeeded : t -> bool
has_succeeded p
Has the parser p
succeeded?
val has_ended : t -> bool
has_ended p
Has the parser p
ended parsing and either succeeded or failed?
has_ended p is the same as not (needs_more p).
final p
The final object constructed by the parser p
in case of success.
Precondition: has_succeeded p
type expect = string * Indent.expectation option
Type of expectations.
val has_failed_syntax : t -> bool
has_failed_syntax p
Has the parser p
failed with a syntax error?
failed_expectations p
The failed expectations due to a syntax error.
Precondition: has_failed_syntax p
val has_consumed_end : t -> bool
Has the lexer consumed the end of input?
val position : t -> Position.t
Line and column number of the current position of the lexer.
val start : t
The lexer for the first token.
A lexer does not consume the entire input stream. It consumes characters only until a token has been recognized. In case of the successful recognition of a token, it returns the token (see final). Then it can be restarted to recognize the next token.
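The recognize-then-restart cycle can be sketched as follows. tokenize and its helpers are invented names: the function splits a string into blank-separated words, where finding a word corresponds to has_succeeded and reaching the end of the string to has_consumed_end.

```ocaml
(* Sketch of the recognize-then-restart cycle. *)
let tokenize (s : string) : string list =
  let n = String.length s in
  (* skip whitespace before the next token *)
  let rec skip i = if i < n && s.[i] = ' ' then skip (i + 1) else i in
  (* consume characters while they can extend the current token *)
  let rec take j = if j < n && s.[j] <> ' ' then take (j + 1) else j in
  let rec loop i acc =
    let i = skip i in
    if i >= n then List.rev acc                  (* end of input consumed *)
    else
      let j = take i in                          (* one token recognized *)
      loop j (String.sub s i (j - i) :: acc)     (* restart after the token *)
  in
  loop 0 []
```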