package fmlib_parse
Module Fmlib_parse
Parsing Library
Documentation
Introduction to Combinator Parsing
Utilities
module Position : sig ... end
Represent a position in a text file.
module Located : sig ... end
A parsing construct located within a file.
module Indent : sig ... end
The allowed indentations: Helper module for indentation sensitive parsing.
module Error_reporter : sig ... end
Convenience module to generate readable error messages.
module Interfaces : sig ... end
Module types
Parsers
Parse streams of characters
Character parsers are the simplest parsers: their tokens are characters. To generate a character parser you need only three modules: a State module, which in many cases is just Unit; a Final module, which describes the type of the construct the parser returns after successful parsing; and a Semantic module, which describes the semantic errors (the parser itself handles only syntax errors).
module Character : sig ... end
Character Parser: An indentation sensitive parser which parses streams of characters i.e. the token type is char.
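The division of labour between the three modules can be illustrated with a toy functor. This is a minimal sketch using only the OCaml standard library; it is not the Fmlib_parse API. The names Make_toy, satisfy, fail, run, Syntax_error, Semantic_error and byte are hypothetical and chosen for illustration only. The sketch shows a user state threaded through parsing, a final result type, and semantic errors raised by user code as opposed to syntax errors raised by the parser.

```ocaml
(* Toy sketch, NOT the Fmlib_parse API: all names below are hypothetical. *)
module type ANY = sig type t end

module Make_toy (State : ANY) (Final : ANY) (Semantic : ANY) = struct
  type error =
    | Syntax_error of string        (* produced by the parser itself *)
    | Semantic_error of Semantic.t  (* produced by user code via [fail] *)

  (* A parser threads the user state and the remaining characters. *)
  type 'a t =
    State.t -> char list -> ('a * State.t * char list, error) result

  let return a : 'a t = fun s cs -> Ok (a, s, cs)

  (* Report a semantic error. *)
  let fail (e : Semantic.t) : 'a t = fun _ _ -> Error (Semantic_error e)

  let ( >>= ) (p : 'a t) (f : 'a -> 'b t) : 'b t =
    fun s cs ->
      match p s cs with
      | Ok (a, s1, cs1) -> f a s1 cs1
      | Error e -> Error e

  (* Accept one character satisfying [pred]; otherwise a syntax error. *)
  let satisfy (descr : string) (pred : char -> bool) : char t =
    fun s cs ->
      match cs with
      | c :: rest when pred c -> Ok (c, s, rest)
      | _ -> Error (Syntax_error descr)

  (* Repetition stops at the first syntax error; semantic errors propagate. *)
  let rec zero_or_more (p : 'a t) : 'a list t =
    fun s cs ->
      match p s cs with
      | Error (Syntax_error _) -> Ok ([], s, cs)
      | Error e -> Error e
      | Ok (a, s1, cs1) ->
        (match zero_or_more p s1 cs1 with
         | Ok (xs, s2, cs2) -> Ok (a :: xs, s2, cs2)
         | Error e -> Error e)

  let one_or_more (p : 'a t) : 'a list t =
    p >>= fun x -> zero_or_more p >>= fun xs -> return (x :: xs)

  (* Run a parser producing the final type; all input must be consumed. *)
  let run (p : Final.t t) (s : State.t) (input : string)
    : (Final.t, error) result =
    match p s (List.of_seq (String.to_seq input)) with
    | Ok (a, _, []) -> Ok a
    | Ok _ -> Error (Syntax_error "end of input")
    | Error e -> Error e
end

(* State = unit, final result = int, semantic errors = string messages. *)
module P = Make_toy (Unit) (Int) (String)

(* A decimal number that must fit into one byte. *)
let byte : int P.t =
  let open P in
  one_or_more (satisfy "digit" (fun c -> '0' <= c && c <= '9'))
  >>= fun ds ->
  let n =
    List.fold_left
      (fun acc d -> (10 * acc) + Char.code d - Char.code '0') 0 ds
  in
  if n <= 255 then return n else fail "value exceeds 255"
```

In this sketch, P.run byte () "300" is syntactically valid but is rejected with a semantic error, whereas P.run byte () "x" fails with a syntax error. The actual library's functor and combinators may differ in names and signatures; consult the Character module documentation for the real interface.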
Unicode Parsers
module Ucharacter : sig ... end
Parser for streams of unicode characters.
module Utf8 : sig ... end
Encoder and Decoder for Unicode Characters encoded in UTF-8.
module Utf16 : sig ... end
Encoders and Decoders for Unicode Characters encoded in UTF-16.
Parsing with lexers
Pure character parsers can be inefficient when a lot of backtracking is necessary (and for many languages backtracking is necessary). Backtracking pushes all characters of a failed construct back into the lookahead, and those characters are then rescanned for a different construct.
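The cost of character-level backtracking can be made visible by counting how often characters are inspected. The following is a minimal stdlib-only sketch, not library code; reads, peek, skip_letters, assignment, call and either are hypothetical names. Two alternatives share a common prefix (a lowercase name), so when the first alternative fails, the second rescans the whole name.

```ocaml
(* Toy sketch, NOT the Fmlib_parse API: counts character inspections. *)
let reads = ref 0

(* Inspect the character at index [i], counting every inspection. *)
let peek (s : string) (i : int) : char option =
  incr reads;
  if i < String.length s then Some s.[i] else None

(* Skip lowercase letters starting at [i]; return the first other index. *)
let rec skip_letters (s : string) (i : int) : int =
  match peek s i with
  | Some c when 'a' <= c && c <= 'z' -> skip_letters s (i + 1)
  | _ -> i

(* Alternative 1: a name followed by '='. *)
let assignment (s : string) (i : int) : int option =
  let j = skip_letters s i in
  if j > i && peek s j = Some '=' then Some (j + 1) else None

(* Alternative 2: a name followed by '('. *)
let call (s : string) (i : int) : int option =
  let j = skip_letters s i in
  if j > i && peek s j = Some '(' then Some (j + 1) else None

(* Ordered choice with backtracking: [q] restarts at the same index [i],
   so every character consumed by the failed [p] is scanned again. *)
let either p q (s : string) (i : int) : int option =
  match p s i with
  | Some _ as r -> r
  | None -> q s i
```

With reads reset to 0, evaluating either assignment call "print(" 0 succeeds with Some 6 but performs 14 character inspections on a 6-character input: 7 for the failed assignment alternative and 7 more for the call alternative.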
For these cases the library offers parsers with two layers: a lexer and a token parser. The lexer parses the lexical tokens and usually needs little or no backtracking. The token parser receives the already parsed tokens, where each token is a unit consisting of all its parsed characters. When backtracking, the token parser pushes whole tokens back into the lookahead and reparses them as whole tokens, not character by character.
module Token_parser : sig ... end
Token Parser: A parser which parses streams of user supplied tokens.
module Parse_with_lexer : sig ... end
A parser which works with two components: A lexer which splits up the input into a sequence of tokens and a parser which parses the tokens.
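The two-layer idea can be sketched with the standard library alone. This is an illustration of the technique, not the Token_parser or Parse_with_lexer API; the types token and stmt and the functions lex, assign, call, statement and parse are hypothetical. The lexer scans each character once; the token parser backtracks over whole tokens, so a failed alternative never causes the characters of a token to be rescanned.

```ocaml
(* Toy sketch, NOT the Fmlib_parse API: a lexer plus a token parser. *)
type token =
  | Ident of string
  | Number of int
  | Symbol of char

let is_digit c = '0' <= c && c <= '9'
let is_letter c = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')

(* Layer 1: the lexer. No backtracking; blanks are skipped. *)
let lex (s : string) : token list =
  let n = String.length s in
  (* Index of the first character at or after [j] not satisfying [pred]. *)
  let rec span pred j = if j < n && pred s.[j] then span pred (j + 1) else j in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let c = s.[i] in
      if c = ' ' then go (i + 1) acc
      else if is_digit c then
        let j = span is_digit i in
        go j (Number (int_of_string (String.sub s i (j - i))) :: acc)
      else if is_letter c then
        let j = span is_letter i in
        go j (Ident (String.sub s i (j - i)) :: acc)
      else go (i + 1) (Symbol c :: acc)
  in
  go 0 []

(* Layer 2: the token parser. *)
type 'a parser = token list -> ('a * token list) option

(* Ordered choice: on failure of [p], [q] restarts from the same token
   list, i.e. whole tokens are pushed back, not single characters. *)
let ( <|> ) (p : 'a parser) (q : 'a parser) : 'a parser =
  fun ts ->
    match p ts with
    | Some _ as r -> r
    | None -> q ts

type stmt =
  | Assign of string * int   (* name = number *)
  | Call of string * int     (* name ( number ) *)

let assign : stmt parser = function
  | Ident x :: Symbol '=' :: Number n :: rest -> Some (Assign (x, n), rest)
  | _ -> None

let call : stmt parser = function
  | Ident f :: Symbol '(' :: Number n :: Symbol ')' :: rest ->
    Some (Call (f, n), rest)
  | _ -> None

let statement : stmt parser = assign <|> call

(* Lex, then parse; succeed only if all tokens are consumed. *)
let parse (s : string) : stmt option =
  match statement (lex s) with
  | Some (r, []) -> Some r
  | _ -> None
```

For the input "print(7)", the assign alternative fails at the second token and call retries from the first token; the identifier print is re-examined as a single Ident token rather than as five characters.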
Full generic parser
All parsers of the library are based on this generic parser. The user usually does not write a generic parser.
module Generic : sig ... end
A Generic Parser where all parameters are customizable.