package fmlib_parse

  1. Overview
  2. Docs

Partial Parsing

Up

Overview

A partial parser is a parser which parses a part of the input stream and not the whole input stream until the end of input is reached. Some examples where a partial parser is useful:

  • Convert a character input stream into a stream of tokens i.e. a lexer: A lexer usually strips off whitespace, recognizes one token (an identifier, an operator, a semicolon, ...) and in case of success returns the token even if there are more tokens to come.
  • The language to be parsed has some constructs which require to parse another file before resume parsing (e.g. the C's famous #include file or Ocaml's open. The reading of another file is I/O which cannot be done within a parsing combinator.
  • REPLs (read evaluate print loops): REPLs usually parse a construct and evaluate it immediately before parsing the next construct. The evaluation might need to do I/O which is not possible within a parsing combinator.

The input stream for partial parsing might look like

    A B C EOS

where A, B, C are certain grammatical constructs and EOS marks the end of input. In that case we split the input stream into three different parts. The parsers parsing the constructs A and B have to be partial parsers. They can terminate successfully without consuming the end of input. Then parser for C is a complete parser, it terminates successfully only if followed by the end of input.

Furthermore it is possible that the input stream looks like

    A A A ... EOS

i.e. the stream consist of zero or more (or one or more) grammatical constructs A.

In order to parse A* i.e. zero ore more repetitions of the construct A we need a partial parser which parses either an A or the end of input.

In order to parse A+ i.e. one or more repetitions of the construct A we need a partial parser which parses an A with an optional end of input at its end.

The structure of the stream suited for partial parsing can have arbitrary loops.

    A B+ C* D* EOS

But the structure must be linear. It is not possible to nest partial parsers.

Combinators suited for Partial Parsing

Complete Parsing

Usually you have a combinator final: Final.t which parses a grammatical construct A. If the structure of the stream is A EOS i.e. the structure A is ended by the end of input, then you make the final parser by

make start_state final

The function make converts a combinator and adds to it the expectation of the end of input after the construct. Therefore the parser cannot succeed unless the construct A fills the complete input stream to the end.

Partial Parsing

In order to make partial parser there is the function

make_partial start_state final

This function makes a partial parser parsing a grammatical construct A without expecting the end of input after A.

The function make is implemented by the function make_partial in the following way:

let make (state: State.t) (final: Final.t t): Parser.t =
    make_partial state (final >>= expect_end)

A Loop of One or More

Now suppose we want to write a parser which parse an input stream of the form

    A+ EOS

and we have a combinator final: Final.t t which parses a construct A. We construct a partial parser by

make_partial
    state
    (
        let* a = final in
        expect_end a
        </>
        return a
    )

This parser parse a construct A followed by an optional end of input. In both cases (with or without end of input) the parser can succeed and return an object of type Final.t. The function has_consumed_end can be used to decide if the parsing has ended of if there is a next partial parser needed to parse the remainder of the input stream.

A Loop Of Zero or More

Now let the input stream look like

    A* EOS

and let final: Final.t t be the combinator which parses the construct A.

In that case we need an object eos: Final.t which the parser returns if the end of input has been reached. We can construct the parser by

make_partial
    state
    (final </> expect_end eos)

This parser in case of success has either parsed successfully a construct A or has reached the end of input.

The termination of the loop can be detected either by looking at has_consumed_end or by comparing the final result of the parser with eos.

Pasting together Partial Parsers

Same Types

Let's assume our input stream has the structure

    A B EOS

and pa:P.t is a parser parsing A without consuming the end of input. Furthermore assume that pa has finished successfully without consuming the end of input.

assert (P.has_succeeded pa);
assert (not (P.has_consumed_end pa));

At this point we can extract the construct representing A and the current state

let state = P.state pa
and a     = P.final pa

Now we can do anything we want (make I/O etc.) using this information.

After that we want to finish parsing with the next parser. For that we need a combinator recognizing the construct B, say b_combi, and a new state (might be the same as state) and make a complete parser (the parser has to consume the end of input) by

let pb = make new_state b_combi

This is not yet enough, because the parser pa might have loaded some lookahead which is part of the construct B, clearly without consuming it. We have to transfer the lookahead of pa to pb by

let pb = P.transfer_lookahead pa pb

In extreme cases this might put the parser pb into a success state. You have to check that before continue with the parsing. If this is not yet the case i.e. the parser pb needs more, then you can continue the parsing.

Different Types

The description in the previous section only works, if the parsers pa and pb have the same type. In this section we assume that the types are different pa: PA.t and pb: PB.t. In order to transfer the lookaheads from pa to pb there is a generic function fold_lookahead which can be called by

let pb = PA.fold_lookahead pb PB.put PB.put_end pa

Then, as above, check if pb has_succeeded or failed. If it needs more tokens the parsing can continue.

Parsing with Lexers

For two stage parsers i.e. parsers with lexers the pasting together of subsequent parser works in principle in the same way. The user of the library only has to paste the token parser together. For details see the documentation Fmlib_parse.Parse_with_lexer.Make

Up

OCaml

Innovation. Community. Security.