package diffast-langs-python-parsing

Module Make.MenhirInterpreter

include MenhirLib.IncrementalEngine.INCREMENTAL_ENGINE with type token = token
type token = token
type production

A value of type production is (an index for) a production. The start productions (which do not exist in an .mly file, but are constructed by Menhir internally) are not part of this type.

type 'a env

A value of type 'a env represents a configuration of the automaton: current state, stack, lookahead token, etc. The parameter 'a is the type of the semantic value that will eventually be produced if the parser succeeds.

In normal operation, the parser works with checkpoints: see the functions offer and resume. However, it is also possible to work directly with environments (see the functions pop, force_reduction, and feed) and to reconstruct a checkpoint out of an environment (see input_needed). This is considered advanced functionality; its purpose is to allow error recovery strategies to be programmed by the user.

type 'a checkpoint = private
  | InputNeeded of 'a env
  | Shifting of 'a env * 'a env * bool
  | AboutToReduce of 'a env * production
  | HandlingError of 'a env
  | Accepted of 'a
  | Rejected

The type 'a checkpoint represents an intermediate or final state of the parser. An intermediate checkpoint is a suspension: it records the parser's current state, and allows parsing to be resumed. The parameter 'a is the type of the semantic value that will eventually be produced if the parser succeeds.

Accepted and Rejected are final checkpoints. Accepted carries a semantic value.

InputNeeded is an intermediate checkpoint. It means that the parser wishes to read one token before continuing.

Shifting is an intermediate checkpoint. It means that the parser is taking a shift transition. It exposes the state of the parser before and after the transition. The Boolean parameter tells whether the parser intends to request a new token after this transition. (It always does, except when it is about to accept.)

AboutToReduce is an intermediate checkpoint. It means that the parser is about to perform a reduction step. It exposes the parser's current state as well as the production that is about to be reduced.

HandlingError is an intermediate checkpoint. It means that the parser has detected an error and is currently handling it, in several steps.

offer allows the user to resume the parser after it has suspended itself with a checkpoint of the form InputNeeded env. offer expects the old checkpoint as well as a new token and produces a new checkpoint. It does not raise any exception.

type strategy = [
  | `Legacy
  | `Simplified
]

The optional argument strategy influences the manner in which resume deals with checkpoints of the form HandlingError _. Its default value is `Legacy. The choice between the two strategies can be summarized as follows:

  • If the error token is used only to report errors (that is, if the error token appears only at the end of a production, whose semantic action raises an exception) then the simplified strategy should be preferred. (This includes the case where the error token does not appear at all in the grammar.)
  • If the error token is used to recover after an error, or if perfect backward compatibility is required, the legacy strategy should be selected.

More details on strategies appear in the file Engine.ml.

val resume : ?strategy:strategy -> 'a checkpoint -> 'a checkpoint

resume allows the user to resume the parser after it has suspended itself with a checkpoint of the form Shifting _, AboutToReduce _, or HandlingError _. resume expects the old checkpoint and produces a new checkpoint. It does not raise any exception.

A token supplier is a function of no arguments which delivers a new token (together with its start and end positions) every time it is called.

val lexer_lexbuf_to_supplier : (Lexing.lexbuf -> token) -> Lexing.lexbuf -> supplier

A pair of a lexer and a lexing buffer can be turned into a supplier.
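As a rough sketch of what such a conversion might look like, the function below pairs each token with the positions that the lexer recorded in the lexing buffer. The token type and toy_lexer are hypothetical stand-ins for the generated ones, and supplier_of_lexer is an illustrative name rather than the library's own definition.

```ocaml
(* Hypothetical token type standing in for the parser's generated [token]. *)
type token = INT of int | PLUS | EOF

(* A supplier yields a token together with its start and end positions. *)
type supplier = unit -> token * Lexing.position * Lexing.position

(* Pair a lexer with a lexing buffer: each call reads one token and
   reports the positions that the lexer recorded in the buffer. *)
let supplier_of_lexer (lexer : Lexing.lexbuf -> token)
    (lexbuf : Lexing.lexbuf) : supplier =
 fun () ->
  let tok = lexer lexbuf in
  (tok, lexbuf.Lexing.lex_start_p, lexbuf.Lexing.lex_curr_p)

(* Toy lexer for illustration only: it ignores the buffer, replays a
   fixed token sequence, then returns EOF forever. *)
let toy_lexer : Lexing.lexbuf -> token =
  let pending = ref [ INT 1; PLUS; INT 2 ] in
  fun _ ->
    match !pending with
    | [] -> EOF
    | tok :: rest ->
        pending := rest;
        tok

(* Convenience helper: drop the positions and keep the token. *)
let next_token (s : supplier) : token =
  let tok, _, _ = s () in
  tok
```

With a lexer generated by ocamllex, one would pass the real entry point and a buffer created by Lexing.from_channel or Lexing.from_string instead of toy_lexer.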

The functions offer and resume are sufficient to write a parser loop. One can imagine many variations (which is why we expose these functions in the first place!). Here, we expose a few variations of the main loop, ready for use.
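To make the idea of a hand-written loop concrete, here is a minimal sketch. ENGINE is a simplified local copy of the part of the interface that the loop needs (the optional strategy argument of resume is left out), and Toy is a hypothetical stand-in engine that sums nonnegative integers, included only so that the sketch runs on its own. Driver, Toy, and supplier_of_list are illustrative names, not part of this module.

```ocaml
(* Simplified local copy of the part of the interface used below. *)
module type ENGINE = sig
  type token
  type production
  type 'a env
  type 'a checkpoint = private
    | InputNeeded of 'a env
    | Shifting of 'a env * 'a env * bool
    | AboutToReduce of 'a env * production
    | HandlingError of 'a env
    | Accepted of 'a
    | Rejected
  val offer :
    'a checkpoint -> token * Lexing.position * Lexing.position -> 'a checkpoint
  val resume : 'a checkpoint -> 'a checkpoint
end

module Driver (E : ENGINE) = struct
  (* Feed a token on InputNeeded, step on Shifting and AboutToReduce,
     and stop at the first error or final checkpoint. *)
  let rec run supplier (checkpoint : 'a E.checkpoint) : ('a, string) result =
    match checkpoint with
    | E.InputNeeded _ -> run supplier (E.offer checkpoint (supplier ()))
    | E.Shifting _ | E.AboutToReduce _ -> run supplier (E.resume checkpoint)
    | E.HandlingError _ -> Error "syntax error"
    | E.Accepted v -> Ok v
    | E.Rejected -> Error "rejected"
end

(* Hypothetical toy engine: accepts nonnegative integers followed by Eof
   and produces their sum. It only mimics the shape of a real engine. *)
module Toy = struct
  type token = Int of int | Eof
  type production = Sum
  type 'a env = { total : int; finish : int -> 'a }
  type 'a checkpoint =
    | InputNeeded of 'a env
    | Shifting of 'a env * 'a env * bool
    | AboutToReduce of 'a env * production
    | HandlingError of 'a env
    | Accepted of 'a
    | Rejected

  let start = InputNeeded { total = 0; finish = (fun n -> n) }

  let offer checkpoint (token, _, _) =
    match (checkpoint, token) with
    | InputNeeded env, Int n when n >= 0 ->
        Shifting (env, { env with total = env.total + n }, true)
    | InputNeeded env, Int _ -> HandlingError env
    | InputNeeded env, Eof -> AboutToReduce (env, Sum)
    | _ -> invalid_arg "offer"

  let resume = function
    | Shifting (_, env, _) -> InputNeeded env
    | AboutToReduce (env, Sum) -> Accepted (env.finish env.total)
    | HandlingError _ -> Rejected
    | InputNeeded _ | Accepted _ | Rejected -> invalid_arg "resume"
end

module D = Driver (Toy)

(* A supplier backed by a list; it returns Eof once the list is exhausted. *)
let supplier_of_list tokens =
  let rest = ref tokens in
  fun () ->
    let tok =
      match !rest with
      | [] -> Toy.Eof
      | t :: ts ->
          rest := ts;
          t
    in
    (tok, Lexing.dummy_pos, Lexing.dummy_pos)
```

Stopping at HandlingError, as this sketch does, means the engine's own error handling never runs; a loop that instead passes HandlingError checkpoints to resume would let it run.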

val loop : ?strategy:strategy -> supplier -> 'a checkpoint -> 'a

loop supplier checkpoint begins parsing from checkpoint, reading tokens from supplier. It continues parsing until it reaches a checkpoint of the form Accepted v or Rejected. In the former case, it returns v. In the latter case, it raises the exception Error. The optional argument strategy, whose default value is `Legacy, is passed to resume and influences the error-handling strategy.

val loop_handle : ('a -> 'answer) -> ('a checkpoint -> 'answer) -> supplier -> 'a checkpoint -> 'answer

loop_handle succeed fail supplier checkpoint begins parsing from checkpoint, reading tokens from supplier. It continues parsing until it reaches a checkpoint of the form Accepted v or HandlingError env (or Rejected, but that should not happen, as HandlingError _ will be observed first). In the former case, it calls succeed v. In the latter case, it calls fail with this checkpoint. It cannot raise Error.

This means that Menhir's error-handling procedure does not get a chance to run. For this reason, there is no strategy parameter. Instead, the user can implement her own error handling code, in the fail continuation.

val loop_handle_undo : ('a -> 'answer) -> ('a checkpoint -> 'a checkpoint -> 'answer) -> supplier -> 'a checkpoint -> 'answer

loop_handle_undo is analogous to loop_handle, except it passes a pair of checkpoints to the failure continuation.

The first (and oldest) checkpoint is the last InputNeeded checkpoint that was encountered before the error was detected. The second (and newest) checkpoint is where the error was detected, as in loop_handle. Going back to the first checkpoint can be thought of as undoing any reductions that were performed after seeing the problematic token. (These reductions must be default reductions or spurious reductions.)

loop_handle_undo must initially be applied to an InputNeeded checkpoint. The parser's initial checkpoints satisfy this constraint.

val shifts : 'a checkpoint -> 'a env option

shifts checkpoint assumes that checkpoint has been obtained by submitting a token to the parser. It runs the parser from checkpoint, through an arbitrary number of reductions, until the parser either accepts this token (i.e., shifts) or rejects it (i.e., signals an error). If the parser decides to shift, then Some env is returned, where env is the parser's state just before shifting. Otherwise, None is returned.

It is desirable that the semantic actions be side-effect free, or that their side-effects be harmless (replayable).

The function acceptable allows testing, after an error has been detected, which tokens would have been accepted at this point. It is implemented using shifts. Its argument should be an InputNeeded checkpoint.

For completeness, one must undo any spurious reductions before carrying out this test -- that is, one must apply acceptable to the FIRST checkpoint that is passed by loop_handle_undo to its failure continuation.

This test causes some semantic actions to be run! The semantic actions should be side-effect free, or their side-effects should be harmless.

The position pos is used as the start and end positions of the hypothetical token, and may be picked up by the semantic actions. We suggest using the position where the error was detected.
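A typical use of acceptable is to compute, after an error, the list of tokens that could have appeared instead. The sketch below assumes that acceptable takes a checkpoint, a token, and a position and returns a Boolean, as in MenhirLib's INCREMENTAL_ENGINE signature (its declaration does not appear on this page). The Toy module is a hypothetical stand-in in which a "checkpoint" is just the list of tokens it accepts.

```ocaml
(* The one function this sketch relies on. *)
module type ACCEPTABLE = sig
  type token
  type 'a checkpoint
  val acceptable : 'a checkpoint -> token -> Lexing.position -> bool
end

module Expected (P : ACCEPTABLE) = struct
  (* Keep the candidate tokens that the parser would accept at [checkpoint].
     [pos] serves as both start and end position of each hypothetical token. *)
  let expected (candidates : P.token list) checkpoint (pos : Lexing.position) =
    List.filter (fun tok -> P.acceptable checkpoint tok pos) candidates
end

(* Hypothetical toy: a checkpoint simply lists the tokens it accepts. *)
module Toy = struct
  type token = COLON | COMMA | NEWLINE
  type 'a checkpoint = token list
  let acceptable checkpoint tok (_ : Lexing.position) = List.mem tok checkpoint
end

module E = Expected (Toy)
```

For tokens that carry a semantic value, the candidate list has to contain some placeholder value, since acceptable expects a complete token.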

type 'a lr1state

The abstract type 'a lr1state describes the non-initial states of the LR(1) automaton. The index 'a represents the type of the semantic value associated with this state's incoming symbol.

val number : _ lr1state -> int

The states of the LR(1) automaton are numbered (from 0 and up).

val production_index : production -> int

production_index maps a production to its integer index.

val find_production : int -> production

find_production maps a production index to a production. Its argument must be a valid index; use with care.

An element is a pair of a non-initial state s and a semantic value v associated with the incoming symbol of this state. The idea is, the value v was pushed onto the stack just before the state s was entered. Thus, for some type 'a, the state s has type 'a lr1state and the value v has type 'a. In other words, the type element is an existential type.

The parser's stack is (or, more precisely, can be viewed as) a stream of elements. The functions top and pop offer access to this stream.

val top : 'a env -> element option

top env returns the parser's top stack element. The state contained in this stack element is the current state of the automaton. If the stack is empty, None is returned. In that case, the current state of the automaton must be an initial state.

val pop_many : int -> 'a env -> 'a env option

pop_many i env pops i cells off the automaton's stack. This is done via i successive invocations of pop. Thus, pop_many 1 is pop. The index i must be nonnegative. The time complexity is O(i).

val get : int -> 'a env -> element option

get i env returns the parser's i-th stack element. The index i is 0-based: thus, get 0 is top. If i is greater than or equal to the number of elements in the stack, None is returned. The time complexity is O(i).
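The relationship between pop_many, get, pop, and top described above can be sketched as follows. This is an illustration written against a two-function interface, not necessarily the library's code; the Toy module, in which an environment is a plain list of strings, is a hypothetical stand-in used to make the sketch runnable.

```ocaml
(* The two primitives the derived functions are built from. *)
module type STACK = sig
  type 'a env
  type element
  val top : 'a env -> element option
  val pop : 'a env -> 'a env option
end

module Derived (S : STACK) = struct
  (* [pop_many i env] performs [i] successive calls to [pop], so it runs
     in time O(i). It assumes [i >= 0]. *)
  let rec pop_many i env =
    if i = 0 then Some env
    else
      match S.pop env with
      | None -> None
      | Some env -> pop_many (i - 1) env

  (* [get i env] is the top element after popping [i] cells. *)
  let get i env =
    match pop_many i env with
    | None -> None
    | Some env -> S.top env
end

(* Hypothetical toy stack: the head of the list is the top cell. *)
module Toy = struct
  type 'a env = string list
  type element = string
  let top = function [] -> None | x :: _ -> Some x
  let pop = function [] -> None | _ :: rest -> Some rest
end

module Ops = Derived (Toy)
```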

val current_state_number : 'a env -> int

current_state_number env is (the integer number of) the automaton's current state. This works even if the automaton's stack is empty, in which case the current state is an initial state. This number can be passed as an argument to a message function generated by menhir --compile-errors.

val equal : 'a env -> 'a env -> bool

equal env1 env2 tells whether the parser configurations env1 and env2 are equal in the sense that the automaton's current state is the same in env1 and env2 and the stack is *physically* the same in env1 and env2. If equal env1 env2 is true, then the sequence of the stack elements, as observed via pop and top, must be the same in env1 and env2. Also, if equal env1 env2 holds, then the checkpoints input_needed env1 and input_needed env2 must be equivalent. The function equal has time complexity O(1).

positions env returns the start and end positions of the current lookahead token. In an initial state, a pair of twice the initial position is returned.

val env_has_default_reduction : 'a env -> bool

When applied to an environment taken from a checkpoint of the form AboutToReduce (env, prod), the function env_has_default_reduction tells whether the reduction that is about to take place is a default reduction.

val state_has_default_reduction : _ lr1state -> bool

state_has_default_reduction s tells whether the state s has a default reduction. This includes the case where s is an accepting state.

val pop : 'a env -> 'a env option

pop env returns a new environment, where the parser's top stack cell has been popped off. (If the stack is empty, None is returned.) This amounts to pretending that the (terminal or nonterminal) symbol that corresponds to this stack cell has not been read.

val force_reduction : production -> 'a env -> 'a env

force_reduction prod env should be called only if in the state env the parser is capable of reducing the production prod. If this condition is satisfied, then this production is reduced, which means that its semantic action is executed (this can have side effects!) and the automaton makes a goto (nonterminal) transition. If this condition is not satisfied, Invalid_argument _ is raised.

val input_needed : 'a env -> 'a checkpoint

input_needed env returns InputNeeded env. That is, out of an env that might have been obtained via a series of calls to the functions pop, force_reduction, feed, etc., it produces a checkpoint, which can be used to resume normal parsing, by supplying this checkpoint as an argument to offer.

This function should be used with some care. It could "mess up the lookahead" in the sense that it allows parsing to resume in an arbitrary state s with an arbitrary lookahead symbol t, even though Menhir's reachability analysis (menhir --list-errors) might well think that it is impossible to reach this particular configuration. If one is using Menhir's new error reporting facility, this could cause the parser to reach an error state for which no error message has been prepared.
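One conceivable recovery strategy built from these pieces is to pop stack cells until the automaton is in a state from which the user knows how to continue, and then rebuild a checkpoint with input_needed. The sketch below illustrates only that control flow. The predicate is_sync, which decides which state numbers count as synchronization points, is an assumption of this example, and Toy is a hypothetical stand-in whose environment is a list of state numbers.

```ocaml
(* The three functions this sketch relies on. *)
module type RECOVERY = sig
  type 'a env
  type 'a checkpoint
  val pop : 'a env -> 'a env option
  val current_state_number : 'a env -> int
  val input_needed : 'a env -> 'a checkpoint
end

module Recover (P : RECOVERY) = struct
  (* Pop cells until [is_sync] holds of the current state, then turn the
     resulting environment into a checkpoint. Return [None] if the stack
     is exhausted without finding such a state. *)
  let rec unwind (is_sync : int -> bool) env =
    if is_sync (P.current_state_number env) then Some (P.input_needed env)
    else
      match P.pop env with
      | None -> None
      | Some env' -> unwind is_sync env'
end

(* Hypothetical toy: the stack is a list of state numbers, and an empty
   stack means the automaton is in its initial state, numbered 0 here. *)
module Toy = struct
  type 'a env = int list
  type 'a checkpoint = InputNeeded of 'a env
  let pop = function [] -> None | _ :: rest -> Some rest
  let current_state_number = function [] -> 0 | s :: _ -> s
  let input_needed env = InputNeeded env
end

module R = Recover (Toy)
```

The caveat above still applies: a checkpoint rebuilt this way may pair a state with a lookahead token that the reachability analysis considers impossible.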

type _ nonterminal =
  | N_yield_stmt : Ast.simplestmt_desc nonterminal
  | N_yield_expr : Ast.testlist nonterminal
  | N_xor_expr : Ast.target nonterminal
  | N_with_stmt : Ast.statement_desc nonterminal
  | N_with_item_list : (Ast.target * Ast.target option) list nonterminal
  | N_with_item : (Ast.target * Ast.target option) nonterminal
  | N_wildcard_pattern : Ast.pattern nonterminal
  | N_while_stmt : Ast.statement_desc nonterminal
  | N_varargslist : Ast.parameters nonterminal
  | N_varargs_ : Ast.vararg list nonterminal
  | N_vararg : Ast.vararg nonterminal
  | N_value_pattern : Ast.pattern nonterminal
  | N_typedargslist : (Ast.loc * Ast.vararg list) nonterminal
  | N_typedargs_ : Ast.vararg list nonterminal
  | N_typedarg : Ast.vararg nonterminal
  | N_try_stmt : Ast.statement_desc nonterminal
  | N_try_except : (Ast.suite * (Ast.except * Ast.suite) list) nonterminal
  | N_trailer : Ast.trailer nonterminal
  | N_tfpdef : Ast.fpdef nonterminal
  | N_testlist_star_expr : Ast.testlist nonterminal
  | N_testlist_or_yield_expr : Ast.testlist nonterminal
  | N_testlist_comp : Ast.primary_desc nonterminal
  | N_testlist_ : Ast.testlist nonterminal
  | N_testlist1_star_expr : Ast.target list nonterminal
  | N_testlist1_ : Ast.target list nonterminal
  | N_testlist1 : Ast.target list nonterminal
  | N_testlist : Ast.testlist nonterminal
  | N_test : Ast.target nonterminal
  | N_term : Ast.target nonterminal
  | N_sync_comp_for : (Ast.target list * Ast.target * Ast.compiter option) nonterminal
  | N_suite : Ast.suite nonterminal
  | N_subscripts : Ast.sliceitem list nonterminal
  | N_subscriptlist : Ast.sliceitem list nonterminal
  | N_subscript : Ast.sliceitem nonterminal
  | N_subject_expr : Ast.subject_expr nonterminal
  | N_strings : Ast.pystring list nonterminal
  | N_stringliteral : Ast.pystring nonterminal
  | N_stmts : Ast.statement list nonterminal
  | N_stmt : Ast.statement nonterminal
  | N_star_pattern : Ast.pattern nonterminal
  | N_star_expr : Ast.target nonterminal
  | N_small_stmts : Ast.simplestmt list nonterminal
  | N_small_stmt_ : Ast.simplestmt_desc nonterminal
  | N_small_stmt : Ast.simplestmt nonterminal
  | N_sliceop : Ast.target option nonterminal
  | N_simple_stmt_ : Ast.statement nonterminal
  | N_simple_stmt : Ast.statement nonterminal
  | N_signed_number : Ast.literal_expr nonterminal
  | N_shift_expr : Ast.target nonterminal
  | N_sequence_pattern : Ast.pattern nonterminal
  | N_separated_nonempty_list_PIPE_closed_pattern_ : Ast.pattern list nonterminal
  | N_return_stmt : Ast.simplestmt_desc nonterminal
  | N_ret_annot : Ast.target nonterminal
  | N_raise_stmt : Ast.simplestmt_desc nonterminal
  | N_print_stmt : Ast.simplestmt_desc nonterminal
  | N_primary : Ast.primary nonterminal
  | N_power : Ast.target nonterminal
  | N_patterns : Ast.pattern nonterminal
  | N_pattern_capture_target : Ast.pattern nonterminal
  | N_pattern : Ast.pattern nonterminal
  | N_pass_stmt : Ast.simplestmt_desc nonterminal
  | N_parameters : Ast.parameters nonterminal
  | N_or_test : Ast.target nonterminal
  | N_or_pattern : Ast.pattern nonterminal
  | N_option_open_sequence_pattern_ : Ast.pattern option nonterminal
  | N_option_maybe_sequence_pattern_ : Ast.pattern option nonterminal
  | N_option_guard_ : Ast.guard option nonterminal
  | N_option_COMMA_ : unit option nonterminal
  | N_open_sequence_pattern : Ast.pattern nonterminal
  | N_old_test : Ast.target nonterminal
  | N_old_lambdef : Ast.expr_desc nonterminal
  | N_number : Ast.literal nonterminal
  | N_not_test : Ast.target nonterminal
  | N_nonlocal_stmt : Ast.simplestmt_desc nonterminal
  | N_nonempty_list_dot_or_ellipsis_ : int list nonterminal
  | N_nonempty_list_case_block_ : Ast.case_block list nonterminal
  | N_names : Ast.name list nonterminal
  | N_namedexpr_test : Ast.target nonterminal
  | N_name : Ast.name nonterminal
  | N_mop : Ast.bop nonterminal
  | N_maybe_star_pattern : Ast.pattern nonterminal
  | N_maybe_sequence_pattern : Ast.pattern nonterminal
  | N_match_stmt : Ast.statement_desc nonterminal
  | N_mapping_pattern : Ast.pattern nonterminal
  | N_main : Ast.fileinput nonterminal
  | N_literal_expr : Ast.literal_expr nonterminal
  | N_literal : Ast.literal nonterminal
  | N_lambdef : Ast.expr_desc nonterminal
  | N_keyword_pattern : Ast.pattern nonterminal
  | N_key_value_pattern : Ast.pattern nonterminal
  | N_imports : Ast.name_as_name list nonterminal
  | N_import_stmt : Ast.simplestmt_desc nonterminal
  | N_import_name : Ast.simplestmt_desc nonterminal
  | N_import_from : Ast.simplestmt_desc nonterminal
  | N_import_as_names_list : Ast.name_as_name list nonterminal
  | N_import_as_names : Ast.name_as_name list nonterminal
  | N_import_as_name : Ast.name_as_name nonterminal
  | N_if_stmt : Ast.statement_desc nonterminal
  | N_guard : Ast.guard nonterminal
  | N_group_pattern : Ast.pattern nonterminal
  | N_global_stmt : Ast.simplestmt_desc nonterminal
  | N_funcdef : Ast.statement_desc nonterminal
  | N_fplist : Ast.fpdef list nonterminal
  | N_fpdefs : Ast.fpdef list nonterminal
  | N_fpdef : Ast.fpdef nonterminal
  | N_for_stmt : Ast.statement_desc nonterminal
  | N_flow_stmt : Ast.simplestmt_desc nonterminal
  | N_finally : (Ast.loc * Ast.suite) nonterminal
  | N_file_input_ : Ast.statement list nonterminal
  | N_file_input : Ast.statement list nonterminal
  | N_factor : Ast.target nonterminal
  | N_exprlist : Ast.target list nonterminal
  | N_expr_stmt : Ast.simplestmt_desc nonterminal
  | N_expr : Ast.target nonterminal
  | N_exec_stmt : Ast.simplestmt_desc nonterminal
  | N_except_clause_suites : (Ast.except * Ast.suite) list nonterminal
  | N_except_clause : Ast.except nonterminal
  | N_eq_testlists : Ast.testlist list nonterminal
  | N_els : (Ast.loc * Ast.suite) nonterminal
  | N_elifs : (Ast.loc * Ast.target * Ast.suite) list nonterminal
  | N_elif : (Ast.loc * Ast.target * Ast.suite) nonterminal
  | N_double_star_pattern : Ast.pattern nonterminal
  | N_dotted_name_ : Ast.dottedname nonterminal
  | N_dotted_name : Ast.dottedname nonterminal
  | N_dotted_as_names : Ast.dottedname_as_name list nonterminal
  | N_dotted_as_name : Ast.dottedname_as_name nonterminal
  | N_dot_or_ellipsis_seq : Ast.dots nonterminal
  | N_dictorsetmaker : Ast.dictorsetmaker nonterminal
  | N_dictelems : Ast.dictelem list nonterminal
  | N_dictelem : Ast.dictelem nonterminal
  | N_del_stmt : Ast.simplestmt_desc nonterminal
  | N_decorators : Ast.decorator list nonterminal
  | N_decorator : Ast.decorator nonterminal
  | N_decorated : Ast.statement_desc nonterminal
  | N_continue_stmt : Ast.simplestmt_desc nonterminal
  | N_compound_stmt_ : Ast.statement_desc nonterminal
  | N_compound_stmt : Ast.statement nonterminal
  | N_complex_number : Ast.literal_expr nonterminal
  | N_comparison : Ast.target nonterminal
  | N_comp_op : Ast.bop nonterminal
  | N_comp_iter : Ast.compiter nonterminal
  | N_comp_if : Ast.compif nonterminal
  | N_comp_for : Ast.compfor nonterminal
  | N_closed_pattern : Ast.pattern nonterminal
  | N_classdef : Ast.statement_desc nonterminal
  | N_class_pattern : Ast.pattern nonterminal
  | N_case_block : Ast.case_block nonterminal
  | N_break_stmt : Ast.simplestmt_desc nonterminal
  | N_augassign : Ast.augop nonterminal
  | N_atom : Ast.primary_desc nonterminal
  | N_async_stmt : Ast.statement_desc nonterminal
  | N_async_funcdef : Ast.statement_desc nonterminal
  | N_assert_stmt : Ast.simplestmt_desc nonterminal
  | N_as_pattern : Ast.pattern nonterminal
  | N_arith_expr : Ast.target nonterminal
  | N_argument : Ast.argument nonterminal
  | N_arglist : Ast.arglist nonterminal
  | N_arg_comma_list_ : Ast.argument list nonterminal
  | N_annot : (Ast.loc * Ast.target) nonterminal
  | N_annassign : ((Ast.loc * Ast.target) * Ast.testlist option) nonterminal
  | N_and_test : Ast.target nonterminal
  | N_and_expr : Ast.target nonterminal
  | N__star_exprs : Ast.target list nonterminal
  | N__primary : Ast.primary nonterminal
  | N__positional_patterns : Ast.pattern list nonterminal
  | N__maybe_star_patterns : Ast.pattern list nonterminal
  | N__keyword_patterns : Ast.pattern list nonterminal
  | N__items_pattern : Ast.pattern list nonterminal
  | N__exprs : Ast.target list nonterminal
include MenhirLib.IncrementalEngine.INSPECTION with type 'a lr1state := 'a lr1state with type production := production with type 'a terminal := 'a terminal with type 'a nonterminal := 'a nonterminal with type 'a env := 'a env
include MenhirLib.IncrementalEngine.SYMBOLS with type 'a terminal := 'a terminal with type 'a nonterminal := 'a nonterminal
type 'a symbol =
  | T : 'a terminal -> 'a symbol
  | N : 'a nonterminal -> 'a symbol

The type 'a symbol represents a terminal or nonterminal symbol. It is the disjoint union of the types 'a terminal and 'a nonterminal.

type xsymbol =
  | X : 'a symbol -> xsymbol

The type xsymbol is an existentially quantified version of the type 'a symbol. This type is useful in situations where 'a is not statically known.

type item = production * int

An LR(0) item is a pair of a production prod and a valid index i into this production. That is, if the length of rhs prod is n, then i lies between 0 and n, inclusive.
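An item is conventionally displayed as its production with a dot marking position i. The helper below is a small sketch of that display. It takes the left-hand side and right-hand side as plain strings, which is an assumption of this example; with the real API one would first obtain them from lhs and rhs and convert the symbols to names.

```ocaml
(* Render an LR(0) item as "lhs -> before . after", where the dot sits at
   index [i] of the right-hand side. Symbols are plain strings here. *)
let show_item (lhs : string) (rhs : string list) (i : int) : string =
  if i < 0 || i > List.length rhs then invalid_arg "show_item: bad index";
  let before = List.filteri (fun k _ -> k < i) rhs in
  let after = List.filteri (fun k _ -> k >= i) rhs in
  String.concat " " ((lhs :: "->" :: before) @ ("." :: after))
```

The bounds check mirrors the constraint stated above: a production of length n admits n + 1 dot positions, from 0 (nothing recognized yet) to n (the whole right-hand side recognized).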

The following are total ordering functions.

val compare_terminals : _ terminal -> _ terminal -> int
val compare_nonterminals : _ nonterminal -> _ nonterminal -> int
val compare_symbols : xsymbol -> xsymbol -> int
val compare_productions : production -> production -> int
val compare_items : item -> item -> int
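Because these functions are total orders, they can serve as the compare function of the standard Set and Map functors. The sketch below shows this for items. The definitions of production and compare_items here are hypothetical stand-ins (a production is an integer index and items are ordered lexicographically); the order used by the real compare_items is not specified on this page.

```ocaml
(* Hypothetical stand-ins for the abstract types and the ordering. *)
type production = int
type item = production * int

let compare_items ((p1, i1) : item) ((p2, i2) : item) : int =
  let c = Int.compare p1 p2 in
  if c <> 0 then c else Int.compare i1 i2

(* A set of items, ordered by [compare_items]. *)
module ItemSet = Set.Make (struct
  type t = item
  let compare = compare_items
end)

(* Remove duplicate items; the result is sorted according to the order. *)
let dedup (items : item list) : item list =
  ItemSet.elements (ItemSet.of_list items)
```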
val incoming_symbol : 'a lr1state -> 'a symbol

incoming_symbol s is the incoming symbol of the state s, that is, the symbol that the parser must recognize before (has recognized when) it enters the state s. This function gives access to the semantic value v stored in a stack element Element (s, v, _, _). Indeed, by case analysis on the symbol incoming_symbol s, one discovers the type 'a of the value v.
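The case analysis mentioned above can be illustrated with a self-contained toy. All the types below are hypothetical miniatures of the generated ones: two terminals, one nonterminal, and a state that is identified with its incoming symbol. Only the shape of the Element constructor follows the text.

```ocaml
(* Hypothetical miniature symbol types, indexed by semantic value type. *)
type _ terminal = T_INT : int terminal | T_NAME : string terminal
type _ nonterminal = N_names : string list nonterminal

type 'a symbol =
  | T : 'a terminal -> 'a symbol
  | N : 'a nonterminal -> 'a symbol

(* Toy stand-in: a state is identified with its incoming symbol. *)
type 'a lr1state = State of 'a symbol

type element =
  | Element : 'a lr1state * 'a * Lexing.position * Lexing.position -> element

let incoming_symbol (State sym : 'a lr1state) : 'a symbol = sym

(* Matching on the symbol refines the type [a] in each branch, so the
   value [v] can be used at its concrete type. *)
let show_value : type a. a symbol -> a -> string =
 fun sym v ->
  match sym with
  | T T_INT -> string_of_int v
  | T T_NAME -> v
  | N N_names -> String.concat "," v

(* The existential type hidden in [Element] is recovered via the symbol. *)
let show_element (Element (s, v, _, _)) : string =
  show_value (incoming_symbol s) v
```

The locally abstract type in show_value is what allows each branch to treat v as an int, a string, or a string list respectively.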

val items : _ lr1state -> item list

items s is the set of the LR(0) items in the LR(0) core of the LR(1) state s. This set is not epsilon-closed. This set is presented as a list, in an arbitrary order.

lhs prod is the left-hand side of the production prod. This is always a non-terminal symbol.

val rhs : production -> xsymbol list

rhs prod is the right-hand side of the production prod. This is a (possibly empty) sequence of (terminal or nonterminal) symbols.

val nullable : _ nonterminal -> bool

nullable nt tells whether the non-terminal symbol nt is nullable. That is, it is true if and only if this symbol produces the empty word epsilon.

val first : _ nonterminal -> _ terminal -> bool

first nt t tells whether the FIRST set of the nonterminal symbol nt contains the terminal symbol t. That is, it is true if and only if nt produces a word that begins with t.

val xfirst : xsymbol -> _ terminal -> bool

xfirst is analogous to first, but expects a first argument of type xsymbol instead of _ nonterminal.

val foreach_terminal : (xsymbol -> 'a -> 'a) -> 'a -> 'a

foreach_terminal enumerates the terminal symbols, including error.

val foreach_terminal_but_error : (xsymbol -> 'a -> 'a) -> 'a -> 'a

foreach_terminal_but_error enumerates the terminal symbols, excluding error.
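Both enumeration functions have the shape of a fold: they thread an accumulator through a user function that is called once per terminal symbol. The sketch below counts and collects terminals through that interface. The Toy module, where an xsymbol is just a string and the terminals are a fixed list, is a hypothetical stand-in; the order in which the real functions visit terminals is not specified here.

```ocaml
(* The two enumeration functions this sketch relies on. *)
module type TERMINALS = sig
  type xsymbol
  val foreach_terminal : (xsymbol -> 'a -> 'a) -> 'a -> 'a
  val foreach_terminal_but_error : (xsymbol -> 'a -> 'a) -> 'a -> 'a
end

module Collect (I : TERMINALS) = struct
  (* Count every terminal, including error. *)
  let count_terminals () = I.foreach_terminal (fun _ n -> n + 1) 0

  (* Collect the terminals other than error, in enumeration order. *)
  let terminals_but_error () =
    List.rev (I.foreach_terminal_but_error (fun x acc -> x :: acc) [])
end

(* Hypothetical toy: terminals are named by strings. *)
module Toy = struct
  type xsymbol = string
  let all = [ "error"; "NAME"; "COLON"; "NEWLINE" ]
  let foreach_terminal f init = List.fold_left (fun acc x -> f x acc) init all
  let foreach_terminal_but_error f init =
    List.fold_left
      (fun acc x -> if x = "error" then acc else f x acc)
      init all
end

module C = Collect (Toy)
```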

feed symbol startp semv endp env causes the parser to consume the (terminal or nonterminal) symbol symbol, accompanied by the semantic value semv and by the start and end positions startp and endp. Thus, the automaton makes a transition, and reaches a new state. The stack grows by one cell. This operation is permitted only if the current state (as determined by env) has an outgoing transition labeled with symbol. Otherwise, Invalid_argument _ is raised.