package obelisk

Pretty-printing for Menhir files

Sources

obelisk-0.8.0.tbz
sha256=89e86dd6484679765deed8028932c2826aa106a794b7ee64cb38c8fc89491aa8
sha512=59e03ea49715a8cc0ecf5db0686dfc316270ff56dc515b46a13e440fd5d44fe103c121ce73d181d83588d8e305b536cee965a1055359185aa8e3dffccb131c76

Description

Obelisk is a simple tool that produces pretty-printed output from a Menhir parser file (.mly). It is inspired by yacc2latex and is also written in OCaml, but is aimed at supporting features from Menhir instead of only those of ocamlyacc.

Published: 10 Mar 2025

README

Obelisk

Obelisk is a simple tool that produces pretty-printed output from a Menhir parser file (.mly).

It is inspired by yacc2latex and is also written in OCaml, but is aimed at supporting features from Menhir instead of only those of ocamlyacc.

Installation

Dependencies

Besides the OCaml dependencies listed at the end of this page, the Makefile uses imagemagick and wkhtmltopdf to build the documentation images.

In addition to the package xparse, which is used to define starred commands, each LaTeX mode relies on its own package: tabu (-tabular), simplebnf (-simplebnf), syntax (-syntax) or backnaur (-backnaur).

OPAM

If you use OPAM, simply type:

opam install obelisk

Manual installation

Clone the Obelisk repository with git clone, then type:

dune build

This builds an executable that you can run on .mly files with: dune exec src/main.exe -- <options> <file.mly>.

If you want to install Obelisk, you can type:

dune install [--prefix <the destination directory>]

Usage

obelisk [ebnf|latex|html] [options] <files>

If multiple files are specified, Obelisk outputs a concatenated result without consistency checks. The user is responsible for avoiding, e.g., name clashes between the files.

By default, Obelisk writes to standard output; use -o <file> to specify an output file.

Pattern recognition

Obelisk can infer some common patterns (possibly parameterized):

  • options

  • lists and non-empty lists

  • separated lists and non-empty separated lists

Once recognized, if the -i switch is specified, the rules are deleted, and their instances are replaced with default constructions (e.g., _*, _+, [_]). Without the -i flag, only the productions of the recognized rules are replaced, and the total number of rules remains the same.

For example, on these simple rules (from misc/reco.mly):

my_option(X, Y):
  |     {}
  | Y X {}

my_list(A):
  |              {}
  | A my_list(A) {}

my_nonempty_list(C):
  | C                     {}
  | C my_nonempty_list(C) {}

my_separated_nonempty_list(X,Y):
  | X                                   {}
  | X Y my_separated_nonempty_list(X,Y) {}

my_separated_list(X,S):
  |                                 {}
  | my_separated_nonempty_list(X,S) {}

my_rule:
  | my_option(E, F)                    {}
  | my_list(E)                         {}
  | my_nonempty_list(F)                {}
  | my_separated_nonempty_list(E,S1)   {}
  | my_separated_list(F,S2)            {}

Obelisk (obelisk misc/reco.mly) outputs:

<my_option(X, Y)> ::= [Y X]

<my_list(A)> ::= A*

<my_nonempty_list(C)> ::= C+

<my_separated_nonempty_list(X, Y)> ::= X (Y X)*

<my_separated_list(X, S)> ::= [X (S X)*]

<my_rule> ::= <my_option(E, F)>
            | <my_list(E)>
            | <my_nonempty_list(F)>
            | <my_separated_nonempty_list(E, S1)>
            | <my_separated_list(F, S2)>

And with the -i switch (obelisk -i misc/reco.mly):

<my_rule> ::= [F E]  
            | E*
            | F+
            | E (S1 E)*
            | [F (S2 F)*]
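A minimal OCaml sketch of how such recognition might work is given below. It is not Obelisk's implementation: the types symbol and pattern and the functions recognize and to_string are hypothetical, each production is reduced to a list of symbol names, and Self stands for a recursive call of the rule with the same parameters. The separated-list case, which depends on another recognized rule, is left out.

```ocaml
(* Hypothetical, simplified view of a rule: each production is a list of
   symbols, and [Self] stands for a recursive call of the rule being
   analysed with the same parameters. Not Obelisk's actual AST. *)
type symbol = Name of string | Self

(* Recognized shapes, printed with the default constructions. *)
type pattern =
  | Opt of string list           (* [X Y] *)
  | Star of string               (* X* *)
  | Plus of string               (* X+ *)
  | Sep_plus of string * string  (* X (S X)* *)

(* Keep only the symbol names of a production without recursive calls. *)
let only_names p = List.filter_map (function Name s -> Some s | Self -> None) p

(* Match the two productions of a rule against the known shapes. *)
let recognize (prods : symbol list list) : pattern option =
  match prods with
  | [ []; [ Name a; Self ] ] -> Some (Star a)
  | [ [ Name c ]; [ Name c'; Self ] ] when c = c' -> Some (Plus c)
  | [ [ Name x ]; [ Name x'; Name s; Self ] ] when x = x' ->
      Some (Sep_plus (x, s))
  | [ []; p ] when not (List.mem Self p) -> Some (Opt (only_names p))
  | _ -> None

(* Print a recognized shape in the default notation. *)
let to_string = function
  | Opt xs -> "[" ^ String.concat " " xs ^ "]"
  | Star a -> a ^ "*"
  | Plus c -> c ^ "+"
  | Sep_plus (x, s) -> Printf.sprintf "%s (%s %s)*" x s x
```

On the productions of my_list(A), recognize [ []; [ Name "A"; Self ] ] returns Some (Star "A"), which to_string prints as A*.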

Multi-format output

By default, the output format is a simple text format close to the BNF syntax. You can use the subcommands ebnf, latex or html to get, respectively, an EBNF text output, LaTeX output, or HTML output.

In default, EBNF and HTML modes, the -noaliases option prevents token aliases from being printed in the output.
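To illustrate how one grammar can be rendered in several notations, here is a hypothetical OCaml sketch (the expr and output_format types and the print function are assumptions, not Obelisk's API) covering only the default and EBNF styles: default mode wraps nonterminals in angle brackets and options in square brackets, whereas EBNF mode uses bare names and a ? suffix.

```ocaml
(* Hypothetical, simplified right-hand side of a rule. *)
type expr =
  | Term of string     (* terminal, e.g. EOF *)
  | Nonterm of string  (* nonterminal, e.g. rule *)
  | Seq of expr list
  | Opt of expr
  | Star of expr
  | Plus of expr

type output_format = Default | Ebnf

let rec print fmt e =
  (* Parenthesize sequences of two or more items under a postfix operator. *)
  let atom e =
    match e with
    | Seq (_ :: _ :: _) -> "(" ^ print fmt e ^ ")"
    | _ -> print fmt e
  in
  match e with
  | Term t -> t
  | Nonterm n -> (match fmt with Default -> "<" ^ n ^ ">" | Ebnf -> n)
  | Seq es -> String.concat " " (List.map (print fmt) es)
  | Opt e -> (
      match fmt with
      | Default -> "[" ^ print fmt e ^ "]"
      | Ebnf -> atom e ^ "?")
  | Star e -> atom e ^ "*"
  | Plus e -> atom e ^ "+"
```

Applied to the producer rule of Obelisk's own parser, this sketch yields [LID EQ] <actual> ATTRIBUTE* SEMICOLON* in default mode and (LID EQ)? actual ATTRIBUTE* SEMICOLON* in EBNF mode, matching the outputs shown in the Example section.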

EBNF

In EBNF mode, parameterized rules are specialized into dedicated regular rules. On the example above (obelisk ebnf misc/reco.mly):

my_rule ::= my_option_0
          | my_list_0
          | my_nonempty_list_0
          | my_separated_nonempty_list_0
          | my_separated_list_0

my_option_0 ::= (F E)?

my_nonempty_list_0 ::= F+

my_separated_nonempty_list_1 ::= F (S2 F)*

my_separated_list_0 ::= (F (S2 F)*)?

my_separated_nonempty_list_0 ::= E (S1 E)*

my_list_0 ::= E*

And with the -i switch (obelisk ebnf -i misc/reco.mly):

my_rule ::= (F E)?   
          | E*
          | F+
          | E (S1 E)*
          | (F (S2 F)*)?
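One way to obtain numbered names like those above is to give each distinct instantiation of a parameterized rule a fresh name <rule>_<k>, where k counts the instantiations of that rule met so far. The OCaml sketch below uses a hypothetical helper, make_specializer, which is not part of Obelisk; it reproduces the numbering of this example, but Obelisk's actual specialization may work differently.

```ocaml
(* Hypothetical: returns a function mapping a rule name and its actual
   arguments to a specialized rule name, reusing names for repeated
   instantiations. *)
let make_specializer () =
  (* (rule, arguments) -> specialized name *)
  let names : (string * string list, string) Hashtbl.t = Hashtbl.create 16 in
  (* rule -> number of instantiations named so far *)
  let counters : (string, int) Hashtbl.t = Hashtbl.create 16 in
  fun rule args ->
    match Hashtbl.find_opt names (rule, args) with
    | Some name -> name
    | None ->
        let k = Option.value (Hashtbl.find_opt counters rule) ~default:0 in
        Hashtbl.replace counters rule (k + 1);
        let name = Printf.sprintf "%s_%d" rule k in
        Hashtbl.add names (rule, args) name;
        name
```

With this scheme, my_separated_nonempty_list(E, S1), met first in my_rule, becomes my_separated_nonempty_list_0, and my_separated_nonempty_list(F, S2), met later through my_separated_list(F, S2), becomes my_separated_nonempty_list_1.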

LaTeX

Use the following options to tweak the LaTeX:

  • -tabular: a tabular-based format using the tabu package (default)

  • -simplebnf: use the simplebnf package

  • -syntax: use the syntax package

  • -backnaur: use the backnaur package (not recommended: line wrapping relies on a manual workaround)

In all cases, the output may be customized using LaTeX commands that you can redefine to fit your needs. The command names are auto-generated from the terminal names, and because of LaTeX limitations, underscores are removed and numbers are converted into their Roman form.

By default, in LaTeX mode, the -o <grammar.tex> switch produces a standalone LaTeX file <grammar.tex>, which you can compile directly (e.g., with pdflatex).

Alternatively, in conjunction with -o <grammar.tex>, you can use -package <definitions> to output two files:

  1. a LaTeX file <grammar.tex> containing only the grammar contents;

  2. a package file <definitions.sty> (the .sty extension is added automatically) containing the necessary extra package requirements and command definitions.

These two files are then intended to be included in a user-provided main LaTeX file following this example skeleton:

\documentclass[preview]{standalone}

\usepackage{definitions}

\begin{document}

\include{grammar}

\end{document}

To avoid name clashes, you can specify a common prefix for the generated commands with the -prefix <myprefix> option. This is useful in particular when using the -package option to import several grammars that share LaTeX command names, or when one of the syntax construction names matches an already defined LaTeX macro.
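The naming scheme described in this section (underscores removed, numbers converted to Roman numerals, optional common prefix) can be sketched in OCaml as follows. latex_command_name and roman are hypothetical helpers, and reading each run of digits as a single number is an assumption; Obelisk's exact rules may differ. Note that the greedy Roman conversion used here yields an empty string for 0.

```ocaml
(* Greedy conversion of a non-negative integer to Roman numerals;
   returns "" for 0. *)
let roman n =
  let table =
    [ (1000, "M"); (900, "CM"); (500, "D"); (400, "CD"); (100, "C");
      (90, "XC"); (50, "L"); (40, "XL"); (10, "X"); (9, "IX");
      (5, "V"); (4, "IV"); (1, "I") ]
  in
  let buf = Buffer.create 8 in
  let rec go n = function
    | [] -> ()
    | (v, s) :: rest as t ->
        if n >= v then (Buffer.add_string buf s; go (n - v) t) else go n rest
  in
  go n table;
  Buffer.contents buf

let is_digit c = c >= '0' && c <= '9'

(* Hypothetical: build a letters-only LaTeX command name from a grammar
   name, optionally prepending a common prefix. *)
let latex_command_name ?(prefix = "") name =
  let buf = Buffer.create (String.length prefix + String.length name) in
  Buffer.add_string buf prefix;
  let len = String.length name in
  let rec loop i =
    if i < len then begin
      let c = name.[i] in
      if c = '_' then loop (i + 1) (* drop underscores *)
      else if is_digit c then begin
        (* read the whole run of digits as one number *)
        let j = ref i in
        while !j < len && is_digit name.[!j] do incr j done;
        Buffer.add_string buf (roman (int_of_string (String.sub name i (!j - i))));
        loop !j
      end else begin
        Buffer.add_char buf c;
        loop (i + 1)
      end
    end
  in
  loop 0;
  Buffer.contents buf
```

For instance, latex_command_name ~prefix:"g" "expr_2" returns "gexprII".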

HTML

The HTML file uses an internal CSS stylesheet that allows customizing the output (less flexibly than in LaTeX mode). By default (-css option), the stylesheet uses CSS content properties for some parts of the grammar, which makes it modular and easy to modify. However, symbols generated this way are not part of the document text and, for example, cannot be copy-pasted. Use the -nocss option to turn off the use of such properties.

Example

Here are the outputs of the different formats obtained by Obelisk from its own parser (src/parser.mly).

Default

<specification> ::= <rule>* EOF

<rule> ::= <old_rule>
         | <new_rule>

<old_rule> ::= [<flags>] <ident> ATTRIBUTE* <parameters(<ident>)> COLON
               <optional_bar> <group> (BAR <group>)* SEMICOLON*

<flags> ::= PUBLIC
          | INLINE
          | PUBLIC INLINE
          | INLINE PUBLIC

<optional_bar> ::= [BAR]

<group> ::= <production> (BAR <production>)* ACTION [<precedence>]

<production> ::= <producer>* [<precedence>]

<producer> ::= [LID EQ] <actual> ATTRIBUTE* SEMICOLON*

<generic_actual(A, B)> ::= <ident> <parameters(A)>
                         | B <modifier>

<actual> ::= <generic_actual(<lax_actual>, <actual>)>

<lax_actual> ::= <generic_actual(<lax_actual>, <actual>)>
               | <group> (BAR <group>)*

<new_rule> ::= [PUBLIC] LET LID ATTRIBUTE* <parameters(<ident>)> <binder>
               <expression>

<binder> ::= COLONEQ
           | EQEQ

<expression> ::= <optional_bar> <seq_expression> (BAR <seq_expression>)*

<seq_expression> ::= [<pattern> EQ] <symbol_expression> SEMICOLON
                     <seq_expression>
                   | <symbol_expression>
                   | <action_expression>

<symbol_expression> ::= <ident> <parameters(<expression>)>
                      | <symbol_expression> <modifier>

<action_expression> ::= <action>
                      | <action> <precedence>
                      | <precedence> <action>

<action> ::= ACTION
           | POINTFREEACTION

<pattern> ::= LID
            | UNDERSCORE
            | TILDE
            | LPAR [<pattern> (COMMA <pattern>)*] RPAR

<modifier> ::= OPT
             | PLUS
             | STAR

<precedence> ::= PREC <ident>

<parameters(X)> ::= [LPAR [X (COMMA X)*] RPAR]

<ident> ::= UID
          | LID
          | QID

EBNF

specification ::= rule* EOF

rule ::= old_rule
       | new_rule

old_rule ::= flags? ident ATTRIBUTE* parameters_0 COLON optional_bar group
             (BAR group)* SEMICOLON*

flags ::= PUBLIC
        | INLINE
        | PUBLIC INLINE
        | INLINE PUBLIC

optional_bar ::= BAR?

group ::= production (BAR production)* ACTION precedence?

production ::= producer* precedence?

producer ::= (LID EQ)? actual ATTRIBUTE* SEMICOLON*

actual ::= generic_actual_0

lax_actual ::= generic_actual_0
             | group (BAR group)*

new_rule ::= PUBLIC? LET LID ATTRIBUTE* parameters_0 binder expression

binder ::= COLONEQ
         | EQEQ

expression ::= optional_bar seq_expression (BAR seq_expression)*

seq_expression ::= (pattern EQ)? symbol_expression SEMICOLON seq_expression
                 | symbol_expression
                 | action_expression

symbol_expression ::= ident parameters_2
                    | symbol_expression modifier

action_expression ::= action
                    | action precedence
                    | precedence action

action ::= ACTION
         | POINTFREEACTION

pattern ::= LID
          | UNDERSCORE
          | TILDE
          | LPAR (pattern (COMMA pattern)*)? RPAR

modifier ::= OPT
           | PLUS
           | STAR

precedence ::= PREC ident

ident ::= UID
        | LID
        | QID

generic_actual_0 ::= ident parameters_1
                   | actual modifier

parameters_1 ::= (LPAR (lax_actual (COMMA lax_actual)*)? RPAR)?

parameters_0 ::= (LPAR (ident (COMMA ident)*)? RPAR)?

parameters_2 ::= (LPAR (expression (COMMA expression)*)? RPAR)?

LaTeX

[Images: Tabular, Simplebnf, Syntax and Backnaur outputs]

HTML

[Images: outputs with and without CSS content properties]

Dependencies (4)

  1. re >= "1.7.2"
  2. menhir >= "20190613"
  3. dune >= "2.2.0"
  4. ocaml >= "4.08"

Dev Dependencies

None

Used by (1)

  1. catala >= "0.9.0"

Conflicts

None
