package obelisk
Sources
md5=1494ac4b54ad165d2ddea26a0f54d2dc
sha512=a911070bb474e75c9332dac208c9916ef6be2f46148cafec9ce9c920f07e782d26796cb2f77ecb7f62247c6f56a53bc34aaf66d37ce5ab6d148e1459dd61a8e0
Description
Obelisk is a simple tool which produces pretty-printed output from a Menhir parser file (.mly). It is inspired from yacc2latex and is also written in OCaml, but is aimed at supporting features from Menhir instead of only those of ocamlyacc.
Published: 10 Feb 2021
README
Obelisk
Installation
Dependencies
The Makefile also uses imagemagick and wkhtmltopdf to build documentation images.
In addition to the package suffix, which is used to define starred commands, here is a summary of package dependencies for the different LaTeX modes:
-tabular: tabu
-syntax: syntax
-backnaur: backnaur
OPAM
If you use OPAM, just type:
opam install obelisk
Manual installation
Clone the Obelisk repository with git clone, then type:
dune build
This builds an executable, which you can run on .mly files with:
dune exec src/main.exe -- <options> <file.mly>
If you want to install Obelisk, you can type:
dune install [--prefix <the destination directory>]
Usage
obelisk [latex|html] [options] <files>
If multiple files are specified, Obelisk outputs a concatenated result without consistency checks, so the user is responsible for avoiding, e.g., name clashes between the files.
By default Obelisk writes to standard output; use -o <file> to specify an output file.
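Since no consistency check is made across files, a small pre-check can help spot clashes before concatenating grammars. The following is only a sketch and not part of Obelisk: the function clashes is hypothetical, and it assumes the rule names of each file have already been extracted by some other means.

```ocaml
(* Hypothetical pre-check, not part of Obelisk: report rule names that are
   defined in more than one input file. Extracting the rule names from the
   .mly files is out of scope here; they are passed in directly. *)
module SMap = Map.Make (String)

let clashes (files : (string * string list) list) : (string * string list) list =
  (* Record, for every rule name, the files in which it appears. *)
  let add_file acc (file, rules) =
    List.fold_left
      (fun acc rule ->
         let seen = Option.value (SMap.find_opt rule acc) ~default:[] in
         if List.mem file seen then acc else SMap.add rule (file :: seen) acc)
      acc rules
  in
  List.fold_left add_file SMap.empty files
  (* Keep only the names seen in at least two files. *)
  |> SMap.filter (fun _ fs -> List.length fs > 1)
  |> SMap.bindings
  (* Files were accumulated in reverse order; restore the input order. *)
  |> List.map (fun (rule, fs) -> (rule, List.rev fs))
```

The result is sorted by rule name, each name paired with the files that define it, so an empty list means the files can be concatenated without name clashes.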
Pattern recognition
Obelisk can infer some common patterns (possibly parameterized):
options
lists and non-empty lists
separated lists and non-empty separated lists
Once a pattern is recognized, if the -i switch is specified, the matching rules are deleted and their instances are replaced with default constructions (e.g. _*, _+, [_]). Without the -i switch, only the productions of the recognized rules are replaced, and the total number of rules stays the same.
For example, on these simple rules:
my_option(X, Y):
| {}
| Y X {}
my_list(A):
| {}
| A my_list(A) {}
my_nonempty_list(C):
| C {}
| C my_nonempty_list(C) {}
my_separated_nonempty_list(X,Y):
| X {}
| X Y my_separated_nonempty_list(X,Y) {}
my_separated_list(X,S):
| {}
| my_separated_nonempty_list(X,S) {}
my_rule(E,F,S1,S2):
| my_option(E, F) {}
| my_list(E) {}
| my_nonempty_list(F) {}
| my_separated_nonempty_list(E,S1) {}
| my_separated_list(F,S2) {}
Obelisk outputs:
<my_option(X, Y)> ::= [Y X]
<my_list(A)> ::= A*
<my_nonempty_list(C)> ::= C+
<my_separated_nonempty_list(X, Y)> ::= X (Y X)*
<my_separated_list(X, S)> ::= [X (S X)*]
<my_rule(E, F, S1, S2)> ::= <my_option(E, F)>
                        | <my_list(E)>
                        | <my_nonempty_list(F)>
                        | <my_separated_nonempty_list(E, S1)>
                        | <my_separated_list(F, S2)>
And with the -i switch:
<my_rule(E, F, S1, S2)> ::= [F E]
                        | E*
                        | F+
                        | E (S1 E)*
                        | [F (S2 F)*]
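One way to picture the recognition step is as pattern matching over the productions of a rule. The sketch below is a simplified illustration, not Obelisk's implementation: the types symbol, rule and shape and the function recognize are hypothetical. It only handles options and right-recursive lists whose empty or base production comes first, and leaves separated lists out.

```ocaml
(* Hypothetical, simplified grammar representation (not Obelisk's own types). *)
type symbol =
  | Sym of string                 (* a terminal, nonterminal or parameter *)
  | App of string * symbol list   (* a parameterized instance, e.g. my_list(A) *)

type rule = {
  name : string;
  params : string list;
  prods : symbol list list;       (* each production is a sequence of symbols *)
}

type shape =
  | Opt of symbol list            (* printed as [ ... ] *)
  | Star of symbol                (* printed as x* *)
  | Plus of symbol                (* printed as x+ *)

(* The rule applied to its own formal parameters, i.e. its recursive call. *)
let self r = App (r.name, List.map (fun p -> Sym p) r.params)

let recognize r =
  match r.prods with
  (* empty | x self  ==> x* *)
  | [ []; [ x; call ] ] when call = self r -> Some (Star x)
  (* x | x self  ==> x+ *)
  | [ [ x ]; [ y; call ] ] when x = y && call = self r -> Some (Plus x)
  (* empty | non-recursive body  ==> [body] *)
  | [ []; (_ :: _ as body) ] when not (List.mem (self r) body) -> Some (Opt body)
  | _ -> None

let rec string_of_symbol = function
  | Sym s -> s
  | App (f, args) ->
    Printf.sprintf "<%s(%s)>" f (String.concat ", " (List.map string_of_symbol args))

let string_of_shape = function
  | Opt body -> "[" ^ String.concat " " (List.map string_of_symbol body) ^ "]"
  | Star x -> string_of_symbol x ^ "*"
  | Plus x -> string_of_symbol x ^ "+"

(* Two of the rules from the example above, encoded by hand. *)
let my_list =
  { name = "my_list"; params = [ "A" ];
    prods = [ []; [ Sym "A"; App ("my_list", [ Sym "A" ]) ] ] }

let my_option =
  { name = "my_option"; params = [ "X"; "Y" ];
    prods = [ []; [ Sym "Y"; Sym "X" ] ] }
```

With these definitions, recognize my_list yields a Star shape rendered as A*, and recognize my_option yields an Opt shape rendered as [Y X]. A rule with any other structure, such as my_rule, is left untouched.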
Multi-format output
By default the output format is a simple text format close to the BNF syntax. You can use the subcommands latex or html to get a LaTeX or HTML file, respectively.
In default and HTML modes, the option -noaliases suppresses token aliases in the output.
LaTeX
Use the following options to tweak the LaTeX output:
-tabular: a tabular-based format from the tabu package (default)
-syntax: use the syntax package
-backnaur: use the backnaur package (not recommended: it relies on a manual line-wrapping trick)
In all cases, the output may be customized via LaTeX commands that you can redefine to fit your needs. The command names are auto-generated from the terminal names and, because of LaTeX limitations, underscores are removed and numbers are converted into their Roman numeral form.
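To make the renaming concrete, here is a minimal sketch of such a conversion. It is not Obelisk's actual code: the functions roman_of_int and latex_command_name are hypothetical, and the use of uppercase numerals and the conversion of each whole run of digits as one number are assumptions.

```ocaml
(* Hypothetical helpers; this is not Obelisk's actual code. *)

(* Roman numeral for a non-negative integer. Uppercase is an assumption,
   and 0 yields the empty string. *)
let roman_of_int n =
  let table =
    [ (1000, "M"); (900, "CM"); (500, "D"); (400, "CD"); (100, "C");
      (90, "XC"); (50, "L"); (40, "XL"); (10, "X"); (9, "IX");
      (5, "V"); (4, "IV"); (1, "I") ]
  in
  let buf = Buffer.create 16 in
  let rec go n = function
    | [] -> ()
    | (value, numeral) :: rest as table ->
      if n >= value then begin
        Buffer.add_string buf numeral;
        go (n - value) table
      end else go n rest
  in
  go n table;
  Buffer.contents buf

let is_digit c = c >= '0' && c <= '9'

(* Drop underscores and replace each maximal run of digits by its Roman
   numeral. All other characters are kept unchanged. *)
let latex_command_name name =
  let len = String.length name in
  let buf = Buffer.create len in
  let rec go i =
    if i < len then begin
      match name.[i] with
      | '_' -> go (i + 1)
      | c when is_digit c ->
        (* Find the end of the digit run starting at position i. *)
        let j = ref i in
        while !j < len && is_digit name.[!j] do incr j done;
        let number = int_of_string (String.sub name i (!j - i)) in
        Buffer.add_string buf (roman_of_int number);
        go !j
      | c ->
        Buffer.add_char buf c;
        go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf
```

Under these assumptions, latex_command_name "seq_expression" gives "seqexpression" and latex_command_name "my_rule_2" gives "myruleII". The exact names that Obelisk generates may differ, so check the generated file before redefining a command.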
By default in LaTeX mode, the -o <grammar.tex> switch produces the standalone LaTeX file <grammar.tex>, which you can compile directly (e.g. with pdflatex).
But in conjunction with -o <grammar.tex>, you can use -package <definitions> to output two files:
a LaTeX file <grammar.tex> containing only the grammar contents;
a package file <definitions.sty> (the .sty extension is added automatically) containing the necessary extra package requirements and command definitions.
These two files are then intended to be included in a main LaTeX file, which is not supplied, following this example skeleton:
\documentclass[preview]{standalone}
\usepackage{definitions}
\begin{document}
\include{grammar}
\end{document}
To avoid name clashes, in particular when using the -package option and, e.g., importing multiple grammars with the same LaTeX command names, or when the name of a syntax construction matches an already defined LaTeX macro, you can specify a common prefix for the commands with the option -prefix <myprefix>.
As LaTeX forbids defining commands whose names begin with end, commands created from rules whose names begin with end are automatically prefixed with zzz.
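The rule amounts to a simple guard on the command name. The sketch below is an illustration only: protect_end is a hypothetical name, and how this step interacts with the -prefix option is not covered here.

```ocaml
(* Hypothetical helper, not Obelisk's actual code. *)

(* Local prefix test, so the sketch does not depend on String.starts_with. *)
let starts_with ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Prefix the name with "zzz" when it begins with "end". *)
let protect_end name =
  if starts_with ~prefix:"end" name then "zzz" ^ name else name
```

With this guard, a rule named endrule gives a command named zzzendrule, while any name that does not begin with end is returned unchanged.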
HTML
The HTML file uses an internal CSS stylesheet, which allows one to customize the output (though less flexibly than in latex mode). By default (-css option), the stylesheet uses content properties for some parts of the grammar, to make it modular and easily modifiable, but the symbols produced this way are not treated as page content and, for example, cannot be copy-pasted. Use the -nocss option to disable the use of such properties.
Example
Here are the outputs that Obelisk produces, in its different formats, from its own parser.
Default
<specification> ::= <rule>* EOF
<rule> ::= <old_rule>
       | <new_rule>
<old_rule> ::= [<flags>] <ident> ATTRIBUTE* <parameters(<ident>)> COLON
               <optional_bar> <group> (BAR <group>)* SEMICOLON*
<flags> ::= PUBLIC
        | INLINE
        | PUBLIC INLINE
        | INLINE PUBLIC
<optional_bar> ::= [BAR]
<group> ::= <production> (BAR <production>)* ACTION [<precedence>]
<production> ::= <producer>* [<precedence>]
<producer> ::= [LID EQ] <actual> ATTRIBUTE* SEMICOLON*
<generic_actual(A, B)> ::= <ident> <parameters(A)>
                       | B <modifier>
<actual> ::= <generic_actual(<lax_actual>, <actual>)>
<lax_actual> ::= <generic_actual(<lax_actual>, <actual>)>
             | <group> (BAR <group>)*
<new_rule> ::= [PUBLIC] LET LID ATTRIBUTE* <parameters(<ident>)> <binder>
               <expression>
<binder> ::= COLONEQ
         | EQEQ
<expression> ::= <optional_bar> <seq_expression> (BAR <seq_expression>)*
<seq_expression> ::= [<pattern> EQ] <symbol_expression> SEMICOLON
                     <seq_expression>
                 | <symbol_expression>
                 | <action_expression>
<symbol_expression> ::= <ident> <parameters(<expression>)>
                    | <symbol_expression> <modifier>
<action_expression> ::= <action>
                    | <action> <precedence>
                    | <precedence> <action>
<action> ::= ACTION
         | POINTFREEACTION
<pattern> ::= LID
          | UNDERSCORE
          | TILDE
          | LPAR [<pattern> (COMMA <pattern>)*] RPAR
<modifier> ::= OPT
           | PLUS
           | STAR
<precedence> ::= PREC <ident>
<parameters(X)> ::= [LPAR [X (COMMA X)*] RPAR]
<ident> ::= UID
        | LID
        | QID