Page
Library
Module
Module type
Parameter
Class
Class type
Source
This walkthrough builds an arithmetic expression parser that handles expressions like 1+2*3 or (1+2)*3 and produces a structured tree that respects operator precedence.
Along the way, we'll cover the standard recursive-descent technique for precedence, expect for clear error messages, and the chainl1 combinator as a shortcut for left-associative operator chains.
Source: examples/better_errors.ml
A parser that handles expressions like:
1+2*3 parses as 1 + (2 * 3) (multiplication binds tighter)(1+2)*3 parses as (1 + 2) * 3 (parentheses override precedence)1+2+3 parses as (1 + 2) + 3 (left-associative)The result is an AST (abstract syntax tree), not a computed value.
type expr =
| Num of int
| Add of expr * expr
| Mul of expr * exprNum 3 is a literal number. Add (Num 1, Num 2) is 1 + 2. Mul (Add (Num 1, Num 2), Num 3) is (1 + 2) * 3.
The standard technique for operator precedence in recursive descent is to split the grammar into levels, one per precedence tier. Lower-precedence operators are parsed at higher levels (closer to the entry point), and higher-precedence operators at lower levels (closer to the leaves).
For our two operators:
expr = term (('+' term)*) ← lowest precedence: addition
term = factor (('*' factor)*) ← higher precedence: multiplication
factor = number | '(' expr ')' ← atomic values and groupingEach level calls the next one for its operands. This structure guarantees that * binds tighter than + without any explicit precedence tables.
(* Addition level: lowest precedence *)
let rec expr () =
let left = term () in
let rest =
Parseff.many
(fun () ->
Parseff.skip_whitespace ();
let _ = Parseff.expect "a '+' operator" (fun () -> Parseff.char '+') in
Parseff.skip_whitespace ();
term ())
()
in
List.fold_left (fun acc t -> Add (acc, t)) left rest
(* Multiplication level: higher precedence *)
and term () =
let left = factor () in
let rest =
Parseff.many
(fun () ->
Parseff.skip_whitespace ();
let _ = Parseff.expect "a '*' operator" (fun () -> Parseff.char '*') in
Parseff.skip_whitespace ();
factor ())
()
in
List.fold_left (fun acc f -> Mul (acc, f)) left rest
(* Atoms and parenthesized groups *)
and factor () =
Parseff.skip_whitespace ();
Parseff.or_
(fun () ->
let _ = Parseff.expect "an opening parenthesis" (fun () -> Parseff.char '(') in
Parseff.skip_whitespace ();
let e = expr () in
Parseff.skip_whitespace ();
let _ = Parseff.expect "a closing parenthesis" (fun () -> Parseff.char ')') in
e)
(fun () ->
let d = Parseff.expect "a number" Parseff.digit in
Num d)
()Each precedence level calls the next one for its operands. expr parses term (('+' term)*), term parses factor (('*' factor)*), and factor handles numbers and parenthesized sub-expressions. Because term is called from within expr, multiplication binds tighter than addition.
many collects zero or more operator-operand pairs, and fold_left builds left-associative AST nodes: 1+2+3 becomes Add(Add(Num 1, Num 2), Num 3). or_ in factor tries the parenthesized path first; if the input doesn't start with (, it backtracks and tries a digit.
expect improves errorsWithout expect, a failure on the closing parenthesis would say something like expected ')'. With expect, it says expected a closing parenthesis. This matters when users see the error.
Compare the error messages on "1+":
expect: expected digitexpect: expected a numberAnd on "(1+2":
expect: expected ')'expect: expected a closing parenthesisLet's trace 1+2*3 through the parser:
expr() calls term()term() calls factor(), which matches 1 → Num 1term() tries many(* factor), but no * follows, so rest = []term() returns Num 1 to expr()expr() tries many(+ term) and sees ++, calls term()term() calls factor(), which matches 2 → Num 2term() tries many(* factor) and sees **, calls factor(), which matches 3 → Num 3term() builds Mul(Num 2, Num 3) via fold_leftexpr() builds Add(Num 1, Mul(Num 2, Num 3)) via fold_leftThe key insight: * is consumed inside term, so by the time expr sees the result, 2*3 is already a single Mul node. That's how precedence works. Each level "claims" its operators before the level above sees them.
chainl1The many + fold_left pattern is so common that Parseff provides chainl1 as a shortcut:
let rec expr () =
Parseff.chainl1
term
(fun () ->
Parseff.skip_whitespace ();
let _ = Parseff.char '+' in
Parseff.skip_whitespace ();
fun a b -> Add (a, b))
()
and term () =
Parseff.chainl1
factor
(fun () ->
Parseff.skip_whitespace ();
let _ = Parseff.char '*' in
Parseff.skip_whitespace ();
fun a b -> Mul (a, b))
()
and factor () =
Parseff.skip_whitespace ();
Parseff.or_
(fun () ->
let _ = Parseff.char '(' in
let e = expr () in
Parseff.skip_whitespace ();
let _ = Parseff.char ')' in
e)
(fun () -> Num (Parseff.digit ()))
()chainl1 element op parses one or more elements separated by op, combining them left-to-right. The op parser returns the combining function. This is more concise and expresses the intent directly.
For right-associative operators (like exponentiation), use chainr1 instead.
For debugging, a simple to_string function:
let rec expr_to_string = function
| Num n -> string_of_int n
| Add (l, r) ->
Printf.sprintf "(%s + %s)" (expr_to_string l) (expr_to_string r)
| Mul (l, r) ->
Printf.sprintf "(%s * %s)" (expr_to_string l) (expr_to_string r)(* "1+2*3" -> "(1 + (2 * 3))" *)
(* "(1+2)*3" -> "((1 + 2) * 3)" *)