Line breaking is based on three concepts:
bh
fires a new line within box
b
, then the indentation of the new line is simply the sum of:
the current indentation of box b
+
the additional box breaking indentation, as defined by box b
+
the additional hint breaking indentation, as defined by break
hint bh
.There are 4 types of boxes. (The most often used is the "hov" box type, so skip the rest at first reading).
Format.open_hbox
procedure): within this box, break hints do not
lead to line breaks.Format.open_vbox
procedure): within this box, every break hint lead
to a new line.Format.open_hvbox
procedure): if it is possible, the entire box
is written on a single line; otherwise, every break hint within the box
leads to a new line.Format.open_box
or Format.open_hovbox
procedures): within this
box, break hints are used to cut the line when there is no more room on the
line. There are two kinds of "hov" boxes, you can find the details below.
In first approximation, let me consider these two kinds of "hov" boxes as
equivalent and obtained by calling the Format.open_box
procedure.Let me give an example. Suppose we can write 10 chars before
the right margin (that indicates no more room). We represent any
char as a -
sign; characters [
and ]
indicates the opening and closing of a box and b
stands
for a break hint given to the pretty-printing engine.
The output "--b--b--" is displayed like this (the b
symbol
stands for the value of the break that is explained below):
Within a "h" box:
--b--b--
Within a "v" box:
--b
--b
--
Within a "hv" box:
If there is enough room to print the box on the line:
--b--b--
But "---b---b---" that cannot fit on the line is written
---b
---b
---
Within a "hov" box:
If there is enough room to print the box on the line:
--b--b--
But if "---b---b---" cannot fit on the line, it is written as
---b---b
---
The first break hint does not lead to a new line, since there is enough room on the line. The second one leads to a new line since there is no more room to print the material following it. If the room left on the line were even shorter, the first break hint may lead to a new line and "---b---b---" is written as:
---b
---b
---
Break hints are also used to output spaces (if the line is not split when the
break is encountered, otherwise the new line indicates properly the separation
between printing items). You output a break hint using print_break sp indent
,
and this sp integer is used to print "sp" spaces. Thus print_break sp ...
may
be thought as: print sp
spaces or output a new line.
For instance, if b is break 1 0
in the output "--b--b--", we get
within a "h" box:
-- -- --
within a "v" box:
--
--
--
within a "hv" box:
-- -- --
or, according to the remaining room on the line:
--
--
--
and similarly for "hov" boxes.
Generally speaking, a printing routine using "format", should not directly
output white spaces: the routine should use break hints instead. (For instance
print_space ()
that is a convenient abbreviation for print_break 1 0
and
outputs a single space or break the line.)
The user gets 2 ways to fix the indentation of new lines:
When defining the box: when you open a box, you can fix the indentation added to each new line opened within that box.
For instance: open_hovbox 1
opens a "hov" box with new lines indented 1 more
than the initial indentation of the box. With output "---[--b--b--b--", we
get:
---[--b--b
--b--
with open_hovbox 2, we get
---[--b--b
--b--
Note: the [ sign in the display is not visible on the screen, it is just there to materialise the aperture of the pretty-printing box. Last "screen" stands for:
-----b--b
--b--
When defining the break that makes the new line. As said above, you output
a break hint using print_break sp indent
. The indent
integer is used to fix
the additional indentation of the new line. Namely, it is added to the default
indentation offset of the box where the break occurs.
For instance, if [ stands for the opening of a "hov" box with 1 as extra
indentation (as obtained by open_hovbox 1
), and b is print_break 1 2
, then
from output "---[--b--b--b--", we get:
---[-- --
--
--
The "hov" box type is refined into two categories.
Format.open_hovbox
procedure): break hints are used to cut the line when
there is no more room on the line; no new line occurs if there is enough room
on the line.Format.open_box
procedure): similar to the "hov" packing box, the break
hints are used to cut the line when there is no more room on the line; in
addition, break hints that can show the box structure lead to new lines even if
there is enough room on the current line.The difference between a packing and a structural "hov" box is shown by a
routine that closes boxes and parentheses at the end of printing: with packing
boxes, the closure of boxes and parentheses do not lead to new lines if there
is enough room on the line, whereas with structural boxes each break hint will
lead to a new line. For instance, when printing
"[(---[(----[(---b)]b)]b)]", where "b" is a break hint without extra
indentation (print_cut ()
). If "[" means opening of a packing "hov" box
(Format.open_hovbox
1), "[(---[(----[(---b)]b)]b)]" is printed as
follows:
(---
(----
(---)))
If we replace the packing boxes by structural boxes (Format.open_box
1), each
break hint that precedes a closing parenthesis can show the boxes structure, if
it leads to a new line; hence "[(---[(----[(---b)]b)]b)]" is printed like
this:
(---
(----
(---
)
)
)
Each break hint increases indentation by one, both for opening and closing parenthesis, because the parenthesis are just characters and don't have special meaning for the formatter. Notice that the closing parenthesis is not vertically aligned with the opening parenthesis.
We can line up the opening and closing parenthesis vertically in two ways:
In both cases the output would be this:
(---
(----
(---
)
)
)
When writing a pretty-printing routine, follow these simple rules:
open_*
and
Format.close_box
must be nested like parentheses).print_space ()
),
unless you explicitly don't want the line to be broken here. For instance,
imagine you want to pretty print an OCaml definition, more precisely a let rec
ident = expression
value definition. You will probably treat the first three
spaces as "unbreakable spaces" and write them directly in the string constants
for keywords, and print "let rec"
before the identifier, and similarly write
=
to get an unbreakable space after the identifier; in contrast, the space
after the =
sign is certainly a break hint, since breaking the line after =
is a usual (and elegant) way to indent the expression part of a definition. In
short, it is often necessary to print unbreakable spaces; however, most of the
time a space should be considered a break hint.Format.force_newline
: this
procedure effectively leads to a newline, but it also as the unfortunate side
effect to partially reinitialise the pretty-printing engine, so that the rest
of the printing material is noticeably messed up.print_newline ()
call, that flushes the
pretty-printer tables (hence the output). (Note that the top-level loop of the
interactive system does it as well, just before a new input.)The format module provides a general printing facility "a la" printf. In addition to the usual conversion facility provided by printf, you can write pretty-printing indications directly inside the format string (opening and closing boxes, indicating breaking hints, etc).
Pretty-printing annotations are introduced by the @
symbol, directly into the
string format. Almost any function of the Format
module can be called from
within a printf
format string. For instance
@[
" open a box (open_box 0). You may precise the type as an extra
argument. For instance @[<hov n>
is equivalent to open_hovbox n
.@]
" close a box (close_box ()
).@
" output a breakable space (print_space ()
).@,
" output a break hint (print_cut ()
).@;<n m>
" emit a "full" break hint (print_break n m
).@.
" end the pretty-printing, closing all the boxes still opened
(print_newline ()
).For instance
printf "@[<1>%s@ =@ %d@ %s@]@." "Prix TTC" 100 "Euros";; Prix TTC = 100 Euros - : unit = ()
Let me give a full example: the shortest non trivial example you could imagine, that is the lambda calculus :)
Thus the problem is to pretty-print the values of a concrete data type that models a language of expressions that defines functions and their applications to arguments.
First, I give the abstract syntax of lambda-terms:
type lambda = | Lambda of string * lambda | Var of string | Apply of lambda * lambda ;;
I use the format library to print the lambda-terms:
open Format;; let ident = print_string;; let kwd = print_string;; val ident : string -> unit = <fun> val kwd : string -> unit = <fun> let rec print_exp0 = function | Var s -> ident s | lam -> open_hovbox 1; kwd "("; print_lambda lam; kwd ")"; close_box () and print_app = function | e -> open_hovbox 2; print_other_applications e; close_box () and print_other_applications f = match f with | Apply (f, arg) -> print_app f; print_space (); print_exp0 arg | f -> print_exp0 f and print_lambda = function | Lambda (s, lam) -> open_hovbox 1; kwd "\\"; ident s; kwd "."; print_space(); print_lambda lam; close_box() | e -> print_app e;; val print_app : lambda -> unit = <fun> val print_other_applications : lambda -> unit = <fun> val print_lambda : lambda -> unit = <fun>
We use the fprintf
function to write the most versatile version of the
pretty-printing functions for lambda-terms. Now, the functions get an extra
argument, namely a pretty-printing formatter (the ppf argument) where printing
will occur. This way the printing routines are more general, since they can
print on any formatter defined in the program (either printing to a file, or to
stdout
, to stderr
, or even to a string). Furthermore, the pretty-printing
functions are now compositional, since they may be used in conjunction with the
special %a
conversion, that prints a fprintf
argument with a user's supplied
function (these user's supplied functions also have a formatter as first
argument).
Using fprintf
, the lambda-terms printing routines can be written as follows:
open Format;; let ident ppf s = fprintf ppf "%s" s;; let kwd ppf s = fprintf ppf "%s" s;; val ident : Format.formatter -> string -> unit val kwd : Format.formatter -> string -> unit let rec pr_exp0 ppf = function | Var s -> fprintf ppf "%a" ident s | lam -> fprintf ppf "@[<1>(%a)@]" pr_lambda lam and pr_app ppf = function | e -> fprintf ppf "@[<2>%a@]" pr_other_applications e and pr_other_applications ppf f = match f with | Apply (f, arg) -> fprintf ppf "%a@ %a" pr_app f pr_exp0 arg | f -> pr_exp0 ppf f and pr_lambda ppf = function | Lambda (s, lam) -> fprintf ppf "@[<1>%a%a%a@ %a@]" kwd "\\" ident s kwd "." pr_lambda lam | e -> pr_app ppf e ;; val pr_app : Format.formatter -> lambda -> unit val pr_other_applications : Format.formatter -> lambda -> unit val pr_lambda : Format.formatter -> lambda -> unit
Given those general printing routines, procedures to print to stdout
or
stderr
is just a matter of partial application:
let print_lambda = pr_lambda std_formatter;; let eprint_lambda = pr_lambda err_formatter;; val print_lambda : lambda -> unit val eprint_lambda : lambda -> unit