This is a set of reasonable guidelines for formatting Caml
programs—guidelines which reflect the consensus among veteran Caml
programmers. Nevertheless, all detailed notifications of possible
errors or omissions will be noted with pleasure. To send your
comments
using GitHub issues.
Original translation from French:
Ruchira Datta 〈datta(at)math.berkeley.edu〉.
Thanks to all those who have already participated in the critique of this page: Daniel de Rauglaudre, Luc Maranget, Jacques Garrigue, Damien Doligez, Xavier Leroy, Bruno Verlyck, Bruno Petazzoni, Francois Maltey, Basile Starynkevitch, Toby Moth, Pierre Lescanne.
The time you spend typing the programs is neglectable compared to the time spent reading them. That's the reason why you save a lot of time if you work hard to optimize readability.
All the time you are "wasting" to get a simpler program today, will return a hundred times in the future during the uncountably many modifications and readings of the program (starting with the first debugging).
Writing programs law: A program is written once, modified ten times, and read 100 times. So simplify its writing, always keep future modifications in mind, and never jeopardize readability.
Pseudo spaces law: never hesitate to separate words of your programs with spaces; the space bar is the easiest key to find on the keyboard, press it as often as necessary!
(1, 2), let
triplet = (x, y, z)...
let (x, y) =
..., you can write
let x, y = ....
Justification: The point is to define several values simultaneously, not to construct a tuple. Moreover, the pattern is set off nicely between
letand=.
match x, y with
| 1, _ -> ...
| x, 1 -> ...
| x, y -> ...
Justification: The point is to match several values in parallel, not to construct a tuple. Moreover, the expressions being matched are set off by
matchandwith, while the patterns are set off nicely by|and->.
x :: l with spaces around the ::
(since
:: is an infix operator, hence surrounded by spaces) and
[1; 2; 3] (since ; is a delimiter, hence
followed by a space).
!'' and ``.'' are not separated
from their arguments.) x + 1
or x + !y.
Justification: If you left out the spaces then
x+1would be understood, butx+!ywould change its meaning since ``+!'' would be interpreted as a multi-character operator.
Criticism: The absence of spaces around an operator improves the readability of formulas when you use it to reflect the relative precedences of operators. For example
x*y + 2*zmakes it very obvious that multiplication takes precedence over addition.
Response: This is a bad idea, a chimera, because nothing in the language ensures that the spaces properly reflect the meaning of the formula. For example
x * z-1means(x * z) - 1, and notx * (z - 1)as the proposed interpretation of spaces would seem to suggest. Besides, the problem of multi-character symbols would keep you from using this convention in a uniform way: you couldn't leave out the spaces around the multiplication to writex*!y + 2*!z. Finally, this playing with the spaces is a subtle and flimsy convention, a subliminal message which is difficult to grasp on reading. If you want to make the precedences obvious, use the expressive means brought to you by the language: write parentheses.
Additional justification: Systematically surrounding operators with spaces simplify the treatment of infix operators which are no more a complex particular case; in effect, whereas you can write
(+)without spaces, you evidently cannot write(*)since(*is read as the beginning of a comment. You must write at least one space as in “( *)”, although an extra space after*is definitively preferable if you want to avoid that*)could be read, in some contexts, as the end of a comment. All those difficulties are easily avoided if you adopt the simple rule proposed here: keep operator symbols well separated by spaces.
In fact you will quickly feel that this rule is not so difficult to follow: the space bar is the greatest and best situated key of the keyboard, it is the easiest to enter and you cannot miss it!
\ character at the end of the line that omits white
spaces on the beginning of next line):
let universal_declaration =
"-1- Programs are born and remain free and equal under the law;\n\
distinctions can only be based on the common good." in
...
Landin's pseudo law: Treat the indentation of your programs as if it determines the meaning of your programs.
I would add to this law: carefully treat the indentation of programs because in some cases it really gives the meaning of the program!
The indentation of programs is an art which excites many strong opinions. Here several indentation styles are given which are drawn from experience and which have not been severely criticized.
When a justification for the adopted style has seemed obvious to me, I have indicated it. On the other hand, criticisms are also noted.
So each time, you have to choose between the different styles
suggested.
The only absolute rule is the first below.
Justification: This width makes it possible to read the code on all displays and to print it in a legible font on a standard sheet.
Justification: When a function goes beyond one screenful, it's time to divide it into subproblems and handle them independently. Beyond a screenful, one gets lost in the code. The indentation is not readable and is difficult to keep correct.
Justification: Between one display and another, the indentation of the program changes completely; it can also becomes completely wrong, if the programmer used both tabulations and spaces to indent the program.
Criticism: The purpose of using tabulations is just to allow the readers of the program to indent more or less by changing the tabs stops. The overall indentation remains correct and the reader is glad to easily customize the indentation amount.
Answer: It seems almost impossible to use this method since you should always use tabulations to indent, which is hard and unnatural.
let ... ;; definitionsThe body of a function defined globally in a module is generally indented normally. However, it's okay to treat this case specially to set off the definition better.
With a regular indentation of 1 or 2 spaces:
let f x = function
| C ->
| D ->
...
let g x =
let tmp =
match x with
| C -> 1
| x -> 0 in
tmp + 1
Justification: No exception to the amount of indentation.
Other conventions are acceptable, for example:
let f x = function
| C ->
| D ->
...
Justification: The vertical bars separating the patterns stop when the definition is done, so it's still easy to pass on to the following definition.
Criticism: An unpleasant exception to the normal indentation.
let f x =
let tmp = ... in
try g x with
| Not_found ->
...
Justification: The first line of the definition is set off nicely, so it's easier to pass from definition to definition.
Criticism: You run into the right margin too quickly.
let ... in constructsThe expression following a definition introduced
by let is indented to the same level as the
keyword let, and the keyword
in which introduces it is written at the end of the line:
let expr1 = ... in
expr1 + expr1
In the case of a series of let definitions, the
preceding rule implies that these definitions should be placed at
the same indentation level:
let expr1 = ... in
let n = ... in
...
Justification: It is suggested that a series of “let ... in” constructs is analogous to a set of assumptions in a mathematical text, whence the same indentation level for all the assumptions.
Variation: some write the keyword in alone on one line to set
apart the final expression of the computation:
let e1 = ... in
let e2 = ... in
let new_expr =
let e1' = derive_expression e1
and e2' = derive_expression e2 in
Add_expression e1' e2'
in
Mult_expression (new_expr, new_expr)
Criticism: Lack of consistency.
if ... then
... else ... Write conditions with multiple branches at the same level of indentation:
if cond1 ...
if cond2 ...
if cond3 ...
Justification: Analogous treatment to pattern-matching clauses, all aligned to the same tab stop.
If the sizes of the conditions and the expressions allow, write for example:
if cond1 then e1 else
if cond2 then e2 else
if cond3 then e3 else
e4
If expressions in the branches of multiple conditions have to be enclosed (when they include statements for instance), write:
if cond then begin
e1
end else
if cond2 then begin
e2
end else
if cond3 then ...
Some suggest another method for multiple conditionals,
starting each line by the keyword else:
if cond1 ...
else if cond2 ...
else if cond3 ...
Justification:
elsifis a keyword in many languages, so use indentation andelse ifto bring it to mind. Morever, you do not have to look to the end of line to know whether the condition is continued or another test is performed.
Yet again, choose your style and use it systematically.Criticism: Lack of consistency in the treatment of all the conditions. Why a special case for the first condition?
Several styles are possible for single branches, according to the
size of the expressions in question and especially the presence of
begin end
or ( )
delimiters
for these expressions.
In the case of delimiting the branches of a conditional, several styles are used:
style
(at end of line:if cond then ( e1 ) else ( e2 )
Or alternatively first
beginat beginning of line:if cond then begin e1 end else begin e2 end
In fact the indentation of conditionals depends on the sizes of the expressions which make them up.
If
cond,e1ande2are small, simply write them on one line:if cond then e1 else e2
If the expressions making up a conditional are purely functional (without side effects), we advocate binding them within the conditional with
let e = ... inwhen they're too big to fit on a line.Justification: This way you get back the simple indentation on one line which is the most readable. As a side benefit, the naming acts as an aid to comprehension.
So now we consider the case in which the expressions in question do have side effects, which keeps us from simply binding them with a
let e = ... in.If
e1andcondare small, bute2large:if cond then e1 else e2If
e1andcondare large ande2small:if cond then e1 else e2If all the expressions are large:
if cond then e1 else e2If there are
( )delimitersif cond then ( e1 ) else ( e2 )A mixture where
e1requires( )bute2is smallif cond then ( e1 ) else e2
Criticism: The first vertical bar is not mandatory: hence, there is no need to write it.
Answer to criticism: If you omit the first bar the indentation seems unnatural : the first case gets an indentation that is greater than a normal new line would necessitate. It is thus a useless exception to the correct indentation rule. It also insists not to use the same syntax for the whole set of clauses, writing the first clause as an exception with a slightly different syntax. Last, aesthetic value is doubtful dubious (some people would say “awful” instead of “doubtful”).
Align all the pattern-matching clauses at the level of the vertical bar which begins each clause, including the first one.
If an expression in a clause is too large to fit on the line, you must break the line immediately after the arrow of the corresponding clause. Then indent normally, starting from the beginning of the pattern of the clause.
Arrows of pattern matching clauses should not be aligned.
match
or trymatch or a
try align the clauses with the beginning of the
construct:
match lam with
| Abs (x, body) -> 1 + size_lambda body
| App (lam1, lam2) -> size_lambda lam1 + size_lambda lam2
| Var v -> 1
try f x with
| Not_found -> ...
| Failure "not yet implemented" -> ...
Put the keyword with at the end of the
line. If the preceding expression extends beyond one line, put
with on a line by itself:
try
let y = f x in
if ...
with
| Not_found -> ...
| Failure "not yet implemented" -> ...
Justification: The keyword
with, on a line by itself shows that the program enters the pattern matching part of the construct.
match lam with
| Abs (x, body) ->
1 + size_lambda body
| App (lam1, lam2) ->
size_lambda lam1 + size_lambda lam2
| Var v ->
Some programmers generalize this rule to all clauses, as soon as one expressions overflows. They will then indent the last clause like this:
| Var v ->
1
Other programmers go one step further and apply this rule systematically to any clause of any pattern matching.
let rec fib = function
| 0 ->
1
| 1 ->
1
| n ->
fib (n - 1) + fib ( n - 2)
Criticism: May be not compact enough; for simple pattern matchings (or simple clauses in complex matchings), the rule does not add any good to readability.
Justitification: I don't see any good reason for this rule, unless you are paid proportionally to the number of lines of code: in this case use this rule to get more money without adding more bugs in your Caml programs!
match
or try, pattern matching of
anonymous functions, starting
by function, are indented with
respect to the function keyword:
map
(function
| Abs (x, body) -> 1 + size_lambda 0 body
| App (lam1, lam2) -> size_lambda (size_lambda 0 lam1) lam2
| Var v -> 1)
lambda_list
let or
let rec gives rise
to several reasonable styles which obey the
preceding rules for pattern matching (the one for anonymous
functions being evidently excepted). See above for recommanded styles.
let rec size_lambda accu = function
| Abs (x, body) -> size_lambda (succ accu) body
| App (lam1, lam2) -> size_lambda (size_lambda accu lam1) lam2
| Var v -> succ accu
let rec size_lambda accu = function
| Abs (x, body) -> size_lambda (succ accu) body
| App (lam1, lam2) -> size_lambda (size_lambda accu lam1) lam2
| Var v -> succ accu
match or
function which has previously
been pushed to the right. Don't write:
let rec f x = function
| [] -> ...
...
but choose to indent the line under
the let keyword:
let rec f x = function
| [] -> ...
...
Justification: You bump into the margin. The aesthetic value is doubtful...
-> symbols in pattern-matching clauses.
let f = function
| C1 -> 1
| Long_name _ -> 2
| _ -> 3
Justification: This makes it harder to maintain the program (the addition of a supplementary case can lead the indentations of all the lines to change and so ... we often give up alignment at that time, then it is better not to align the arrows in the first place!).
As far as possible, avoid arguments which consist of complex
expressions: in these cases define the ``large'' argument by
a let construction.
Justification: No indentation problem; if the name given to the expressions is meaningful, the code is more readable as well.
Additional justification: If the evaluation of the arguments produces side effects, the
letbinding is in fact necessary to explicitly define the order of evaluation.
let temp =
f x y z
``large
expression''
``other large
expression'' in
...
write
let t =
``large
expression''
and u =
``other large
expression'' in
let temp =
f x y z t u in
...
let binding as
well.
In place of
List.map
(function x ->
blabla
blabla
blabla)
l
write
let f x =
blabla
blabla
blabla in
List.map f l
Justification: Much clearer, in particular if the name given to the function is meaningful.
x + y + z
+ t + u
In the case of a ``large expression'' in such an operation sequence, to define the ``large expression'' with the help of aJustification: When the operator starts the line, it is clear that the operation continues on this line.
let in construction
is preferable to having to indent the line.
In place of
x + y + z
+ ``large
expression''
write
let t =
``large
expression'' in
x + y + z + t
You most certainly must bind those expressions too large to be
written in one operation in the case of a combination of operators.
In place of the unreadable expression
(x + y + z * t)
/ (``large
expression'')
write
let u =
``large
expression'' in
(x + y + z * t) / u
These guidelines extend to all operators. For example:
let u =
``large
expression'' in
x :: y
:: z + 1 :: t :: u
Always put your handiwork back on the bench,
and then polish it and re-polish it.
Moreover, it is difficult to identify that the same set of 10 lines of code is repeated 20 times throughout the program. By contrast, if an auxiliary function defines those 10 lines, it is fairly easy to see and find where those lines are used: that's simply where the function is called. If code is copy-pasted all over the place then the program is more difficult to understand.
In conclusion, copy-pasting code leads to programs that are more difficult to read and more difficult to maintain: it has to be banished.
An example of what to avoid: the following comment uses technical words and is thus masquerade into a real comment when it has no additional information of interest;
(*
Function print_lambda:
print a lambda-expression given as argument.
Arguments: lam, any lambda-expression.
Returns: nothing.
Remark: print_lambda can only be used for its side effect.
*)
let rec print_lambda lam =
match lam with
| Var s -> printf "%s" s
| Abs l -> printf "\\ %a" print_lambda l
| App (l1, l2) ->
printf "(%a %a)" print_lambda l1 print_lambda l2
Use assertions as much as possible: they let you avoid verbose comments, while allowing a useful verification upon execution.
For example, the conditions for the arguments of a function to be valid are usefully verified by assertions.
let f x =
assert (x >= 0);
...
Note as well that an assertion is often preferable to a comment because it's more trustworthy: an assertion is forced to be pertinent because it is verified upon each execution, while a comment can quickly become obsolete and then becomes actually detrimental to the comprehension of the program.
When writing difficult code, and particularly in case of highly imperative code with a lot of memory modifications (physical mutations in data structures), it is sometime mandatory to comment inside the body of functions to explain the implementation of the algorithm encoded here, or to follow successive modifications of invariants that the function must maintain. Once more, if there is some difficulty commenting is mandatory, for each program line if necessary.
It's hard to choose identifiers whose name evokes the meaning of the corresponding portion of the program. This is why you must devote particular care to this, emphasizing clarity and regularity of nomenclature.
Global identifiers (including especially the names of functions) can be long, because it's important to understand what purpose they serve far from their definition.
int_of_string,
not intOfString)
Case modifications are meaningful in Caml: in effect capitalized
words are reserved for constructors and module names in OCaml;
in contrast regular variables (functions or identifiers) must
start by a lowercase letter. Those rules prevent proper usage of
case modification for words separation in identifiers: the first word
starts the identifier, hence it must be lower case and it is forbidden
to choose IntOfString as the name of a function.
If necessary, make this nomenclature explicit in a comment at the top of the file); if there are several arguments with the same meaning then attach numeral suffixes to them.
This augments regularity of style. Avoid using identifiers
whose appearance can lead to confusion such as l or
O, easy to confuse with 1 and 0.
Example:
let add_expression expr1 expr2 = ...
let print_expression expr = ...
An exception to the recommendation not to use capitalization to separate words within identifiers is tolerated in the case of interfacing with existing libraries which use this naming convention: this lets Caml users of the library to orient themselves in the original library documentation more easily.
Parentheses are meaningful: they indicate the necessity of using an unusual precedence. So they should be used wisely and not sprinkled randomly throughout programs. To this end, you should know the usual precedences, that is, the combinations of operations which do not require parentheses. Quite fortunately this is not complicated if you know a little mathematics or strive to follow the following rules:
For example:
1 + 2 * x means 1 + (2 * x).
In mathematics you write
sin x to mean sin (x). In the same way
sin x + cos x means
(sin x) + (cos x) not
sin (x + (cos x)). Use the same conventions in Caml: write
f x + g x to mean (f x) + (g x).
This convention generalizes to all (infix) operators:
f x :: g x means (f x) :: (g x),
f x @ g x means (f x) @ (g x), and
failwith s ^ s' means (failwith s) ^ s',
not failwith (s ^ s').
Comparisons are infix operators, so
the preceding rules apply.
This is why f x < g x means
(f x) < (g x).
For type reasons (no other sensible interpretation) the expression
f x < x + 2 means (f x) < (x + 2).
In the same way f x < x + 2 && x > 3 means
((f x) < (x + 2)) && (x > 3).
Although mathematicians have a tendency to overuse parens in
this case, the boolean ``or'' operator is analogous to addition and the
``and'' to multiplication. So, just as
1 + 2 * x means
1 + (2 * x),
true || false && x
means
true || (false && x).
When it is necessary to delimit syntactic constructs in programs, use as
delimiters the keywords
begin
and end rather than parentheses.
However using parentheses is acceptable if you do it in a consistent,
that is, systematic, way.
This explicit delimiting of constructs essentially concerns
pattern-matching constructs or sequences embedded
within if then
else constructs.
match construct in
a match construct
When a match ... with or
try ... with construct
appears in a pattern-matching clause, it is absolutely necessary to
delimit this embedded construct (otherwise subsequent clauses of the
enclosing pattern-matching construct will automatically be associated
with the enclosed pattern-matching construct). For example:
match x with
| 1 ->
begin match y with
| ...
end
| 2 ->
...
if
In the same way, a sequence which appears in
the then
or else part of a conditional must be
delimited:
if cond then begin
e1;
e2
end else begin
e3;
e4
end
You must subdivide your programs into coherent modules.
For each module, you must explicitly write an interface.
For each interface, you must document the things defined by the module: functions, types, exceptions, etc.
Avoid open directives, using instead
the qualified identifier notation. Thus you will prefer short but
meaningful module names.
Justification: The use of unqualified identifiers is ambiguous and gives rise to difficult-to-detect semantic errors.
let lim = String.length name - 1 in
...
let lim = Array.length v - 1 in
...
... List.map succ ...
... Array.map succ ...
You can consider it normal to open a module which modifies the environment,
and brings other versions of an important set of functions. For example, the
Format module provides automatically indented printing. This
module redefines the usual printing functions print_string,
print_int, print_float, etc. So when you use
Format, open it systematically at the top of the file.
If you don't open Format you could miss the
qualification of a printing function, and this could be perfectly
silent, since many Format's functions have a
correspondent in the default environment
(Pervasives). Mixing printing functions from
Format and Pervasives leads to subtle bugs
in the display, that are difficult to trace. For instance:
let f () =
Format.print_string "Hello World!"; print_newline ()
is bogus since it does not call Format.print_newline to
flush the pretty-printer queue and output "Hello World!".
Instead "Hello World!" is stuck into the pretty-printer
queue, while Pervasives.print_newline outputs a carriage
return on the standard output ... If Format is printing
on a file and standard output is the terminal, the user will have a
bad time finding that a carriage return is missing in the file (and
the display of material on the file is strange, since boxes that
should be closed by Format.print_newline are still open),
while a spurious carriage return appeared on the screen!
For the same reason, open large libraries such as the one with arbitrary-precision integers so as not to burden the program which uses them.
open Num
let rec fib n =
if n <= 2 then Int 1 else fib (n - 1) +/ fib (n - 2)
Justification: The program would be less readable if you had to qualify all the identifiers.
In a program where type definitions are shared, it is good to gather these definitions into one or more module(s) without implementations (containing only types). Then it's acceptable to systematically open the module which exports the shared type definitions.
Complete them with care, without using a “catch-all” clause such
as
| _ -> ... or | x -> ... when it's
possible to do without it (for example when matching a concrete type
defined within the program). See also the next section:
compiler warnings.
Compiler warnings are meant to prevent potential errors; this is why you absolutely must heed them and correct your programs if compiling them produces such warnings. Besides, programs whose compilation produces warnings have an odor of amateurism which certainly doesn't suit your own work!
Warnings about pattern-matching must be treated with the upmost care:
| _ -> ... , but with an
explicit list of the constructors not examined by the rest of the
construct, for example | Cn _ | Cn1 _ -> ... .
Justification: It's not really any more complicated to write it this way, and this allows the program to evolve more safely. In effect the addition of a new constructor to the datatype being matched will produce an alert anew, which will allow the programmer to add a clause corresponding to the new constructor if that is warranted. On the contrary, the ``catch-all'' clause will make the function compile silently and it might be thought that the function is correct as the new constructor will be handled by the default case.
let bindings
[Translator's note: a
``de-structuring let binding'' is one
which binds several names to several expressions simultaneously. You
pack all the names you want bound into a collection such as a tuple or
a list, and you correspondingly pack all the expressions into a
collective expression. When the
let binding is evaluated, it unpacks
the collections on both sides and binds each expression to its
corresponding name. For example,
let x, y = 1, 2 is a
de-structuring let binding which
performs both the bindings let x = 1
and let y = 2 simultaneously.]
The let binding is not limited to
simple identifier definitions: you can use it with more complex or
simpler patterns. For instance
let with complex patterns:let [x; y] as l = ...l and its two elements
x and y.
let with simple pattern:let _ = ... does not define anything,
it just evaluate the expression on the right hand side of
the = symbol.
let must be
exhaustive
Only use de-structuring let bindings
in the case where the pattern-matching is exhaustive (the pattern
can never fail to match). Typically, you will thus be limited to
definitions of product types (tuples or records) or definitions of
variant type with a single case. In any other case, you should use
an explicit match
... with construct.
let ... in:
de-structuring let that give a
warning must be replaced by an explicit pattern matching. For
instance, instead of
let [x; y] as l = List.map succ
(l1 @ l2) in expression write:
match List.map succ (l1 @ l2) with
| [x; y] as l -> expression
| _ -> assert false
let x, y, l =
match List.map succ (l1 @ l2) with
| [x; y] as l -> x, y, l
| _ -> assert false
Justification: There is no way to make the pattern-matching exhaustive if you use general de-structuring
letbindings.
let _ = ...When the compiler emits a warning about the type of an expression in a sequence, you have to explicitly indicate that you want to ignore the result of this expression. To this end:
List.map f l;
print_newline ()
write
let _ = List.map f l in
print_newline ()
ignore : 'a
-> unit that ignores its argument to
return unit.
ignore (List.map f l);
print_newline ()
iter instead of
map, and simply write
List.iter f l;
print_newline ()
In actual programs, the suitable (side-effect only) function may
not exist and has to be written: very often, a careful separation of
the procedural part from the functional part of the function at hand
elegantly solves the problem, and the resulting program just looks
better afterwards! For instance, you would turn the problematic
definition:
let add x y =
if x > 1 then print_int x;
print_newline ();
x + y;;
into the clearer separate definitions
and change old calls to add accordingly.
In any case, use the let _ = ...
construction exactly in those cases where you want to ignore a
result. Don't systematically replace sequences with this
construction.
Justification: Sequences are much clearer! Compare
e1; e2; e3tolet _ = e1 in let _ = e2 in e3
hd
and tl functions
Don't use the hd and tl functions, but
pattern-match the list argument explicitly.
Justification: This is just as brief as and much clearer than using
hdandtlwhich must of necessity be protected bytry... with...to catch the exception which might be raised by these functions.
for loops
To simply traverse an array or a string, use
a for loop.
for i = 0 to Array.length v - 1 do
...
done
If the loop is complex or returns a result, use a recursive function.
let find_index e v =
let rec loop i =
if i >= Array.length v then raise Not_found else
if v.(i) = e then i else loop (i + 1) in
loop 0;;
Justification: The recursive function lets you code any loop whatsoever simply, even a complex one, for example with multiple exit points or with strange index steps (steps depending on a data value for example).
Besides, the recursive loop avoids the use of mutables whose value can be modified in any part of the body of the loop whatsoever (or even outside): on the contrary the recursive loop explicitly takes as arguments the values susceptible to change during the recursive calls.
while loopsWhile loops law: Beware: usually a while loop is wrong, unless its loop invariant has been explicitly written.
The main use of the while loop is the
infinite loop
while true do
.... You get out of it through an exception, generally on
termination of the program.
Other while loops are hard to use,
unless they come from canned programs from algorithms courses where
they were proved.
Justification:
whileloops require one or more mutables in order that the loop condition change value and the loop finally terminate. To prove their correctness, you must therefore discover the loop invariants, an interesting but difficult sport.
Don't be afraid to define your own exceptions in your programs, but
on the other hand use as much as possible the exceptions predefined by
the system. For example every search function which fails should
raise the predefined exception Not_found. Be careful to
handle the exceptions which may be raised by a function call with the
help of a try ... with.
Handling all exceptions by try
... with _ -> is usually reserved for the
main function of the program. If you need to catch all exceptions to
maintain an invariant of an algorithm, be careful to name the
exception and re-raise it, after having reset the invariant.
Typically:
let ic = open_in ...
and oc = open_out ... in
try
treatment ic oc;
close_in ic; close_out oc
with x -> close_in ic; close_out oc; raise x
Justification:
try ... with _ ->silently catches all exceptions, even those which have nothing to do with the computation at hand (for example an interruption will be captured and the computation will continue anyway!).
One of the great strengths of Caml is the power of the data structures which can be defined and the simplicity of manipulating them. So you must take advantage of this to the fullest extent; don't hesitate to define your own data structures. In particular, don't systematically represent enumerations by whole numbers, nor enumerations with two cases by booleans. Examples:
type figure =
| Triangle | Square | Circle | Parallelogram
type convexity =
| Convex | Concave | Other
type type_of_definition =
| Recursive | Non_recursive
Justification: A boolean value often prevents intuitive understanding of the corresponding code. For example, if
type_of_definitionis coded by a boolean, what doestruesignify? A “normal” definition (that is, non-recursive) or a recursive definition?In the case of an enumerated type encode by an integer, it is very difficult to limit the range of acceptable integers: one must define construction functions that will ensure the mandatory invariants of the program (and verify afterwards that no values has been built directly), or add assertions in the program and guards in pattern matchings. This is not good practice, when the definition of a sum type elegantly solves this problem, with the additional benefit of firing the full power of pattern matching and compiler's verifications of exhaustiveness.
Criticism: For binary enumerations, one can systematically define predicates whose names carry the semantics of the boolean that implements the type. For instance, we can adopt the convention that a predicate ends by the letter
p. Then, in place of defining a new sum type fortype_of_definition, we will use a predicate functionrecursivepthat returns true if the definition is recursive.
Answer: This method is specific to binary enumeration and cannot be easily extended; moreover it is not well suited to pattern matching. For instance, for definitions encoded by
| Let of bool * string * expressiona typical pattern matching would look like:| Let (_, v, e) as def -> if recursivep def then code_for_recursive_case else code_for_non_recursive_caseor, if
recursivepcan be applied to booleans:| Let (b, v, e) -> if recursivep def then code_for_recursive_case else code_for_non_recursive_casecontrast with an explicit encoding:
| Let (Recursive, v, e) -> code_for_recursive_case | Let (Non_recursive, v, e) -> code_for_non_recursive_caseThe difference between the two programs is subtle and you may think that this is just a matter of taste; however the explicit encoding is definitively more robust to modifications and fits better with the language.
A contrario, it is not necessary to systematically define
new types for boolean flags, when the interpretation of constructors
true
and false is clear. The usefulness of
the definition of the following types is then questionable:
type switch = On | Off
type bit = One | Zero
The same objection is admissible for enumerated types represented as integers, when those integers have an evident interpretation with respect to the data to be represented.
Mutable values are useful and sometimes indispensable to simple and clear programming. Nevertheless, you must use them with discernment: Caml's normal data structures are immutable. They are to be preferred for the clarity and safety of programming which they allow.
Caml's iterators are a powerful and useful feature. However you should not overuse them, nor a contrario neglect them: they are provided to you by libraries and have every chance of being correct and well-thought-out by the author of the library. So it's useless to reinvent them.
So write
let square_elements elements = List.map carre elements
rather than:
let rec square_elements = function
| [] -> []
| elem :: elements -> square elem :: square_elements elements
On the other hand avoid writing:
let iterator f x l =
List.fold_right (List.fold_left f) [List.map x l] l
even though you get:
# let iterator f x l = List.fold_right (List.fold_left f) [List.map x l] l;;
val iterator : ('a list -> 'a -> 'a list) -> ('a -> 'a) -> 'a list -> 'a list = <fun> # iterator (fun l x -> x :: l) (fun l -> List.rev l) [[1; 2; 3]];;
- : int list list = [[1; 2; 3]; [3; 2; 1]]
In case of express need, you must be careful to add an explanatory comment: in my opinion it's absolutely necessary!
Pseudo law of optimization: No optimization a priori.
No optimization a posteriori either.
Above all program simply and clearly. Don't start optimizing until the program bottleneck has been identified (in general a few routines). Then optimization consists above all of changing the complexity of the algorithm used. This often happens through redefining the data structures being manipulated and completely rewriting the part of the program which poses a problem.
Justification: Clarity and correctness of programs take precedence. Besides, in a substantial program, it is practically impossible to identify a priori the parts of the program whose efficiency is of prime importance.
You should use OCaml classes when you need inheritance, that is, incremental refinement of data and their functionality.
You should use conventional data structures (in particular, variant types) when you need pattern-matching.
You should modules when the data structures are fixed and their functionality is equally fixed or it's enough to add new functions in the programs which use them.
The Caml language includes powerful constructs which allow simple and clear programming. The main problem to obtain crystal clear programs it to use them appropriately.
The language features numerous programming styles (or programming paradigms): imperative programming (based on the notion of state and assignment), functional programming (based on the notion of function, function results, and calculus), object oriented programming (based of the notion of objects encapsulating a state and some procedures or methods that can modify the state). The first work of the programmer is to choose the programming paradigm that fits the best the problem at hand. When using one of those programming paradigms, the difficulty is to use the language construct that expresses in the most natural and easiest way the computation that implements the algorithm.
Concerning programming styles, one can usually observe the two symmetrical problematic behaviors: on the one hand, the ``all imperative'' way (systematic usage of loops and assignment), and on the other hand the ``purely functional'' way (never use loops nor assignments); the ``100% object'' style will certainly appear in the next future, but (fortunately) it is too new to be discussed here.
let list_length l =
let l = ref l in
let res = ref 0 in
while !l <> [] do
incr res; l := List.tl !l
done;
!res;;
in place of the following recursive function, so simple and clear:
let rec list_length = function
| [] -> 0
| _ :: l -> 1 + list_length l
(For those that would contest the equivalence of those two
versions, see the note
below).
for loop to iter on the
element of a vector, but instead to use a complex
while loop, with one or two
references (too many useless assignments, too many opportunity
for errors).
mutable keyword in the record
type definitions should be implicit.
for loops with recursive
functions, usage of lists in contexts where imperative data
structures seem to be mandatory to anyone, passing numerous
global parameters of the problem to every functions, even if a
global reference would be perfect to avoid these spurious
parameters that are mainly invariants that must be passed all
over the place.
mutable keyword in the record
types definitions should be suppressed from the language.
The Caml language includes powerful constructs which allow simple and clear programming. However the power of these constructs also lets you write uselessly complicated code, to the point where you get a perfectly unreadable program.
Here are a number of known ways:
if then else,
as in
let flush_ps () =
if not !psused then psused := true
or (more subtle)
let sync b =
if !last_is_dvi <> b then last_is_dvi := b
let ... in by the
application of an anonymous function to an
argument. You would write(fun x y -> x + y)
e1 e2
instead of simply writing
let x = e1
and y = e2 in
x + y
let in
bindings.
List.map
or List.iter than to write
their equivalents in-line using specific recursive functions of
your own. Even worse, you don't use
List.map
or List.iter but write
their equivalents in terms of List.fold_right and
List.fold_left.
(fun u -> print_string "world"; print_string u)
(let temp = print_string "Hello"; "!" in
((fun x -> print_string x; flush stdout) " ";
temp));;
If you naturally write the program
print_string "Hello world!" in this way, you can without
a doubt submit your work to the Obfuscated OCaml Contest 〈Pierre.Weis(at)inria.fr〉.
We give here tips from veteran Caml programmers, which have served in developing the compilers which are good examples of large complex programs developed by small teams.
Many developers nurture a kind of veneration towards the Emacs editor (gnu-emacs in general) which they use to write their programs. The editor interfaces well with the language since it is capable of syntax coloring Caml source code (rendering different categories of words in color, coloring keywords for example).
The following two commands are considered indispensable:
CTRL-C-CTRL-C or Meta-X compile:
launches re-compilation from within the editor (using the
make command).
CTRL-X-`: puts the cursor in the file and at the
exact place where the Caml compiler has signaled an error.
Developers describe thus how to use these features:
CTRL-C-CTRL-C combination recompiles the whole
application; in case of errors, a succession of
CTRL-X-` commands permits correction of all the errors
signaled; the cycle begins again with a new re-compilation launched
by CTRL-C-CTRL-C.
The ESC-/ command (dynamic-abbrev-expand) automatically
completes the word in front of the cursor with one of the words
present in one of the files being edited. Thus this lets you always
choose meaningful identifiers without the tedium of having to type
extended names in your programs: the
ESC-/ easily completes the identifier after typing the first
letters. In case it brings up the wrong completion, each subsequent
ESC-/ proposes an alternate completion.
Under Unix, the CTRL-C-CTRL-C or Meta-X
compile combination, followed by CTRL-X-` is also
used to find all occurrences of a certain string in a Caml program.
Instead of launching make to recompile, you launch the
grep command; then all the ``error messages'' from
grep are compatible with the CTRL-X-` usage
which automatically takes you to the file and the place where the
string is found.
Under Unix: use the line editor ledit which offers
great editing capabilities ``à la emacs'' (including
ESC-/!), as well as a history mechanism which lets you
retrieve previously typed commands and even retrieve commands from one
session in another. ledit is written in Caml and can be
freely down-loaded
here.
The make utility is indispensable for managing the
compilation and re-compilation of programs. Sample make
files can be found
on The
Hump. You can also consult the Makefiles for the
Caml compilers.
Users of the Git software version
control system are never run out of good things to say about the
productivity gains it brings. This system supports managing
development by a team of programmers while imposing consistency
among them, and also maintains a log of changes made to the
software.
Git also supports simultaneous development by
several teams, possibly dispersed among several sites linked on the
Net.
An anonymous Git read-only mirror contains the working sources of the OCaml compilers, and the sources of other software related to OCaml.
list_length
The two versions of list_length are not completely
equivalent in term of complexity, since the imperative version uses
a constant amount of stack room to execute, whereas the functional
version needs to store return addresses of suspended recursive calls
(whose maximum number is equal to the length of the list argument).
If you want to retrieve a constant space requirement to run the
functional program you just have to write a function that is
recursive in its tail (or tail-rec), that is a function
that just ends by a recursive call (which is not the case here since
a call to
+ has to be perform after the recursive call has
returned). Just use an accumulator for intermediate results, as in:
let list_length l =
let rec loop accu = function
| [] -> accu
| _ :: l -> loop (accu + 1) l in
loop 0 l
This way, you get a program that has the same computational properties as the imperative program with the additional clarity and natural looking of an algorithm that performs pattern matching and recursive calls to handle an argument that belongs to a recursive sum data type.