Data Types and Matching
In this tutorial, we learn how to build our own types in OCaml and to write functions that process this new data.
Please note throughout this tutorial the code is written in the ocaml
toplevel.
Whereas #
denoted a comment and $
the command prompt in the
Up & Running document,
when in the ocaml
or utop
toplevel, the command prompt appears as a #
, as shown
in the following examples.
Also remember that an expression must end with ;;
for OCaml to evaluate it. Unless these examples start with a #
toplevel prompt and
end with ;;
, it isn't an expression to evaluate but rather an example of code structure.
Built-in Compound Types
We have already seen simple data types such as int
, float
, string
, and
bool
. Let's recap the built-in compound data types we can use in OCaml to
combine such values. First, we have lists, which are ordered collections of any
number of elements sharing the same type:
# [];;
- : 'a list = []
# [1; 2; 3];;
- : int list = [1; 2; 3]
# [[1; 2]; [3; 4]; [5; 6]];;
- : int list list = [[1; 2]; [3; 4]; [5; 6]]
# [false; true; false];;
- : bool list = [false; true; false]
Next, we have tuples, which collect a fixed number of elements together:
# (5.0, 6.5);;
- : float * float = (5., 6.5)
# (true, 0.0, 0.45, 0.73, "french blue");;
- : bool * float * float * float * string =
(true, 0., 0.45, 0.73, "french blue")
We have records, which are like labeled tuples. In tuples, a field sits at some place: first, second, third, etc. In records, a field has a name; order is no longer relevant. Therefore, a record definition is a set of field name and type:
# type point = {x : float; y : float};;
type point = { x : float; y : float; }
# let a = {x = 5.0; y = 6.5};;
val a : point = {x = 5.; y = 6.5}
# type colour = {websafe : bool; r : float; g : float; b : float; name : string};;
type colour = {
websafe : bool;
r : float;
g : float;
b : float;
name : string;
}
# let b = {websafe = true; r = 0.0; g = 0.45; b = 0.73; name = "french blue"};;
val b : colour =
{websafe = true; r = 0.; g = 0.45; b = 0.73; name = "french blue"}
A record must contain all fields:
# let c = {name = "puce"};;
Line 1, characters 9-24:
Error: Some record fields are undefined: websafe r g b
Records may be mutable:
# type person =
{first_name : string;
surname : string;
mutable age : int};;
type person = { first_name : string; surname : string; mutable age : int; }
# let birthday p =
p.age <- p.age + 1;;
val birthday : person -> unit = <fun>
Please note: in the above example, we chose "colour" as the name of a type that holds RGB-values. If you wanted to, you could also use the US spelling "color." Another mutable compound data type is the fixed-length array which, just as a list, must contain elements of like type. However, its elements may be accessed in constant time:
# let arr = [|1; 2; 3|];;
val arr : int array = [|1; 2; 3|]
# arr.(0);;
- : int = 1
# arr.(0) <- 0;;
- : unit = ()
# arr;;
- : int array = [|0; 2; 3|]
In this tutorial, we will define our own compound data types using the type
keyword, and we'll use some of these built-in structures as building blocks.
A Simple Custom Type
We can define a new data type colour
which can take one of four values.
type colour = Red | Green | Blue | Yellow
Our new type is called colour
, and has four constructors Red
, Green
,
Blue
and Yellow
. The name of the type must begin with a lower case letter,
and the names of the constructors with upper case letters. We can use our new
type anywhere a built-in type could be used:
# let additive_primaries = (Red, Green, Blue);;
val additive_primaries : colour * colour * colour = (Red, Green, Blue)
# let pattern = [(1, Red); (3, Green); (1, Red); (2, Green)];;
val pattern : (int * colour) list =
[(1, Red); (3, Green); (1, Red); (2, Green)]
Notice the types inferred by OCaml for these expressions. We can pattern-match on our new type, just as with any built-in type:
# let example c =
match c with
| Red -> "rose"
| Green -> "grass"
| Blue -> "sky"
| Yellow -> "banana";;
val example : colour -> string = <fun>
Notice the type of the function includes the name of our new type colour
. We
can make the function shorter and elide its parameter c
by using the
alternative function
keyword which allows direct matching:
# let example = function
| Red -> "rose"
| Green -> "grass"
| Blue -> "sky"
| Yellow -> "banana";;
val example : colour -> string = <fun>
We can match on more than one case at a time too:
# let rec is_primary = function
| Red | Green | Blue -> true
| _ -> false;;
val is_primary : colour -> bool = <fun>
Constructors With Data
Each constructor in a data type can carry additional information with it. Let's
extend our colour
type to allow arbitrary RGB triples, each element begin a
number from 0 (no colour) to 1 (full colour):
# type colour =
| Red
| Green
| Blue
| Yellow
| RGB of float * float * float;;
type colour = Red | Green | Blue | Yellow | RGB of float * float * float
# [Red; Blue; RGB (0.5, 0.65, 0.12)];;
- : colour list = [Red; Blue; RGB (0.5, 0.65, 0.12)]
Types, just like functions, may be recursively-defined. We extend our data type to allow mixing of colours:
# type colour =
| Red
| Green
| Blue
| Yellow
| RGB of float * float * float
| Mix of float * colour * colour;;
type colour =
Red
| Green
| Blue
| Yellow
| RGB of float * float * float
| Mix of float * colour * colour
# Mix (0.5, Red, Mix (0.5, Blue, Green));;
- : colour = Mix (0.5, Red, Mix (0.5, Blue, Green))
Here is a function over our new colour
data type:
# let rec rgb_of_colour = function
| Red -> (1.0, 0.0, 0.0)
| Green -> (0.0, 1.0, 0.0)
| Blue -> (0.0, 0.0, 1.0)
| Yellow -> (1.0, 1.0, 0.0)
| RGB (r, g, b) -> (r, g, b)
| Mix (p, a, b) ->
let (r1, g1, b1) = rgb_of_colour a in
let (r2, g2, b2) = rgb_of_colour b in
let mix x y = x *. p +. y *. (1.0 -. p) in
(mix r1 r2, mix g1 g2, mix b1 b2);;
val rgb_of_colour : colour -> float * float * float = <fun>
We can use records directly in the data type instead to label our components:
# type colour =
| Red
| Green
| Blue
| Yellow
| RGB of {r : float; g : float; b : float}
| Mix of {proportion : float; c1 : colour; c2 : colour};;
type colour =
Red
| Green
| Blue
| Yellow
| RGB of { r : float; g : float; b : float; }
| Mix of { proportion : float; c1 : colour; c2 : colour; }
Example: Trees
Data types may be polymorphic as well as recursive. Here is an OCaml data type for a binary tree carrying any kind of data:
# type 'a tree =
| Leaf
| Node of 'a tree * 'a * 'a tree;;
type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree
# let t =
Node (Node (Leaf, 1, Leaf), 2, Node (Node (Leaf, 3, Leaf), 4, Leaf));;
val t : int tree =
Node (Node (Leaf, 1, Leaf), 2, Node (Node (Leaf, 3, Leaf), 4, Leaf))
Notice that we give the type parameter 'a
before the type name (if there is
more than one, we write ('a, 'b)
etc). A Leaf
holds no information,
just like an empty list. A Node
holds a left tree, a value of type 'a
,
and a right tree. In our example, we built an integer tree, but any type can be
used. Now we can write recursive and polymorphic functions over these trees by
pattern matching on our new constructors:
# let rec total = function
| Leaf -> 0
| Node (l, x, r) -> total l + x + total r;;
val total : int tree -> int = <fun>
# let rec flip = function
| Leaf -> Leaf
| Node (l, x, r) -> Node (flip r, x, flip l);;
val flip : 'a tree -> 'a tree = <fun>
Here, flip
is polymorphic while total
operates only on trees of type int tree
. Let's try our new functions out:
# let all = total t;;
val all : int = 10
# let flipped = flip t;;
val flipped : int tree =
Node (Node (Leaf, 4, Node (Leaf, 3, Leaf)), 2, Node (Leaf, 1, Leaf))
# t = flip flipped;;
- : bool = true
Instead of integers, we could build a tree of key-value pairs. Then, if we insist that the keys are unique and that a smaller key is always left of a larger key, we have a data structure for dictionaries which performs better than a simple list of pairs. It is known as a binary search tree:
# let rec insert (k, v) = function
| Leaf -> Node (Leaf, (k, v), Leaf)
| Node (l, (k', v'), r) ->
if k < k' then Node (insert (k, v) l, (k', v'), r)
else if k > k' then Node (l, (k', v'), insert (k, v) r)
else Node (l, (k, v), r);;
val insert : 'a * 'b -> ('a * 'b) tree -> ('a * 'b) tree = <fun>
Similar functions can be written to look up values in a dictionary to convert a list of pairs to or from a tree dictionary, and so on.
Example: Options
A simple, yet very common, use of polymorphic type with constructors is the
option
type. Like list
, it's predefined. Here is how:
# #show option;;
type 'a option = None | Some of 'a
option
used to wrap some data, if available, or absence of data otherwise.
Here is 42, stored inside an option
using the data carrying constructor
Some
:
# Some 42;;
- : int option = Some 42
The None
constructor means no data is available.
In other words, a value of type t option
for some type t
represents:
- either a value
v
of typet
, wrapped asSome v
- no such value, then
o
has the valueNone
The option type is very useful when lack of data is better handled as a special
value (i.e., None
) rather than an exception. It is the type-safe version of
returning error values such as in C, for instance. Since no data has any special
meaning, confusion between regular values and absence of value is impossible. In
computer science, this type is called the option
type. OCaml has supported option
since its inception.
The function Sys.getenv : string -> string
from the OCaml standard library
allows us to query the value of an environment variable; however, it throws an
exception if the variable is not defined. On the other hand, the function
Sys.getenv_opt : string -> string option
does the same, except it returns
None
as the variable is not defined. Here is what may happen if we try to
access an undefined environment variable:
# Sys.getenv "UNDEFINED_ENVIRONMENT_VARIABLE";;
Exception: Not_found.
# Sys.getenv_opt "UNDEFINED_ENVIRONMENT_VARIABLE";;
- : string option = None
Using pattern-matching, it is possible to define functions, allowing users to easily
work with option values. Here is map
of type ('a -> 'b) -> 'a option -> 'b option
. It allows us to apply a function to the value wrapped inside an option
,
if present:
# let map f = function
| None -> None
| Some v -> Some (f v);;
val map : ('a -> 'b) -> 'a option -> 'b option = <fun>
map
takes two parameters: the function f
to be applied and an option value.
map f o
returns Some (f v)
if o
is Some v
and None
if o
is None
.
Here is join
of type 'a option option -> 'a option
. It peels off one layer
from a doubly wrapped option:
# let join = function
| Some Some v -> Some v
| Some None -> None
| None -> None;;
val join : 'a option option -> 'a option = <fun>
join
takes a single option option
parameter and returns an option
parameter.
The function get
of type 'a option -> 'a
allows access to the value
contained inside an option
.
# let get = function
| Some v -> v
| None -> raise (Invalid_argument "option is None");;
val get : 'a option -> 'a = <fun>
But beware, get o
throws an exception if o
is None
. To access the content
of an option
without risking to raise an exception, the function value
of
type 'a option -> 'a -> 'a
can be used
# let value default = function
| Some v -> v
| None -> default;;
val value : 'a -> 'a option -> 'a = <fun>
However it takes a default value as an additional parameter.
The function fold
of type fold : ('a -> 'b) -> 'b -> 'a option -> 'b
combines map
and value
# let fold f default o = o |> map f |> value default;;
val fold : ('a -> 'b) -> 'b -> 'a option -> 'b = <fun>
To build a function going the other way round, which creates an option
one can
define unfold
of type ('a -> bool) -> ('a -> 'b) -> 'a -> 'b option
the
following way:
# let unfold p f x =
if p x then
Some (f x)
else
None;;
val unfold : ('a -> bool) -> ('a -> 'b) -> 'a -> 'b option = <fun>
Most of those functions as well as other useful ones are provided by the
OCaml standard library in the Stdlib.Option
supporting module.
By the way, any type where map
and join
functions can be implemented, with
similar behaviour, can be called a monad, and option
is often used to
introduce monads. But don't freak out! You absolutely don't need to know what a
monad is to use the option
type.
Example: Mathematical Expressions
We wish to represent simple mathematical expressions like n * (x + y)
and
multiply them out symbolically to get n * x + n * y
.
Let's define a type for these expressions:
type expr =
| Plus of expr * expr (* a + b *)
| Minus of expr * expr (* a - b *)
| Times of expr * expr (* a * b *)
| Divide of expr * expr (* a / b *)
| Var of string (* "x", "y", etc. *);;
The expression n * (x + y)
would be written:
# Times (Var "n", Plus (Var "x", Var "y"));;
- : expr = Times (Var "n", Plus (Var "x", Var "y"))
Let's write a function which prints out Times (Var "n", Plus (Var "x", Var "y"))
as something more like n * (x + y)
.
# let rec to_string e =
match e with
| Plus (left, right) ->
"(" ^ to_string left ^ " + " ^ to_string right ^ ")"
| Minus (left, right) ->
"(" ^ to_string left ^ " - " ^ to_string right ^ ")"
| Times (left, right) ->
"(" ^ to_string left ^ " * " ^ to_string right ^ ")"
| Divide (left, right) ->
"(" ^ to_string left ^ " / " ^ to_string right ^ ")"
| Var v -> v;;
val to_string : expr -> string = <fun>
# let print_expr e =
print_endline (to_string e);;
val print_expr : expr -> unit = <fun>
(The ^
operator concatenates strings). We separate it into two functions, so
our to_string
function is usable in other contexts. Here's the
print_expr
function in action:
# print_expr (Times (Var "n", Plus (Var "x", Var "y")));;
(n * (x + y))
- : unit = ()
We can write a function to multiply out expressions of the form n * (x + y)
or (x + y) * n
, and for this we will use a nested pattern:
# let rec multiply_out e =
match e with
| Times (e1, Plus (e2, e3)) ->
Plus (Times (multiply_out e1, multiply_out e2),
Times (multiply_out e1, multiply_out e3))
| Times (Plus (e1, e2), e3) ->
Plus (Times (multiply_out e1, multiply_out e3),
Times (multiply_out e2, multiply_out e3))
| Plus (left, right) ->
Plus (multiply_out left, multiply_out right)
| Minus (left, right) ->
Minus (multiply_out left, multiply_out right)
| Times (left, right) ->
Times (multiply_out left, multiply_out right)
| Divide (left, right) ->
Divide (multiply_out left, multiply_out right)
| Var v -> Var v;;
val multiply_out : expr -> expr = <fun>
Here it is in action:
# print_expr (multiply_out (Times (Var "n", Plus (Var "x", Var "y"))));;
((n * x) + (n * y))
- : unit = ()
How does the multiply_out
function work? The key is in the first two
patterns. The first pattern is Times (e1, Plus (e2, e3))
which matches
expressions of the form e1 * (e2 + e3)
. Now look at the right-hand side of
this first pattern match and convince yourself that it is the equivalent of
(e1 * e2) + (e1 * e3)
. The second pattern does the same thing, except for
expressions of the form (e1 + e2) * e3
.
The remaining patterns don't change the expressions's form, but they
crucially do call the multiply_out
function recursively on their
subexpressions. This ensures that all subexpressions within the expression get
multiplied out too (if you only wanted to multiply out the very top level of an
expression, then you could replace all the remaining patterns with a simple e -> e
rule).
Can we do the reverse (i.e., factorizing out common subexpressions)? We can! (But it's a bit more complicated). The following version only works for the top level expression. You could certainly extend it to cope with all levels of an expression and more complex cases:
# let factorize e =
match e with
| Plus (Times (e1, e2), Times (e3, e4)) when e1 = e3 ->
Times (e1, Plus (e2, e4))
| Plus (Times (e1, e2), Times (e3, e4)) when e2 = e4 ->
Times (Plus (e1, e3), e4)
| e -> e;;
val factorize : expr -> expr = <fun>
# factorize (Plus (Times (Var "n", Var "x"),
Times (Var "n", Var "y")));;
- : expr = Times (Var "n", Plus (Var "x", Var "y"))
The factorize function above introduces another couple of features. You can add
what are known as guards to each pattern match. A guard is the conditional
which follows the when
, and it means that the pattern match only happens if
the pattern matches and the condition in the when
-clause is satisfied.
The second feature is the =
operator, which tests for "structural equality"
between two expressions. That means it goes recursively into each expression,
checking they're exactly the same at all levels.
Another useful feature for building more complicated nested patterns
is the as
keyword, which can be used to name part of an expression. For
example:
| Name ("/DeviceGray" | "/DeviceRGB" | "/DeviceCMYK") as n -> n
| Node (l, ((k, _) as pair), r) when k = k' -> Some pair
Mutually Recursive Data Types
Data types may be mutually-recursive when declared with and
:
type t = A | B of t' and t' = C | D of t
One common use for mutually-recursive data types is to decorate a tree by adding information to each node using mutually-recursive types, one of which is a tuple or record. For example:
# type t' = Int of int | Add of t * t
and t = {annotation : string; data : t'}
Values of such mutually-recursive data type are manipulated by accompanying mutually-recursive functions. Add the rest of this code to the above block:
# let rec sum_t' = function
| Int i -> i
| Add (i, i') -> sum_t i + sum_t i'
and sum_t {annotation; data} =
if annotation <> "" then Printf.printf "Touching %s\n" annotation;
sum_t' data;;
val sum_t' : t' -> int = <fun>
val sum_t : t -> int = <fun>
A Note on Tupled Constructors
There is a difference between RGB of float * float * float
and
RGB of (float * float * float)
.
The first is a constructor with three pieces of data
associated with it, and the second is a constructor with one tuple associated with
it. There are two ways this matters: the memory layout differs between the two
(a tuple is an extra indirection) and the ability to create or match using a
tuple:
# type t = T of int * int;;
type t = T of int * int
# type t2 = T2 of (int * int);;
type t2 = T2 of (int * int)
# let pair = (1, 2);;
val pair : int * int = (1, 2)
# T2 pair;;
- : t2 = T2 (1, 2)
# T pair;;
Error: The constructor T expects 2 argument(s),
but is applied here to 1 argument(s)
# match T2 (1, 2) with T2 x -> fst x;;
- : int = 1
# match T (1, 2) with T x -> fst x;;
Line 1, characters 21-24:
Error: The constructor T expects 2 argument(s),
but is applied here to 1 argument(s)
Please note, however, that OCaml allows us to use the always-matching _
in either
version:
# match T2 (1, 2) with T2 _ -> 0;;
- : int = 0
# match T (1, 2) with T _ -> 0;;
- : int = 0
Types and Modules
Often, a module will provide a single type and one or more operations on that type. For
example, a module for a file format like PNG might have the following png.mli
interface. Below is an example of how to structure the code:
type t
val of_file : filename -> t
val to_file : t -> filename -> unit
val flip_vertical : t -> t
val flip_horizontal : t -> t
val rotate : float -> t -> t
Traditionally, we name the type t
. In the program using this library, it
would then be Png.t
, which is shorter, reads better than Png.png
, and avoids
confusion if the library also defines other types.
Help Improve Our Documentation
All OCaml docs are open source. See something that's wrong or unclear? Submit a pull request.
Contribute