Data Types and Matching

In this tutorial, we learn how to build our own types in OCaml and to write functions that process this new data.

Please note throughout this tutorial the code is written in the ocaml toplevel. Whereas # denoted a comment and $ the command prompt in the Up & Running document, when in the ocaml or utop toplevel, the command prompt appears as a #, as shown in the following examples.

Also remember that an expression must end with ;; for OCaml to evaluate it. Unless these examples start with a # toplevel prompt and end with ;;, it isn't an expression to evaluate but rather an example of code structure.

Built-in Compound Types

We have already seen simple data types such as int, float, string, and bool. Let's recap the built-in compound data types we can use in OCaml to combine such values. First, we have lists, which are ordered collections of any number of elements sharing the same type:

# [];;
- : 'a list = []
# [1; 2; 3];;
- : int list = [1; 2; 3]
# [[1; 2]; [3; 4]; [5; 6]];;
- : int list list = [[1; 2]; [3; 4]; [5; 6]]
# [false; true; false];;
- : bool list = [false; true; false]

Next, we have tuples, which collect a fixed number of elements together:

# (5.0, 6.5);;
- : float * float = (5., 6.5)
# (true, 0.0, 0.45, 0.73, "french blue");;
- : bool * float * float * float * string =
(true, 0., 0.45, 0.73, "french blue")

We have records, which are like labeled tuples. In tuples, a field sits at some place: first, second, third, etc. In records, a field has a name; order is no longer relevant. Therefore, a record definition is a set of field name and type:

# type point = {x : float; y : float};;
type point = { x : float; y : float; }
# let a = {x = 5.0; y = 6.5};;
val a : point = {x = 5.; y = 6.5}
# type colour = {websafe : bool; r : float; g : float; b : float; name : string};;
type colour = {
  websafe : bool;
  r : float;
  g : float;
  b : float;
  name : string;
}
# let b = {websafe = true; r = 0.0; g = 0.45; b = 0.73; name = "french blue"};;
val b : colour =
  {websafe = true; r = 0.; g = 0.45; b = 0.73; name = "french blue"}

A record must contain all fields:

# let c = {name = "puce"};;
Line 1, characters 9-24:
Error: Some record fields are undefined: websafe r g b

Records may be mutable:

# type person =
  {first_name : string;
   surname : string;
   mutable age : int};;
type person = { first_name : string; surname : string; mutable age : int; }
# let birthday p =
  p.age <- p.age + 1;;
val birthday : person -> unit = <fun>

Please note: in the above example, we chose "colour" as the name of a type that holds RGB-values. If you wanted to, you could also use the US spelling "color." Another mutable compound data type is the fixed-length array which, just as a list, must contain elements of like type. However, its elements may be accessed in constant time:

# let arr = [|1; 2; 3|];;
val arr : int array = [|1; 2; 3|]
# arr.(0);;
- : int = 1
# arr.(0) <- 0;;
- : unit = ()
# arr;;
- : int array = [|0; 2; 3|]

In this tutorial, we will define our own compound data types using the type keyword, and we'll use some of these built-in structures as building blocks.

A Simple Custom Type

We can define a new data type colour which can take one of four values.

type colour = Red | Green | Blue | Yellow

Our new type is called colour, and has four constructors Red, Green, Blue and Yellow. The name of the type must begin with a lower case letter, and the names of the constructors with upper case letters. We can use our new type anywhere a built-in type could be used:

# let additive_primaries = (Red, Green, Blue);;
val additive_primaries : colour * colour * colour = (Red, Green, Blue)
# let pattern = [(1, Red); (3, Green); (1, Red); (2, Green)];;
val pattern : (int * colour) list =
  [(1, Red); (3, Green); (1, Red); (2, Green)]

Notice the types inferred by OCaml for these expressions. We can pattern-match on our new type, just as with any built-in type:

# let example c =
  match c with
  | Red -> "rose"
  | Green -> "grass"
  | Blue -> "sky"
  | Yellow -> "banana";;
val example : colour -> string = <fun>

Notice the type of the function includes the name of our new type colour. We can make the function shorter and elide its parameter c by using the alternative function keyword which allows direct matching:

# let example = function
  | Red -> "rose"
  | Green -> "grass"
  | Blue -> "sky"
  | Yellow -> "banana";;
val example : colour -> string = <fun>

We can match on more than one case at a time too:

# let rec is_primary = function
  | Red | Green | Blue -> true
  | _ -> false;;
val is_primary : colour -> bool = <fun>

Constructors With Data

Each constructor in a data type can carry additional information with it. Let's extend our colour type to allow arbitrary RGB triples, each element begin a number from 0 (no colour) to 1 (full colour):

# type colour =
  | Red
  | Green
  | Blue
  | Yellow
  | RGB of float * float * float;;
type colour = Red | Green | Blue | Yellow | RGB of float * float * float

# [Red; Blue; RGB (0.5, 0.65, 0.12)];;
- : colour list = [Red; Blue; RGB (0.5, 0.65, 0.12)]

Types, just like functions, may be recursively-defined. We extend our data type to allow mixing of colours:

# type colour =
  | Red
  | Green
  | Blue
  | Yellow
  | RGB of float * float * float
  | Mix of float * colour * colour;;
type colour =
    Red
  | Green
  | Blue
  | Yellow
  | RGB of float * float * float
  | Mix of float * colour * colour
# Mix (0.5, Red, Mix (0.5, Blue, Green));;
- : colour = Mix (0.5, Red, Mix (0.5, Blue, Green))

Here is a function over our new colour data type:

# let rec rgb_of_colour = function
  | Red -> (1.0, 0.0, 0.0)
  | Green -> (0.0, 1.0, 0.0)
  | Blue -> (0.0, 0.0, 1.0)
  | Yellow -> (1.0, 1.0, 0.0)
  | RGB (r, g, b) -> (r, g, b)
  | Mix (p, a, b) ->
      let (r1, g1, b1) = rgb_of_colour a in
      let (r2, g2, b2) = rgb_of_colour b in
      let mix x y = x *. p +. y *. (1.0 -. p) in
        (mix r1 r2, mix g1 g2, mix b1 b2);;
val rgb_of_colour : colour -> float * float * float = <fun>

We can use records directly in the data type instead to label our components:

# type colour =
  | Red
  | Green
  | Blue
  | Yellow
  | RGB of {r : float; g : float; b : float}
  | Mix of {proportion : float; c1 : colour; c2 : colour};;
type colour =
    Red
  | Green
  | Blue
  | Yellow
  | RGB of { r : float; g : float; b : float; }
  | Mix of { proportion : float; c1 : colour; c2 : colour; }

Example: Trees

Data types may be polymorphic as well as recursive. Here is an OCaml data type for a binary tree carrying any kind of data:

# type 'a tree =
  | Leaf
  | Node of 'a tree * 'a * 'a tree;;
type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree
# let t =
    Node (Node (Leaf, 1, Leaf), 2, Node (Node (Leaf, 3, Leaf), 4, Leaf));;
val t : int tree =
  Node (Node (Leaf, 1, Leaf), 2, Node (Node (Leaf, 3, Leaf), 4, Leaf))

Notice that we give the type parameter 'a before the type name (if there is more than one, we write ('a, 'b) etc). A Leaf holds no information, just like an empty list. A Node holds a left tree, a value of type 'a, and a right tree. In our example, we built an integer tree, but any type can be used. Now we can write recursive and polymorphic functions over these trees by pattern matching on our new constructors:

# let rec total = function
  | Leaf -> 0
  | Node (l, x, r) -> total l + x + total r;;
val total : int tree -> int = <fun>
# let rec flip = function
  | Leaf -> Leaf
  | Node (l, x, r) -> Node (flip r, x, flip l);;
val flip : 'a tree -> 'a tree = <fun>

Here, flip is polymorphic while total operates only on trees of type int tree. Let's try our new functions out:

# let all = total t;;
val all : int = 10
# let flipped = flip t;;
val flipped : int tree =
  Node (Node (Leaf, 4, Node (Leaf, 3, Leaf)), 2, Node (Leaf, 1, Leaf))
# t = flip flipped;;
- : bool = true

Instead of integers, we could build a tree of key-value pairs. Then, if we insist that the keys are unique and that a smaller key is always left of a larger key, we have a data structure for dictionaries which performs better than a simple list of pairs. It is known as a binary search tree:

# let rec insert (k, v) = function
  | Leaf -> Node (Leaf, (k, v), Leaf)
  | Node (l, (k', v'), r) ->
      if k < k' then Node (insert (k, v) l, (k', v'), r)
      else if k > k' then Node (l, (k', v'), insert (k, v) r)
      else Node (l, (k, v), r);;
val insert : 'a * 'b -> ('a * 'b) tree -> ('a * 'b) tree = <fun>

Similar functions can be written to look up values in a dictionary to convert a list of pairs to or from a tree dictionary, and so on.

Example: Options

A simple, yet very common, use of polymorphic type with constructors is the option type. Like list, it's predefined. Here is how:

# #show option;;
type 'a option = None | Some of 'a

option used to wrap some data, if available, or absence of data otherwise. Here is 42, stored inside an option using the data carrying constructor Some:

# Some 42;;
- : int option = Some 42

The None constructor means no data is available.

In other words, a value of type t option for some type t represents:

  • either a value v of type t, wrapped as Some v
  • no such value, then o has the value None

The option type is very useful when lack of data is better handled as a special value (i.e., None) rather than an exception. It is the type-safe version of returning error values such as in C, for instance. Since no data has any special meaning, confusion between regular values and absence of value is impossible. In computer science, this type is called the option type. OCaml has supported option since its inception.

The function Sys.getenv : string -> string from the OCaml standard library allows us to query the value of an environment variable; however, it throws an exception if the variable is not defined. On the other hand, the function Sys.getenv_opt : string -> string option does the same, except it returns None as the variable is not defined. Here is what may happen if we try to access an undefined environment variable:

# Sys.getenv "UNDEFINED_ENVIRONMENT_VARIABLE";;
Exception: Not_found.
# Sys.getenv_opt "UNDEFINED_ENVIRONMENT_VARIABLE";;
- : string option = None

Using pattern-matching, it is possible to define functions, allowing users to easily work with option values. Here is map of type ('a -> 'b) -> 'a option -> 'b option. It allows us to apply a function to the value wrapped inside an option, if present:

# let map f = function
  | None -> None
  | Some v -> Some (f v);;
val map : ('a -> 'b) -> 'a option -> 'b option = <fun>

map takes two parameters: the function f to be applied and an option value. map f o returns Some (f v) if o is Some v and None if o is None.

Here is join of type 'a option option -> 'a option. It peels off one layer from a doubly wrapped option:

# let join = function
  | Some Some v -> Some v
  | Some None -> None
  | None -> None;;
val join : 'a option option -> 'a option = <fun>

join takes a single option option parameter and returns an option parameter.

The function get of type 'a option -> 'a allows access to the value contained inside an option.

# let get = function
  | Some v -> v
  | None -> raise (Invalid_argument "option is None");;
val get : 'a option -> 'a = <fun>

But beware, get o throws an exception if o is None. To access the content of an option without risking to raise an exception, the function value of type 'a option -> 'a -> 'a can be used

# let value default = function
  | Some v -> v
  | None -> default;;
val value : 'a -> 'a option -> 'a = <fun>

However it takes a default value as an additional parameter.

The function fold of type fold : ('a -> 'b) -> 'b -> 'a option -> 'b combines map and value

# let fold f default o = o |> map f |> value default;;
val fold : ('a -> 'b) -> 'b -> 'a option -> 'b = <fun>

To build a function going the other way round, which creates an option one can define unfold of type ('a -> bool) -> ('a -> 'b) -> 'a -> 'b option the following way:

# let unfold p f x =
    if p x then
      Some (f x)
    else
      None;;
val unfold : ('a -> bool) -> ('a -> 'b) -> 'a -> 'b option = <fun>

Most of those functions as well as other useful ones are provided by the OCaml standard library in the Stdlib.Option supporting module.

By the way, any type where map and join functions can be implemented, with similar behaviour, can be called a monad, and option is often used to introduce monads. But don't freak out! You absolutely don't need to know what a monad is to use the option type.

Example: Mathematical Expressions

We wish to represent simple mathematical expressions like n * (x + y) and multiply them out symbolically to get n * x + n * y.

Let's define a type for these expressions:

type expr =
  | Plus of expr * expr        (* a + b *)
  | Minus of expr * expr       (* a - b *)
  | Times of expr * expr       (* a * b *)
  | Divide of expr * expr      (* a / b *)
  | Var of string              (* "x", "y", etc. *);;

The expression n * (x + y) would be written:

# Times (Var "n", Plus (Var "x", Var "y"));;
- : expr = Times (Var "n", Plus (Var "x", Var "y"))

Let's write a function which prints out Times (Var "n", Plus (Var "x", Var "y")) as something more like n * (x + y).

# let rec to_string e =
  match e with
  | Plus (left, right) ->
     "(" ^ to_string left ^ " + " ^ to_string right ^ ")"
  | Minus (left, right) ->
     "(" ^ to_string left ^ " - " ^ to_string right ^ ")"
  | Times (left, right) ->
   "(" ^ to_string left ^ " * " ^ to_string right ^ ")"
  | Divide (left, right) ->
   "(" ^ to_string left ^ " / " ^ to_string right ^ ")"
  | Var v -> v;;
val to_string : expr -> string = <fun>
# let print_expr e =
  print_endline (to_string e);;
val print_expr : expr -> unit = <fun>

(The ^ operator concatenates strings). We separate it into two functions, so our to_string function is usable in other contexts. Here's the print_expr function in action:

# print_expr (Times (Var "n", Plus (Var "x", Var "y")));;
(n * (x + y))
- : unit = ()

We can write a function to multiply out expressions of the form n * (x + y) or (x + y) * n, and for this we will use a nested pattern:

# let rec multiply_out e =
  match e with
  | Times (e1, Plus (e2, e3)) ->
     Plus (Times (multiply_out e1, multiply_out e2),
           Times (multiply_out e1, multiply_out e3))
  | Times (Plus (e1, e2), e3) ->
     Plus (Times (multiply_out e1, multiply_out e3),
           Times (multiply_out e2, multiply_out e3))
  | Plus (left, right) ->
     Plus (multiply_out left, multiply_out right)
  | Minus (left, right) ->
     Minus (multiply_out left, multiply_out right)
  | Times (left, right) ->
     Times (multiply_out left, multiply_out right)
  | Divide (left, right) ->
     Divide (multiply_out left, multiply_out right)
  | Var v -> Var v;;
val multiply_out : expr -> expr = <fun>

Here it is in action:

# print_expr (multiply_out (Times (Var "n", Plus (Var "x", Var "y"))));;
((n * x) + (n * y))
- : unit = ()

How does the multiply_out function work? The key is in the first two patterns. The first pattern is Times (e1, Plus (e2, e3)) which matches expressions of the form e1 * (e2 + e3). Now look at the right-hand side of this first pattern match and convince yourself that it is the equivalent of (e1 * e2) + (e1 * e3). The second pattern does the same thing, except for expressions of the form (e1 + e2) * e3.

The remaining patterns don't change the expressions's form, but they crucially do call the multiply_out function recursively on their subexpressions. This ensures that all subexpressions within the expression get multiplied out too (if you only wanted to multiply out the very top level of an expression, then you could replace all the remaining patterns with a simple e -> e rule).

Can we do the reverse (i.e., factorizing out common subexpressions)? We can! (But it's a bit more complicated). The following version only works for the top level expression. You could certainly extend it to cope with all levels of an expression and more complex cases:

# let factorize e =
  match e with
  | Plus (Times (e1, e2), Times (e3, e4)) when e1 = e3 ->
     Times (e1, Plus (e2, e4))
  | Plus (Times (e1, e2), Times (e3, e4)) when e2 = e4 ->
     Times (Plus (e1, e3), e4)
  | e -> e;;
val factorize : expr -> expr = <fun>
# factorize (Plus (Times (Var "n", Var "x"),
                   Times (Var "n", Var "y")));;
- : expr = Times (Var "n", Plus (Var "x", Var "y"))

The factorize function above introduces another couple of features. You can add what are known as guards to each pattern match. A guard is the conditional which follows the when, and it means that the pattern match only happens if the pattern matches and the condition in the when-clause is satisfied.

The second feature is the = operator, which tests for "structural equality" between two expressions. That means it goes recursively into each expression, checking they're exactly the same at all levels.

Another useful feature for building more complicated nested patterns is the as keyword, which can be used to name part of an expression. For example:

| Name ("/DeviceGray" | "/DeviceRGB" | "/DeviceCMYK") as n -> n

| Node (l, ((k, _) as pair), r) when k = k' -> Some pair

Mutually Recursive Data Types

Data types may be mutually-recursive when declared with and:

type t = A | B of t' and t' = C | D of t

One common use for mutually-recursive data types is to decorate a tree by adding information to each node using mutually-recursive types, one of which is a tuple or record. For example:

# type t' = Int of int | Add of t * t
  and t = {annotation : string; data : t'}

Values of such mutually-recursive data type are manipulated by accompanying mutually-recursive functions. Add the rest of this code to the above block:

# let rec sum_t' = function
  | Int i -> i
  | Add (i, i') -> sum_t i + sum_t i'
  and sum_t {annotation; data} =
    if annotation <> "" then Printf.printf "Touching %s\n" annotation;
    sum_t' data;;
val sum_t' : t' -> int = <fun>
val sum_t : t -> int = <fun>

A Note on Tupled Constructors

There is a difference between RGB of float * float * float and RGB of (float * float * float). The first is a constructor with three pieces of data associated with it, and the second is a constructor with one tuple associated with it. There are two ways this matters: the memory layout differs between the two (a tuple is an extra indirection) and the ability to create or match using a tuple:

# type t = T of int * int;;
type t = T of int * int

# type t2 = T2 of (int * int);;
type t2 = T2 of (int * int)

# let pair = (1, 2);;
val pair : int * int = (1, 2)

# T2 pair;;
- : t2 = T2 (1, 2)

# T pair;;
Error: The constructor T expects 2 argument(s),
       but is applied here to 1 argument(s)

# match T2 (1, 2) with T2 x -> fst x;;
- : int = 1

# match T (1, 2) with T x -> fst x;;
Line 1, characters 21-24:
Error: The constructor T expects 2 argument(s),
       but is applied here to 1 argument(s)

Please note, however, that OCaml allows us to use the always-matching _ in either version:

# match T2 (1, 2) with T2 _ -> 0;;
- : int = 0

# match T (1, 2) with T _ -> 0;;
- : int = 0

Types and Modules

Often, a module will provide a single type and one or more operations on that type. For example, a module for a file format like PNG might have the following png.mli interface. Below is an example of how to structure the code:

type t

val of_file : filename -> t

val to_file : t -> filename -> unit

val flip_vertical : t -> t

val flip_horizontal : t -> t

val rotate : float -> t -> t

Traditionally, we name the type t. In the program using this library, it would then be Png.t, which is shorter, reads better than Png.png, and avoids confusion if the library also defines other types.

Help Improve Our Documentation

All OCaml docs are open source. See something that's wrong or unclear? Submit a pull request.

Contribute