Sequences

Prerequisites

Introduction

Sequences are very much like lists. However, from a pragmatic perspective, one should imagine they can be either finite or infinite. That's the key intuition to understanding and using sequences. To achieve this, sequence elements are computed on demand and not stored in memory. Perhaps more frequently, sequences also allow for reducing memory consumption from linear to constant space

One way to look at a value of type 'a Seq.t is to consider it as a list, but it contains a twist when it's not empty: its tail is frozen. To understand this analogy, consider how sequences are defined in the Standard Library:

type 'a node =
  | Nil
  | Cons of 'a * 'a t
and 'a t = unit -> 'a node

This is the mutually recursive definition of two types: Seq.node, which is almost the same as list:

type 'a list =
  | []
  | (::) of 'a * 'a list

and Seq.t, which is merely a type alias for unit -> 'a Seq.node. The whole point of this definition is Seq.Cons second component's type, which is a function returning a sequence while its list counterpart is a list. Let's compare the constructors of list and Seq.node:

  1. Empty lists and sequences are defined the same way, a constructor without any parameters: Seq.Nil and [].
  2. Non-empty lists and sequences are both pairs whose former member is a piece of data.
  3. However, the latter member in lists is recursively a list, while in sequences, it is a function returning a Seq.node.

A value of type Seq.t is “frozen” because the data it contains isn't immediately available. A unit value has to be supplied to recover it, which we may see as “unfreezing.” However, unfreezing only gives access to the tip of the sequence, since the second argument of Seq.Cons is a function too.

Frozen-by-function tails explain why sequences may be considered potentially infinite. Until a Seq.Nil value has been found in the sequence, one can't say for sure if some will ever appear. The sequence could be a stream of incoming requests in a server, readings from an embedded sensor, or system logs. All have unforeseeable termination, and it is easier to consider them potentially infinite.

In OCaml, any value a of type t can be turned into a constant function by writing fun _ -> a or fun () -> a. The latter function is called a thunk. Using this terminology, Seq.t values are thunks. With the analogy used earlier, a is frozen in its thunk.

Constructing Sequences

With this understanding, we can manually construct a sequence like so:

let my_seq  =
  fun () ->
  Seq.Cons (1, fun () -> Seq.Cons (2, fun () -> Seq.Cons (3, fun () -> Seq.Nil)))

Note: The second component of each Seq.Con's tuple is a function. This has the effect of providing a means of acquiring a value rather than providing a value directly.

We can also construct sequences using functions. Here is how to build an infinite sequence of integers:

# let rec ints n : int Seq.t = fun () -> Seq.Cons (n, ints (n + 1));;
val ints : int -> int Seq.t = <fun>

The function ints n looks as if building the infinite sequence (n; n + 1; n + 2; n + 3;...). In reality, since machine integers have bounds, the sequence isn't indefinitely increasing. For technical reasons, when max_int is reached, it will circle down to min_int.

The OCaml Standard Library contains a module for sequences called Seq. It contains Seq.int, which we implemented above.

Iterating Over Sequences

The OCaml Standard Library also contains a Seq.iter function, which has the same behavior as List.iter. Writing this:

# Seq.iter print_int (ints 0);;

in an OCaml toplevel means “print integers forever,” and you have to press Ctrl-C to interrupt the execution. The following code is the same infinite loop without any output:

# Seq.iter ignore (ints 0);;

The key point is that it doesn't leak memory. This example runs in constant space. It is effectively nothing more than an infinite loop, which can be confirmed by monitoring the space consumption of the program and by noticing that it spins forever without crashing. Whereas a version of this with a list let rec ints n = n :: ints (n + 1) would allocate a list of length proportional to the running time, and thus would crash by running out of memory pretty quickly.

Taking Parts of a Sequence

The Seq module of the OCaml Standard Library contains the definition of the function Seq.take, which returns a specified number of elements from the beginning of a sequence. Here is a simplified implementation:

let rec take n seq () =
  if n <= 0 then
    Seq.Nil
  else
    match seq () with
    | Seq.Cons (x, seq) -> Seq.Cons (x, take (n - 1) seq)
    | _ -> Seq.Nil

take n seq returns, at most, the n first elements of the sequence seq. If seq contains less than n elements, an identical sequence is returned. In particular, if seq is empty, or n is negative, an empty sequence is returned.

Observe the first line of our take function. It is the common pattern for recursive functions over sequences. The last two parameters are:

  • a sequence called seq
  • a unit value

When executed, the function begins by unfreezing seq (that is, calling seq ()) and then pattern matching to look inside the data made available. However, this does not happen unless a unit parameter is passed to take. Writing take 10 seq does not compute anything. It is a partial application and returns a function needing a unit to produce a result.

This can be used to print integers without looping forever, as shown previously:

# Seq.ints 0 |> Seq.take 43 |> List.of_seq;;
- : int list =
[0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21;
 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 35; 36; 37; 38; 39; 40;
 41; 42]

Filtering a Sequence

The Seq module also has a function Seq.filter:

# Seq.filter;;
- : ('a -> bool) -> 'a Seq.t -> 'a Seq.t = <fun>

It builds a sequence of elements satisfying a condition.

Using Seq.filter, taking inspiration from the trial division algorithm, it is possible to define a function which seemingly generates the list of all primes numbers.

let rec trial_div seq () = match seq () with
  | Seq.Cons (m, seq_rest) -> Seq.Cons (m, trial_div (Seq.filter (fun n -> n mod m > 0) seq_rest))
  | Seq.Nil -> Seq.Nil
let primes = Seq.ints 2 |> trial_div;;
val trial_div : int Seq.t -> int Seq.t = <fun>
val primes : int Seq.t = <fun>

For instance, here is a list of 100 first prime numbers:

# primes |> Seq.take 100 |> List.of_seq;;
- : int list =
[2; 3; 5; 7; 11; 13; 17; 19; 23; 29; 31; 37; 41; 43; 47; 53; 59; 61; 67; 71;
 73; 79; 83; 89; 97; 101; 103; 107; 109; 113; 127; 131; 137; 139; 149; 151;
 157; 163; 167; 173; 179; 181; 191; 193; 197; 199; 211; 223; 227; 229; 233;
 239; 241; 251; 257; 263; 269; 271; 277; 281; 283; 293; 307; 311; 313; 317;
 331; 337; 347; 349; 353; 359; 367; 373; 379; 383; 389; 397; 401; 409; 419;
 421; 431; 433; 439; 443; 449; 457; 461; 463; 467; 479; 487; 491; 499; 503;
 509; 521; 523; 541]

The function trial_div is recursive in OCaml and can be understood if we break it down into its constituent parts. It is defined using the rec keyword, allowing the function to call itself. For each loop in the recursive call, it pattern-matches on either Seq.Cons (m, seq) or the end of the sequence, Seq.Nil.

If it matches on the first branch Seq.Cons (m, seq), we filter the remaining sequence of all integers that are divisible by m before recursively calling trial_div on the filtered sequence. This branch is matched on until we reach the end of the sequence for every recursive call.

So far, we recursively traveled down our sequence until we reached the 100th prime number. Next, we retrace our steps up the recursive trail, wherein we construct our result by calling Seq.Cons on m and the previously constructed filtered sequence beginning with Seq.Nil.

Side Note: It may be interesting to learn that trial_div, while it can colloquially be called a recursive, is an example of a kind of recursion called corecursion. Corecursion differs from recursion in that it constructs results incrementally rather than consuming it's input incrementally. Unlike traditional recursion, which works towards a base case, corecursive functions must indefinitely produce values as a stream. The trial_div function is corecursive because it does not immediately compute the complete sequence of primes. Instead, it produces prime numbers on-demand, filtering and deferring further computation until more elements are requested. This allows the sequence to be processed incrementally rather than requiring a complete traversal upfront.

Unfolding Sequences

Standard higher-order iteration functions are available on sequences. For instance:

  • Seq.iter
  • Seq.map
  • Seq.fold_left

All of these kinds of higher-order functions are also available for Array, List, and Set and behave essentially the same. Observe that there is no fold_right function. Since OCaml 4.11, there is something which isn't (yet) available on other types: unfold. Here is how it is implemented:

let rec unfold f x () = match f x with
  | None -> Seq.Nil
  | Some (x, seq) -> Seq.Cons (x, unfold f seq)

And here is its type:

val unfold : ('a -> ('b * 'a) option) -> 'a -> 'b Seq.t = <fun>

Unlike previously mentioned iterators, Seq.unfold does not have a sequence parameter, but a sequence result. unfold provides a general means to build sequences. The result returned by Seq.unfold f x is the sequence built by accumulating the results of successive calls to f until it returns None. This is:

(fst p₀, fst p₁, fst p₂, fst p₃, fst p₄, ...)

where Some p₀ = f x and Some pₙ₊₁ = f (snd pₙ).

For instance, Seq.ints can be implemented using Seq.unfold in a fairly compact way:

# let ints = Seq.unfold (fun n -> Some (n, n + 1));;
val ints : int -> int Seq.t = <fun>

As a fun fact, we can observe that a map over sequences can be implemented using Seq.unfold. Here is how to write it:

# let map f = Seq.unfold (fun seq -> seq |> Seq.uncons |> Option.map (fun (x, seq) -> (f x, seq)));;
val map : ('a -> 'b) -> 'a Seq.t -> 'b Seq.t = <fun>

We can check our map function by applying a square root function to a sequence:

# Seq.ints 0 |> map (fun x -> x * x) |> Seq.take 10 |> List.of_seq;;
- : int list = [0; 1; 4; 9; 16; 25; 36; 49; 64; 81]

The function Seq.uncons returns the head and tail of a sequence if it is not empty. Otherwise, it returns None.

Reading a File with Seq.Unfold

For the next example, we will demonstrate the versatility of Seq.unfold by using it to read a file.

Before doing so, let's define a function that reads a file's line from a provided channel, with the type signature needed by Seq.unfold.

# let input_line_opt chan =
    try Some (In_Channel.input_line chan, chan)
    with End_of_file -> None;;
val input_line_opt : in_channel -> (string * in_channel) option = <fun>

Note: To make the code in the next section work, create a file named "README.md" and add dummy content. We use a file generated by the following command:

cat > README.md <<EOF
This is the first line.
This is the second line.
EOF

Finally, let's read the file's contents using Seq.unfold. Mind that cin is a local definition.

# let cin = open_in "README.md" in
    cin |> Seq.unfold In_channel.input_line_opt |> Seq.iter print_endline;
    close_in cin;;
This is the first line.
This is the second line.
- : unit = ()

Sequences Are Functions

The Seq module contains this definition:

val cons : 'a -> 'a Seq.t -> 'a Seq.t

Although Seq.cons x seq and Seq.Cons (x, seq) are the same, Seq.cons is a function and Seq.Cons is a variant's constructor, which is not the same in OCaml. This can lead to subtle bugs. This section illustrates this.

Although this looks like a possible way to define the Fibonacci sequence:

# let rec fibs m n = Seq.cons m (fibs n (n + m));;
val fibs : int -> int -> int Seq.t = <fun>

It actually isn't. It's an unending recursion which blows away the stack.

# fibs 0 1;;
Stack overflow during evaluation (looping recursion?).

This definition is behaving as expected (spot the differences, there are four): <!-- How do you count four?

  1. Seq.Cons vs Seq.cons
  2. Input is a tuple vs being a pair of parameters
  3. Possesses a Unit value parameter
  4. ? -->
# let rec fibs m n () = Seq.Cons (m, fibs n (n + m));;
val fibs : int -> int -> int Seq.t = <fun>

It can be used to produce some Fibonacci numbers:

# fibs 0 1 |> Seq.take 10 |> List.of_seq;;
- : int list = [0; 1; 1; 2; 3; 5; 8; 13; 21; 34]

Why is it so? The key difference lies in the recursive call fibs n (n + m). In the former definition, the application is complete because fibs is provided with all the arguments it expects. In the latter definition, the application is partial because the () argument is missing. Since evaluation is eager in OCaml, in the former case evaluation of the recursive call is triggered again and again, without ever terminating (this is what "looping recursion" in the error message refers to). In the latter case, the partially applied function is immediately returned as a closure.

Sequences are functions, as stated by their type:

# #show Seq.t;;
type 'a t = unit -> 'a Seq.node

Functions working with sequences must be written accordingly.

  • Sequence consumer: partially applied function parameter
  • Sequence producer: partially applied function result

When code dealing with sequences does not behave as expected, like if it is crashing or hanging, there's a fair chance a mistake like in the first definition of fibs was made.

Sequences for Conversions

Throughout the standard library, sequences are used as a bridge to perform conversions between many datatypes. For instance, here are the signatures of some of those functions:

  • Lists

    val List.to_seq : 'a list -> 'a Seq.t
    val List.of_seq : 'a Seq.t -> 'a list
    
  • Arrays

    val Array.to_seq : 'a array -> 'a Seq.t
    val Array.of_seq : 'a Seq.t -> 'a array
    
  • Strings

    val String.to_seq : string -> char Seq.t
    val String.of_seq : char Seq.t -> string
    

Similar functions are also provided for sets, maps, hash tables (Hashtbl), and others. When implementing a datatype module, it is advised to expose to_seq and of_seq functions.

Miscellaneous Considerations

There are a couple of related libraries, all providing means to handle large flows of data:

  • Rizo I Streaming
  • Simon Cruanes and Gabriel Radanne Iter
  • Simon Cruanes OSeq (an extension of Seq with more functions)
  • Jane Street Base.Sequence

There used to be a module called Stream in the OCaml standard library. It was removed in 2021 with the release of OCaml 4.14. Beware books and documentation written before may still mention it.

Help Improve Our Documentation

All OCaml docs are open source. See something that's wrong or unclear? Submit a pull request.

OCaml

Innovation. Community. Security.