package ocaml-base-compiler
Sequences.
A sequence of type 'a Seq.t
can be thought of as a delayed list, that is, a list whose elements are computed only when they are demanded by a consumer. This allows sequences to be produced and transformed lazily (one element at a time) rather than eagerly (all elements at once). This also allows constructing conceptually infinite sequences.
The type 'a Seq.t
is defined as a synonym for unit -> 'a Seq.node
. This is a function type: therefore, it is opaque. The consumer can query a sequence in order to request the next element (if there is one), but cannot otherwise inspect the sequence in any way.
Because it is opaque, the type 'a Seq.t
does not reveal whether a sequence is:
- persistent, which means that the sequence can be used as many times as desired, producing the same elements every time, just like an immutable list; or
- ephemeral, which means that the sequence is not persistent. Querying an ephemeral sequence might have an observable side effect, such as incrementing a mutable counter. As a common special case, an ephemeral sequence can be affine, which means that it must be queried at most once.
It also does not reveal whether the elements of the sequence are:
- pre-computed and stored in memory, which means that querying the sequence is cheap;
- computed when first demanded and then stored in memory, which means that querying the sequence once can be expensive, but querying the same sequence again is cheap; or
- re-computed every time they are demanded, which may or may not be cheap.
It is up to the programmer to keep these distinctions in mind so as to understand the time and space requirements of sequences.
For the sake of simplicity, most of the documentation that follows is written under the implicit assumption that the sequences at hand are persistent. We normally do not point out when or how many times each function is invoked, because that would be too verbose. For instance, in the description of map
, we write: "if xs
is the sequence x0; x1; ...
then map f xs
is the sequence f x0; f x1; ...
". If we wished to be more explicit, we could point out that the transformation takes place on demand: that is, the elements of map f xs
are computed only when they are demanded. In other words, the definition let ys = map f xs
terminates immediately and does not invoke f
. The function call f x0
takes place only when the first element of ys
is demanded, via the function call ys()
. Furthermore, calling ys()
twice causes f x0
to be called twice as well. If one wishes for f
to be applied at most once to each element of xs
, even in scenarios where ys
is queried more than once, then one should use let ys = memoize (map f xs)
.
As a general rule, the functions that build sequences, such as map
, filter
, scan
, take
, etc., produce sequences whose elements are computed only on demand. The functions that eagerly consume sequences, such as is_empty
, find
, length
, iter
, fold_left
, etc., are the functions that force computation to take place.
When possible, we recommend using sequences rather than dispensers (functions of type unit -> 'a option
that produce elements upon demand). Whereas sequences can be persistent or ephemeral, dispensers are always ephemeral, and are typically more difficult to work with than sequences. Two conversion functions, to_dispenser
and of_dispenser
, are provided.
type 'a t = unit -> 'a node
A sequence xs
of type 'a t
is a delayed list of elements of type 'a
. Such a sequence is queried by performing a function application xs()
. This function application returns a node, allowing the caller to determine whether the sequence is empty or nonempty, and in the latter case, to obtain its head and tail.
A node is either Nil
, which means that the sequence is empty, or Cons (x, xs)
, which means that x
is the first element of the sequence and that xs
is the remainder of the sequence.
Consuming sequences
The functions in this section consume their argument, a sequence, either partially or completely:
is_empty
anduncons
consume the sequence down to depth 1. That is, they demand the first argument of the sequence, if there is one.iter
,fold_left
,length
, etc., consume the sequence all the way to its end. They terminate only if the sequence is finite.for_all
,exists
,find
, etc. consume the sequence down to a certain depth, which is a priori unpredictable.
Similarly, among the functions that consume two sequences, one can distinguish two groups:
iter2
andfold_left2
consume both sequences all the way to the end, provided the sequences have the same length.for_all2
,exists2
,equal
,compare
consume the sequences down to a certain depth, which is a priori unpredictable.
The functions that consume two sequences can be applied to two sequences of distinct lengths: in that case, the excess elements in the longer sequence are ignored. (It may be the case that one excess element is demanded, even though this element is not used.)
None of the functions in this section is lazy. These functions are consumers: they force some computation to take place.
val is_empty : 'a t -> bool
is_empty xs
determines whether the sequence xs
is empty.
It is recommended that the sequence xs
be persistent. Indeed, is_empty xs
demands the head of the sequence xs
, so, if xs
is ephemeral, it may be the case that xs
cannot be used any more after this call has taken place.
If xs
is empty, then uncons xs
is None
.
If xs
is nonempty, then uncons xs
is Some (head xs, tail xs)
, that is, a pair of the head and tail of the sequence xs
.
This equivalence holds if xs
is persistent. If xs
is ephemeral, then uncons
must be preferred over separate calls to head
and tail
, which would cause xs
to be queried twice.
val length : 'a t -> int
length xs
is the length of the sequence xs
.
The sequence xs
must be finite.
val iter : ('a -> unit) -> 'a t -> unit
iter f xs
invokes f x
successively for every element x
of the sequence xs
, from left to right.
It terminates only if the sequence xs
is finite.
val fold_left : ('a -> 'b -> 'a) -> 'a -> 'b t -> 'a
fold_left f _ xs
invokes f _ x
successively for every element x
of the sequence xs
, from left to right.
An accumulator of type 'a
is threaded through the calls to f
.
It terminates only if the sequence xs
is finite.
val iteri : (int -> 'a -> unit) -> 'a t -> unit
iteri f xs
invokes f i x
successively for every element x
located at index i
in the sequence xs
.
It terminates only if the sequence xs
is finite.
iteri f xs
is equivalent to iter (fun (i, x) -> f i x) (zip (ints 0) xs)
.
val fold_lefti : ('b -> int -> 'a -> 'b) -> 'b -> 'a t -> 'b
fold_lefti f _ xs
invokes f _ i x
successively for every element x
located at index i
of the sequence xs
.
An accumulator of type 'b
is threaded through the calls to f
.
It terminates only if the sequence xs
is finite.
fold_lefti f accu xs
is equivalent to fold_left (fun accu (i, x) -> f accu i x) accu (zip (ints 0) xs)
.
val for_all : ('a -> bool) -> 'a t -> bool
for_all p xs
determines whether all elements x
of the sequence xs
satisfy p x
.
The sequence xs
must be finite.
val exists : ('a -> bool) -> 'a t -> bool
exists xs p
determines whether at least one element x
of the sequence xs
satisfies p x
.
The sequence xs
must be finite.
val find : ('a -> bool) -> 'a t -> 'a option
find p xs
returns Some x
, where x
is the first element of the sequence xs
that satisfies p x
, if there is such an element.
It returns None
if there is no such element.
The sequence xs
must be finite.
val find_map : ('a -> 'b option) -> 'a t -> 'b option
find_map f xs
returns Some y
, where x
is the first element of the sequence xs
such that f x = Some _
, if there is such an element, and where y
is defined by f x = Some y
.
It returns None
if there is no such element.
The sequence xs
must be finite.
iter2 f xs ys
invokes f x y
successively for every pair (x, y)
of elements drawn synchronously from the sequences xs
and ys
.
If the sequences xs
and ys
have different lengths, then iteration stops as soon as one sequence is exhausted; the excess elements in the other sequence are ignored.
Iteration terminates only if at least one of the sequences xs
and ys
is finite.
iter2 f xs ys
is equivalent to iter (fun (x, y) -> f x y) (zip xs ys)
.
fold_left2 f _ xs ys
invokes f _ x y
successively for every pair (x, y)
of elements drawn synchronously from the sequences xs
and ys
.
An accumulator of type 'a
is threaded through the calls to f
.
If the sequences xs
and ys
have different lengths, then iteration stops as soon as one sequence is exhausted; the excess elements in the other sequence are ignored.
Iteration terminates only if at least one of the sequences xs
and ys
is finite.
fold_left2 f accu xs ys
is equivalent to fold_left (fun accu (x, y) -> f accu x y) (zip xs ys)
.
for_all2 p xs ys
determines whether all pairs (x, y)
of elements drawn synchronously from the sequences xs
and ys
satisfy p x y
.
If the sequences xs
and ys
have different lengths, then iteration stops as soon as one sequence is exhausted; the excess elements in the other sequence are ignored. In particular, if xs
or ys
is empty, then for_all2 p xs ys
is true. This is where for_all2
and equal
differ: equal eq xs ys
can be true only if xs
and ys
have the same length.
At least one of the sequences xs
and ys
must be finite.
for_all2 p xs ys
is equivalent to for_all (fun b -> b) (map2 p xs ys)
.
exists2 p xs ys
determines whether some pair (x, y)
of elements drawn synchronously from the sequences xs
and ys
satisfies p x y
.
If the sequences xs
and ys
have different lengths, then iteration must stop as soon as one sequence is exhausted; the excess elements in the other sequence are ignored.
At least one of the sequences xs
and ys
must be finite.
exists2 p xs ys
is equivalent to exists (fun b -> b) (map2 p xs ys)
.
Provided the function eq
defines an equality on elements, equal eq xs ys
determines whether the sequences xs
and ys
are pointwise equal.
At least one of the sequences xs
and ys
must be finite.
Provided the function cmp
defines a preorder on elements, compare cmp xs ys
compares the sequences xs
and ys
according to the lexicographic preorder.
For more details on comparison functions, see Array.sort
.
At least one of the sequences xs
and ys
must be finite.
Constructing sequences
The functions in this section are lazy: that is, they return sequences whose elements are computed only when demanded.
val empty : 'a t
empty
is the empty sequence. It has no elements. Its length is 0.
val return : 'a -> 'a t
return x
is the sequence whose sole element is x
. Its length is 1.
cons x xs
is the sequence that begins with the element x
, followed with the sequence xs
.
Writing cons (f()) xs
causes the function call f()
to take place immediately. For this call to be delayed until the sequence is queried, one must instead write (fun () -> Cons(f(), xs))
.
val init : int -> (int -> 'a) -> 'a t
init n f
is the sequence f 0; f 1; ...; f (n-1)
.
n
must be nonnegative.
If desired, the infinite sequence f 0; f 1; ...
can be defined as map f (ints 0)
.
val unfold : ('b -> ('a * 'b) option) -> 'b -> 'a t
unfold
constructs a sequence out of a step function and an initial state.
If f u
is None
then unfold f u
is the empty sequence. If f u
is Some (x, u')
then unfold f u
is the nonempty sequence cons x (unfold f u')
.
For example, unfold (function [] -> None | h :: t -> Some (h, t)) l
is equivalent to List.to_seq l
.
val repeat : 'a -> 'a t
repeat x
is the infinite sequence where the element x
is repeated indefinitely.
repeat x
is equivalent to cycle (return x)
.
val forever : (unit -> 'a) -> 'a t
forever f
is an infinite sequence where every element is produced (on demand) by the function call f()
.
For instance, forever Random.bool
is an infinite sequence of random bits.
forever f
is equivalent to map f (repeat ())
.
cycle xs
is the infinite sequence that consists of an infinite number of repetitions of the sequence xs
.
If xs
is an empty sequence, then cycle xs
is empty as well.
Consuming (a prefix of) the sequence cycle xs
once can cause the sequence xs
to be consumed more than once. Therefore, xs
must be persistent.
val iterate : ('a -> 'a) -> 'a -> 'a t
iterate f x
is the infinite sequence whose elements are x
, f x
, f (f x)
, and so on.
In other words, it is the orbit of the function f
, starting at x
.
Transforming sequences
The functions in this section are lazy: that is, they return sequences whose elements are computed only when demanded.
map f xs
is the image of the sequence xs
through the transformation f
.
If xs
is the sequence x0; x1; ...
then map f xs
is the sequence f x0; f x1; ...
.
mapi
is analogous to map
, but applies the function f
to an index and an element.
mapi f xs
is equivalent to map2 f (ints 0) xs
.
filter p xs
is the sequence of the elements x
of xs
that satisfy p x
.
In other words, filter p xs
is the sequence xs
, deprived of the elements x
such that p x
is false.
filter_map f xs
is the sequence of the elements y
such that f x = Some y
, where x
ranges over xs
.
filter_map f xs
is equivalent to map Option.get (filter Option.is_some (map f xs))
.
If xs
is a sequence [x0; x1; x2; ...]
, then scan f a0 xs
is a sequence of accumulators [a0; a1; a2; ...]
where a1
is f a0 x0
, a2
is f a1 x1
, and so on.
Thus, scan f a0 xs
is conceptually related to fold_left f a0 xs
. However, instead of performing an eager iteration and immediately returning the final accumulator, it returns a sequence of accumulators.
For instance, scan (+) 0
transforms a sequence of integers into the sequence of its partial sums.
If xs
has length n
then scan f a0 xs
has length n+1
.
take n xs
is the sequence of the first n
elements of xs
.
If xs
has fewer than n
elements, then take n xs
is equivalent to xs
.
n
must be nonnegative.
drop n xs
is the sequence xs
, deprived of its first n
elements.
If xs
has fewer than n
elements, then drop n xs
is empty.
n
must be nonnegative.
drop
is lazy: the first n+1
elements of the sequence xs
are demanded only when the first element of drop n xs
is demanded. For this reason, drop 1 xs
is not equivalent to tail xs
, which queries xs
immediately.
take_while p xs
is the longest prefix of the sequence xs
where every element x
satisfies p x
.
drop_while p xs
is the sequence xs
, deprived of the prefix take_while p xs
.
Provided the function eq
defines an equality on elements, group eq xs
is the sequence of the maximal runs of adjacent duplicate elements of the sequence xs
.
Every element of group eq xs
is a nonempty sequence of equal elements.
The concatenation concat (group eq xs)
is equal to xs
.
Consuming group eq xs
, and consuming the sequences that it contains, can cause xs
to be consumed more than once. Therefore, xs
must be persistent.
The sequence memoize xs
has the same elements as the sequence xs
.
Regardless of whether xs
is ephemeral or persistent, memoize xs
is persistent: even if it is queried several times, xs
is queried at most once.
The construction of the sequence memoize xs
internally relies on suspensions provided by the module Lazy
. These suspensions are not thread-safe. Therefore, the sequence memoize xs
must not be queried by multiple threads concurrently.
This exception is raised when a sequence returned by once
(or a suffix of it) is queried more than once.
The sequence once xs
has the same elements as the sequence xs
.
Regardless of whether xs
is ephemeral or persistent, once xs
is an ephemeral sequence: it can be queried at most once. If it (or a suffix of it) is queried more than once, then the exception Forced_twice
is raised. This can be useful, while debugging or testing, to ensure that a sequence is consumed at most once.
If xss
is a matrix (a sequence of rows), then transpose xss
is the sequence of the columns of the matrix xss
.
The rows of the matrix xss
are not required to have the same length.
The matrix xss
is not required to be finite (in either direction).
The matrix xss
must be persistent.
Combining sequences
append xs ys
is the concatenation of the sequences xs
and ys
.
Its elements are the elements of xs
, followed by the elements of ys
.
If xss
is a sequence of sequences, then concat xss
is its concatenation.
If xss
is the sequence xs0; xs1; ...
then concat xss
is the sequence xs0 @ xs1 @ ...
.
concat_map f xs
is equivalent to concat (map f xs)
.
concat_map
is an alias for flat_map
.
zip xs ys
is the sequence of pairs (x, y)
drawn synchronously from the sequences xs
and ys
.
If the sequences xs
and ys
have different lengths, then the sequence ends as soon as one sequence is exhausted; the excess elements in the other sequence are ignored.
zip xs ys
is equivalent to map2 (fun a b -> (a, b)) xs ys
.
map2 f xs ys
is the sequence of the elements f x y
, where the pairs (x, y)
are drawn synchronously from the sequences xs
and ys
.
If the sequences xs
and ys
have different lengths, then the sequence ends as soon as one sequence is exhausted; the excess elements in the other sequence are ignored.
map2 f xs ys
is equivalent to map (fun (x, y) -> f x y) (zip xs ys)
.
interleave xs ys
is the sequence that begins with the first element of xs
, continues with the first element of ys
, and so on.
When one of the sequences xs
and ys
is exhausted, interleave xs ys
continues with the rest of the other sequence.
If the sequences xs
and ys
are sorted according to the total preorder cmp
, then sorted_merge cmp xs ys
is the sorted sequence obtained by merging the sequences xs
and ys
.
For more details on comparison functions, see Array.sort
.
product xs ys
is the Cartesian product of the sequences xs
and ys
.
For every element x
of xs
and for every element y
of ys
, the pair (x, y)
appears once as an element of product xs ys
.
The order in which the pairs appear is unspecified.
The sequences xs
and ys
are not required to be finite.
The sequences xs
and ys
must be persistent.
The sequence map_product f xs ys
is the image through f
of the Cartesian product of the sequences xs
and ys
.
For every element x
of xs
and for every element y
of ys
, the element f x y
appears once as an element of map_product f xs ys
.
The order in which these elements appear is unspecified.
The sequences xs
and ys
are not required to be finite.
The sequences xs
and ys
must be persistent.
map_product f xs ys
is equivalent to map (fun (x, y) -> f x y) (product xs ys)
.
Splitting a sequence into two sequences
unzip
transforms a sequence of pairs into a pair of sequences.
unzip xs
is equivalent to (map fst xs, map snd xs)
.
Querying either of the sequences returned by unzip xs
causes xs
to be queried. Therefore, querying both of them causes xs
to be queried twice. Thus, xs
must be persistent and cheap. If that is not the case, use unzip (memoize xs)
.
partition_map f xs
returns a pair of sequences (ys, zs)
, where:
ys
is the sequence of the elementsy
such thatf x = Left y
, wherex
ranges overxs
;
zs
is the sequence of the elementsz
such thatf x = Right z
, wherex
ranges overxs
.
partition_map f xs
is equivalent to a pair of filter_map Either.find_left (map f xs)
and filter_map Either.find_right (map f xs)
.
Querying either of the sequences returned by partition_map f xs
causes xs
to be queried. Therefore, querying both of them causes xs
to be queried twice. Thus, xs
must be persistent and cheap. If that is not the case, use partition_map f (memoize xs)
.
partition p xs
returns a pair of the subsequence of the elements of xs
that satisfy p
and the subsequence of the elements of xs
that do not satisfy p
.
partition p xs
is equivalent to filter p xs, filter (fun x -> not (p x)) xs
.
Consuming both of the sequences returned by partition p xs
causes xs
to be consumed twice and causes the function f
to be applied twice to each element of the list. Therefore, f
should be pure and cheap. Furthermore, xs
should be persistent and cheap. If that is not the case, use partition p (memoize xs)
.
Converting between sequences and dispensers
A dispenser is a representation of a sequence as a function of type unit -> 'a option
. Every time this function is invoked, it returns the next element of the sequence. When there are no more elements, it returns None
. A dispenser has mutable internal state, therefore is ephemeral: the sequence that it represents can be consumed at most once.
val of_dispenser : (unit -> 'a option) -> 'a t
of_dispenser it
is the sequence of the elements produced by the dispenser it
. It is an ephemeral sequence: it can be consumed at most once. If a persistent sequence is needed, use memoize (of_dispenser it)
.
val to_dispenser : 'a t -> unit -> 'a option
to_dispenser xs
is a fresh dispenser on the sequence xs
.
This dispenser has mutable internal state, which is not protected by a lock; so, it must not be used by several threads concurrently.
Sequences of integers
val ints : int -> int t
ints i
is the infinite sequence of the integers beginning at i
and counting up.