Library
Module
Module type
Parameter
Class
Class type
The functor Make_
takes three parameters: H
, S
, K
.
The module H
provides the type H.t
of the set elements and equips this type with a hash function hash
and an equivalence test equal
. (Yes, the equivalence function is conventionally named equal
.) The hash function must respect this equivalence: that is, equiv x y
must imply hash x = hash y
.
The module S
provides two sentinel elements, which by convention are named void
and tomb
. These values must be distinct: void != tomb
must hold. Furthermore, whenever the user inserts or looks up an element x
, this element must not be a sentinel: that is, x != void && x != tomb
must hold.
The module K
is a minimal implementation of arrays. Only make
, copy
, length
, unsafe_get
, and unsafe_set
are needed.
module H : HashedType
type element = H.t
The type of the elements of a set.
The type of sets. At all times, a set s
contains at most one element of each equivalence class: that is, mem s x
and mem s y
and equiv x y
imply x = y
.
type t = set
t
is a synonym for set
.
val create : unit -> set
create()
creates a fresh empty set.
Time complexity: O(1)
.
copy s
returns a new set whose elements are the elements of s
.
Time complexity: O(c)
, where c
is the capacity of the set s
.
We provide two insertion functions, namely add_if_absent
and replace
.
If equivalence implies equality (that is, if equal x y
implies that x
and y
cannot be distinguished) then add_if_absent
and replace
behave in the same way.
Otherwise, add_if_absent
and replace
behave differently. Suppose that x
and y
are two distinct yet equivalent elements. If y
is already present in the set s
, then add_if_absent s x
has no effect, whereas replace s x
replaces y
with x
in the set s
.
If x
or some equivalent element is a member of the set s
, then add_if_absent s x
has no effect and returns false
. Otherwise, add_if_absent s x
inserts the element x
into the set s
and returns true
.
Thus, add_if_absent s x
returns true
if and only if the cardinality of the set s
increases as a result of this operation.
If necessary, the capacity of the set s
is increased.
Time complexity: the cost of an insertion operation is often O(1)
; however, if the capacity of the set must be increased, it is O(n)
. Because this costly event is infrequent, the amortized complexity of insertion is O(\log n)
.
If some element that is equivalent to x
is present in the set s
, then replace s x
removes this pre-existing element, inserts x
into the set s
, and returns false
. Otherwise, replace s x
inserts x
into the set s
and returns true
.
Thus, replace s x
returns true
if and only if the cardinality of the set s
increases as a result of this operation.
If necessary, the capacity of the set s
is increased.
Time complexity: the cost of an insertion operation is often O(1)
; however, if the capacity of the set must be increased, it is O(n)
. Because this costly event is infrequent, the amortized complexity of insertion is O(\log n)
.
mem s x
determines whether the element x
, or some element y
that is equivalent to x
, is a member of the set s
.
Time complexity: O(1)
.
find s x
determines whether some element y
that is equivalent to x
is a member of the set s
. If so, y
is returned. Otherwise, Not_found
is raised.
Time complexity: O(1)
.
If the set s
has nonzero cardinality, then choose s
returns an element of the set s
. This element is chosen at random. Otherwise, choose s
raises Not_found
.
choose
invokes Random.int
. Two successive calls to choose s
can return different results.
Time complexity: O(c)
in the worst case and O(c/n)
in expectation, where c
is the capacity of the set s
and n
is its cardinality.
If the occupancy rate n/c
remains above a certain fixed threshold, then these bounds can be written under the form O(n)
in the worst case and O(1)
in expectation.
If choose
is used in a loop where elements are repeatedly removed then it is recommended to repeatedly call tighten
so as to maintain a high occupancy rate.
find_else_add s x
determines whether some element y
that is equivalent to x
is a member of the set s
. If so, y
is returned. Otherwise, the element x
is inserted into the set s
, and Not_found
is raised.
find_else_add s x
is equivalent to try find s x with Not_found -> ignore (add_if_absent s x); raise Not_found
.
Time complexity: the cost of an insertion operation is often O(1)
; however, if the capacity of the set must be increased, it is O(n)
. Because this costly event is infrequent, the amortized complexity of insertion is O(\log n)
.
If some element y
that is equivalent to x
is a member of the set s
, then remove s x
removes y
from the set s
. Otherwise, nothing happens.
Time complexity: O(1)
.
If some element y
that is equivalent to x
is a member of the set s
, then find_and_remove s x
removes y
from the set s
and returns y
. Otherwise, the set s
is unaffected, and Not_found
is raised.
Time complexity: O(1)
.
foreach_key f s
applies the user-supplied function f
in turn to each element x
of the set s
. The function f
must not modify the set s
: that is, no elements can be inserted or deleted while iteration is ongoing.
Time complexity: O(c)
, where c
is the capacity of the set s
.
val cardinal : set -> int
cardinal s
returns the cardinality of the set s
, that is, the number of inhabitants of this set.
Time complexity: O(1)
.
val is_empty : set -> bool
is_empty s
is equivalent to cardinal s = 0
.
Time complexity: O(1)
.
val clear : set -> unit
clear s
empties the set s
. The internal data array is retained, and is erased. Thus, the capacity of the set s
is unchanged.
Time complexity: O(c)
, where c
is the capacity of the set s
.
val reset : set -> unit
reset s
empties the set s
. The internal data array is abandoned. Thus, the capacity of the set s
is reset to a small constant.
Time complexity: O(1)
.
val tighten : set -> unit
tighten s
decreases the capacity of the set s
, if necessary and if possible, so as to ensure that the occupancy rate n/c
is high enough. It guarantees either c = O(1)
, which means that the capacity is below a certain constant, or c = O(n)
, which means that the occupancy rate is above a certain constant.
Time complexity: O(c)
, where c
is the capacity of the set s
.
In the case where there is nothing to do, tighten
has constant cost. Thus, the amortized complexity of a call to tighten
, in a loop where elements are repeatedly removed, is O(\log n)
.
val cleanup : set -> unit
cleanup s
invokes tighten s
and eliminates the tombstones that earlier deletion operations may have created in the internal data array. This can speed up future insertions and lookups.
Time complexity: O(c)
, where c
is the capacity of the set s
.
show show_key s
returns a textual representation of the set s
. This representation is delimited with curly braces. Two consecutive elements are separated with a comma and a space. The user-supplied function show_key
is used to obtain a textual representation of each element.
Time complexity: O(c)
, where c
is the capacity of the set s
.
val capacity : set -> int
capacity s
returns the current capacity of the set s
, that is, the current size of its internal array.
Time complexity: O(1)
.
val occupation : set -> int
occupation s
returns the current occupation of the set s
, that is, the number of occupied entries in its internal data array. This number may be greater than cardinal s
.
Time complexity: O(1)
.
type histogram = int Map.Make(Int).t
Assume that the element x
is present in the set s
. We say that this element has search length k
if the function call mem s x
requires reading k+1
successive slots in the internal data array of the set s
. In the best case, an element has search length 0. If there are collisions, then some elements have search length greater than 0.
A present-key histogram for the set s
is a finite association map that maps a natural integer k
to the number of elements of the set s
that have search length k
. The cardinality of this histogram is n
, the cardinality of the set s
.
The average search length should be a good a predictor of the cost of searching for an element that is present in the set.
We say that the slot at index i
in an internal data array has insertion length k
if finding the first empty slot, beginning at index i
, requires reading k+1
successive slots. An empty slot has insertion length 0. A nonempty slot has insertion length greater than 0.
An absent-key histogram for the set s
is a finite association map that maps a natural integer k
to the number of slots in the data array of the set s
that have insertion length k
. The cardinality of this histogram is c
, the capacity of the set s
.
The average insertion length should be a good a predictor of the cost of inserting an element that is not present in the set.
present_key_histogram s
returns a present-key histogram for the set s
.
Time complexity: O(c \log c)
, where c
is the capacity of the set s
.
absent_key_histogram s
returns an absent-key histogram for the set s
.
Time complexity: O(c \log c)
, where c
is the capacity of the set s
.
val average : histogram -> float
average h
returns the average value of the histogram h
.
Time complexity: O(n)
, where n
is the cardinality of the histogram h
.
val statistics : set -> string
statistics s
returns a string of information about the set s
. This information includes the cardinality, capacity, occupancy rate, average search length, present-key histogram, average insertion length, and absent-key histogram.
Time complexity: O(c \log c)
, where c
is the capacity of the set s
.