Hashtbl.create n creates a new, empty hash table, with initial size n. For best results, n should be on the order of the expected number of elements that will be in the table. The table grows as needed, so n is just an initial guess.
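As a minimal sketch of the point about n being only an initial guess, the snippet below creates a table with a small initial size and then inserts many more bindings than that. The name table and the sizes 16 and 1000 are illustrative choices, not values taken from the documentation.

```ocaml
(* Illustrative: start with a deliberately small initial size. *)
let table : (string, int) Hashtbl.t = Hashtbl.create 16

let () =
  (* Insert far more bindings than the initial size; the table grows as needed. *)
  for i = 1 to 1000 do
    Hashtbl.replace table (string_of_int i) i
  done;
  Printf.printf "bindings: %d\n" (Hashtbl.length table)
```

A poor guess for n does not affect correctness. A guess close to the expected number of elements simply avoids some resizing work as the table fills.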
The optional random parameter (a boolean) controls whether the internal organization of the hash table is randomized at each execution of Hashtbl.create or deterministic over all executions.
A hash table that is created with ~random:false uses a fixed hash function (Hashtbl.hash) to distribute keys among buckets. As a consequence, collisions between keys happen deterministically. In Web-facing applications or other security-sensitive applications, the deterministic collision patterns can be exploited by a malicious user to create a denial-of-service attack: the attacker sends input crafted to create many collisions in the table, slowing the application down.
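The deterministic behavior can be observed with a small sketch: two tables built with ~random:false, the same initial size, and the same sequence of insertions are traversed in the same order. The helper names build and keys, and the sample keys, are hypothetical and chosen only for illustration.

```ocaml
(* Hypothetical helper: build a non-randomized table from a fixed key list. *)
let build () =
  let t = Hashtbl.create ~random:false 8 in
  List.iter
    (fun k -> Hashtbl.replace t k (String.length k))
    ["apple"; "pear"; "fig"; "plum"];
  t

(* Hypothetical helper: collect the keys in the order Hashtbl.fold visits them. *)
let keys t = Hashtbl.fold (fun k _ acc -> k :: acc) t []

(* Same hash function, same size, same insertions: same traversal order. *)
let same_order = keys (build ()) = keys (build ())

let () = Printf.printf "same order: %b\n" same_order
```

The same predictability that makes this comparison succeed is what lets an attacker precompute keys that collide.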
A hash table that is created with ~random:true uses the seeded hash function Hashtbl.seeded_hash with a seed that is randomly chosen at hash table creation time. In effect, the hash function used is randomly selected among 2^{30} different hash functions. All these hash functions have different collision patterns, rendering ineffective the denial-of-service attack described above. However, because of randomization, enumerating all elements of the hash table using Hashtbl.fold or Hashtbl.iter is no longer deterministic: elements are enumerated in different orders at different runs of the program.
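Because the enumeration order of a randomized table can change from run to run, code that needs reproducible output should not depend on it. One possible approach, sketched below, is to collect the bindings with Hashtbl.fold and sort them before use. The names t and sorted_bindings and the sample data are illustrative assumptions.

```ocaml
(* Illustrative: a randomized table; its seed is chosen at creation time. *)
let t : (string, int) Hashtbl.t = Hashtbl.create ~random:true 8

let () =
  List.iter (fun (k, v) -> Hashtbl.replace t k v) [("a", 1); ("b", 2); ("c", 3)]

(* Sorting removes the dependence on the unspecified traversal order. *)
let sorted_bindings =
  Hashtbl.fold (fun k v acc -> (k, v) :: acc) t []
  |> List.sort compare

let () =
  List.iter (fun (k, v) -> Printf.printf "%s -> %d\n" k v) sorted_bindings
```

Lookups with Hashtbl.find or Hashtbl.find_opt are unaffected by randomization. Only the order in which Hashtbl.fold and Hashtbl.iter visit bindings varies.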
If no ~random parameter is given, hash tables are created in non-random mode by default. This default can be changed either programmatically by calling Hashtbl.randomize or by setting the R flag in the OCAMLRUNPARAM environment variable.
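A minimal sketch of changing the default programmatically, assuming an OCaml version that provides Hashtbl.is_randomized (4.03 or later). The names before, after, and t are illustrative. The value of before depends on whether the R flag was already set in OCAMLRUNPARAM when the program started, so no particular value is assumed for it.

```ocaml
(* Query the current default before changing it. *)
let before = Hashtbl.is_randomized ()

(* Switch the default: later calls to Hashtbl.create without ~random are randomized. *)
let () = Hashtbl.randomize ()

let after = Hashtbl.is_randomized ()

(* This table is created in randomized mode because no ~random argument is given. *)
let t : (int, string) Hashtbl.t = Hashtbl.create 16

let () =
  Hashtbl.replace t 1 "one";
  Printf.printf "randomized by default: before=%b after=%b\n" before after
```

An explicit ~random:false argument still produces a deterministic table after Hashtbl.randomize has been called, since the argument takes precedence over the default. Calling Hashtbl.randomize early in program initialization keeps all tables created afterwards consistent.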