Hash Tables
Introduction
The OCaml Standard Library's Hashtbl
module implements an associative array
similar to the standard library's Map
module.
While these modules share a similar structure and purpose, there are several differences that can be weighed when choosing between the two.
One of the benefits of a hash table over a map is that instead of having a
logarithmic time complexity (O(log n)), a hash table
is able to retrieve
information at a nearly instantaneous linear time complexity (O(1)).
A hash table data structure achieves efficient reads and writes by employing a
hashing function that converts the key of a key/value pair into an
algorithmically unique "fingerprint" known as a hash. OCaml has a built-in
hashing function within the Hashtabl
module that is available for each key
The Hashtbl
module implements an efficient, mutable lookup table.
Creating a Polymorphic Hash Table
To create a hash table we could write:
# let my_hash = Hashtbl.create 123456;;
val my_hash : ('_weak1, '_weak2) Hashtbl.t = <abstr>
The 123456 is the initial size (in terms of element count) of the hash table. This initial number is just your best guess as to the amount of data that you will be putting into the hash table. The hash table can grow if you under-estimate the size so don't worry about it too much. The type of my_hash is:
# my_hash;;
- : ('_weak1, '_weak2) Hashtbl.t = <abstr>
The '_weak1
and '_weak2
correspond to the key and value types, respectively.
There are no concrete types (e.g., int
or float * string
) filled in those
slots because the type of the key and value are not yet determined. The
underscore indicates that the key and data types, once chosen, will be fixed. In
other words, you can't sometimes use a given hash table with int
s for keys,
and then later use a string as a key in that same hash table.
Adding Data to a Hash Table
Lets add some data to my_hash
. Lets say I am working on a cross word
solving program and I want to find all words that start with a certain
letter. First I need to enter the data into my_hash
.
Note that a hash table is modified by in-place updates, so unlike a
map, another hash table is not created every
time you change the table. Thus, the code let my_hash = Hashtbl.add my_hash ...
does not make sense. Instead, using an imperative style we would write
something like this:
# Hashtbl.add my_hash "h" "hello";
Hashtbl.add my_hash "h" "hi";
Hashtbl.add my_hash "h" "hug";
Hashtbl.add my_hash "h" "hard";
Hashtbl.add my_hash "w" "wimp";
Hashtbl.add my_hash "w" "world";
Hashtbl.add my_hash "w" "wine";;
- : unit = ()
The return type is unit
, indicating that Hashtbl.add
produces a side effect.
Now that we put data into my_hash
, lets look at its type:
# my_hash;;
- : (string, string) Hashtbl.t = <abstr>
Where the type used to be the polymorphic (_weak1, _weak2)
, it now has the
concrete representation of (string * string)
.
Finding Data in Hash Tables
If we want to find one element in my_hash
that has an "h"
in it then we
would write:
# Hashtbl.find my_hash "h";;
- : string = "hard"
As we would expect, querying the hash table my_hash
with the key h
returns a
single value, "hard"
since this is the last element updated with the
value of "h"
.
However, the previous values associated with the key "h"
were not replaced.
What we may want instead is all the elements that start with "h"
. To do this we
want to find all of them. What better name for this than find_all
?
# Hashtbl.find_all my_hash "h";;
- : string list = ["hard"; "hug"p; "hi"; "hello"]
This returns ["hard"; "hug"; "hi"; "hello"]
, demonstrating that hashed key
collisions are associated with a list of values associated with that key.
Removing Data from Hash Tables
If you remove a key, its previous value becomes again the default one associated to the key.
# Hashtbl.remove my_hash "h";;
- : unit = ()
# Hashtbl.find my_hash "h";;
- : string = "hug"
This behavior is interesting for the above example or when, say, the keys represent variables that can be temporarily masked by a local variables of the same name.
Replacing Data in Hash Tables
In other contexts, one may prefer new values replace the previous ones. In
this case, one uses Hashtbl.replace
:
# Hashtbl.replace my_hash "t" "try";
Hashtbl.replace my_hash "t" "test";
Hashtbl.find_all my_hash "t";;
- : string list = ["test"]
# Hashtbl.remove my_hash "t";
Hashtbl.find my_hash "t";;
Exception: Not_found.
To find out whether there is an entry in my_hash
for a letter we would do:
# Hashtbl.mem my_hash "h";;
- : bool = true
Help Improve Our Documentation
All OCaml docs are open source. See something that's wrong or unclear? Submit a pull request.