Hash Tables

Introduction

The OCaml Standard Library's Hashtbl module implements an associative array similar to the standard library's Map module.

While these modules share a similar structure and purpose, there are several differences that can be weighed when choosing between the two.

One of the benefits of a hash table over a map is that instead of having a logarithmic time complexity (O(log n)), a hash table is able to retrieve information at a nearly instantaneous linear time complexity (O(1)).

A hash table data structure achieves efficient reads and writes by employing a hashing function that converts the key of a key/value pair into an algorithmically unique "fingerprint" known as a hash. OCaml has a built-in hashing function within the Hashtabl module that is available for each key The Hashtbl module implements an efficient, mutable lookup table.

Creating a Polymorphic Hash Table

To create a hash table we could write:

# let my_hash = Hashtbl.create 123456;;
val my_hash : ('_weak1, '_weak2) Hashtbl.t = <abstr>

The 123456 is the initial size (in terms of element count) of the hash table. This initial number is just your best guess as to the amount of data that you will be putting into the hash table. The hash table can grow if you under-estimate the size so don't worry about it too much. The type of my_hash is:

# my_hash;;
- : ('_weak1, '_weak2) Hashtbl.t = <abstr>

The '_weak1 and '_weak2 correspond to the key and value types, respectively. There are no concrete types (e.g., int or float * string) filled in those slots because the type of the key and value are not yet determined. The underscore indicates that the key and data types, once chosen, will be fixed. In other words, you can't sometimes use a given hash table with ints for keys, and then later use a string as a key in that same hash table.

Adding Data to a Hash Table

Lets add some data to my_hash. Lets say I am working on a cross word solving program and I want to find all words that start with a certain letter. First I need to enter the data into my_hash.

Note that a hash table is modified by in-place updates, so unlike a map, another hash table is not created every time you change the table. Thus, the code let my_hash = Hashtbl.add my_hash ... does not make sense. Instead, using an imperative style we would write something like this:

# Hashtbl.add my_hash "h" "hello";
  Hashtbl.add my_hash "h" "hi";
  Hashtbl.add my_hash "h" "hug";
  Hashtbl.add my_hash "h" "hard";
  Hashtbl.add my_hash "w" "wimp";
  Hashtbl.add my_hash "w" "world";
  Hashtbl.add my_hash "w" "wine";;
- : unit = ()

The return type is unit, indicating that Hashtbl.add produces a side effect.

Now that we put data into my_hash, lets look at its type:

# my_hash;;
- : (string, string) Hashtbl.t = <abstr>

Where the type used to be the polymorphic (_weak1, _weak2), it now has the concrete representation of (string * string).

Finding Data in Hash Tables

If we want to find one element in my_hash that has an "h" in it then we would write:

# Hashtbl.find my_hash "h";;
- : string = "hard"

As we would expect, querying the hash table my_hash with the key h returns a single value, "hard" since this is the last element updated with the value of "h".

However, the previous values associated with the key "h" were not replaced. What we may want instead is all the elements that start with "h". To do this we want to find all of them. What better name for this than find_all?

# Hashtbl.find_all my_hash "h";;
- : string list = ["hard"; "hug"p; "hi"; "hello"]

This returns ["hard"; "hug"; "hi"; "hello"], demonstrating that hashed key collisions are associated with a list of values associated with that key.

Removing Data from Hash Tables

If you remove a key, its previous value becomes again the default one associated to the key.

# Hashtbl.remove my_hash "h";;
- : unit = ()
# Hashtbl.find my_hash "h";;
- : string = "hug"

This behavior is interesting for the above example or when, say, the keys represent variables that can be temporarily masked by a local variables of the same name.

Replacing Data in Hash Tables

In other contexts, one may prefer new values replace the previous ones. In this case, one uses Hashtbl.replace:

# Hashtbl.replace my_hash "t" "try";
  Hashtbl.replace my_hash "t" "test";
  Hashtbl.find_all my_hash "t";;
- : string list = ["test"]

# Hashtbl.remove my_hash "t";
  Hashtbl.find my_hash "t";;
Exception: Not_found.

To find out whether there is an entry in my_hash for a letter we would do:

# Hashtbl.mem my_hash "h";;
- : bool = true

Help Improve Our Documentation

All OCaml docs are open source. See something that's wrong or unclear? Submit a pull request.

OCaml

Innovation. Community. Security.