package patricia-tree

  1. Overview
  2. Docs
Patricia Tree data structure in OCaml for maps and sets. Supports generic key-value pairs

Install

dune-project
 Dependency

Authors

Maintainers

Sources

patricia-tree-0.11.0.tbz
sha256=18fcde5d35d65c9bb2f9ec4ff732ecdd8969ba6fc2cf51d29ecb3be66e2fe664
sha512=da038d5096deb4bf3c02efd694e962ecf9b2571d140fa1fef17cce474f628ec070b93a44fd742748b9d3ba0e51041f864623d83e9cb0c72214abb0fb4a043625

doc/patricia-tree/PatriciaTree/MakeCustomHeterogeneousMap/index.html

Module PatriciaTree.MakeCustomHeterogeneousMapSource

Create an heterogeneous map with a custom NODE.

This is the same as MAP, but with simple type key being replaced by type constructor 'a key and 'b value being replaced by ('a,'b) value.

The main changes from MAP are:

  • The type of key is replaced by a type constructor 'k key. Because of that, most higher-order arguments require higher-ranking polymorphism, and we provide records that allows to pass them as arguments (e.g. polyiter, polymap, polyunion, etc.)
  • The type of the map (t) is still parameterized by an argument ('m t)
  • The type of value depend on both the type of the key and the type of the map, hence the type ('k,'m) value.
  • The type of some return values, like key-value pairs, must be concealed existentially, hence the KeyValue constructor.

Parameters

module Node : NODE with type 'a key = 'a Key.t and type ('key, 'map) value = ('key, 'map) Value.t

Signature

include BASE_MAP with type 'a key = 'a Key.t with type ('k, 'm) value = ('k, 'm) Value.t with type 'm t = 'm Node.t
include NODE_WITH_FIND with type 'a key = 'a Key.t with type ('k, 'm) value = ('k, 'm) Value.t with type 'm t = 'm Node.t
include NODE with type 'a key = 'a Key.t with type ('k, 'm) value = ('k, 'm) Value.t with type 'm t = 'm Node.t

Types

type 'a key = 'a Key.t

The type of keys.

type ('k, 'm) value = ('k, 'm) Value.t

The type of value, which depends on the type of the key and the type of the map.

type 'm t = 'm Node.t

The type of the map, which is parameterized by a type.

Constructors: build values

val empty : 'map t

The empty map

val leaf : 'key key -> ('key, 'map) value -> 'map t

A singleton leaf, similar to BASE_MAP.singleton

val branch : prefix:int -> branching_bit:int -> tree0:'map t -> tree1:'map t -> 'map t

A branch node. This shouldn't be called externally unless you know what you're doing! Doing so could easily break the data structure's invariants.

When called, it assumes that:

  • Neither tree0 nor tree1 should be empty.
  • branching_bit should have a single bit set
  • prefix should be normalized (bits below branching_bit set to zero)
  • All elements of tree0 should have their to_int start by prefix followed by 0 at position branching_bit).
  • All elements of tree1 should have their to_int start by prefix followed by 0 at position branching_bit).

Destructors: access the value

type 'map view = private
  1. | Empty : 'map view
    (*

    Can happen only at the toplevel: there is no empty interior node.

    *)
  2. | Branch : {
    1. prefix : int;
    2. branching_bit : int;
    3. tree0 : 'map t;
    4. tree1 : 'map t;
    } -> 'map view
    (*

    Same constraints as branch:

    • branching_bit contains only one bit set; the corresponding mask is (branching_bit - 1).
    • prefix is normalized: the bits below the branching_bit are set to zero (i.e. prefix & (branching_bit - 1) = 0).
    • All elements of tree0 should have their to_int start by prefix followed by 0 at position branching_bit).
    • All elements of tree1 should have their to_int start by prefix followed by 0 at position branching_bit).
    *)
  3. | Leaf : {
    1. key : 'key key;
    2. value : ('key, 'map) value;
    } -> 'map view
    (*

    A key -> value mapping.

    *)

This makes the map nodes accessible to the pattern matching algorithm; this corresponds 1:1 to the SimpleNode implementation. This just needs to be copy-and-pasted for every node type.

val is_empty : 'map t -> bool

Check if the map is empty. Should be constant time.

val view : 'a t -> 'a view

Convert the map to a view. Should be constant time.

Sourceval find : 'key key -> 'map t -> ('key, 'map) value

find key map returns the value associated with key in map if present.

Sourceval find_opt : 'key key -> 'map t -> ('key, 'map) value option

Same as find, but returns None for Not_found

Sourcetype 'map key_value_pair =
  1. | KeyValue : 'a key * ('a, 'map) value -> 'map key_value_pair

Existential wrapper for the 'a parameter in a 'a key, ('a,'map) value pair

Basic functions

Sourceval unsigned_min_binding : 'a t -> 'a key_value_pair

unsigned_min_binding m is minimal binding KeyValue(k,v) of the map, using the unsigned order on KEY.to_int.

Sourceval unsigned_max_binding : 'a t -> 'a key_value_pair

unsigned_max_binding m is maximal binding KeyValue(k,v) of the map, using the unsigned order on KEY.to_int.

Sourceval singleton : 'a key -> ('a, 'b) value -> 'b t

Create a map with a single binding.

Sourceval cardinal : 'a t -> int

The size of the map, O(n) complexity

Sourceval is_singleton : 'a t -> 'a key_value_pair option

is_singleton m returns Some(KeyValue(k,v)) if and only if m contains a unique binding k->v.

Sourceval mem : 'key key -> 'map t -> bool

mem key map returns true iff key is bound in map, O(log(n)) complexity.

Sourceval remove : 'key key -> 'map t -> 'map t

Returns a map with the element removed, O(log(n)) complexity. Returns a physically equal map if the element is absent.

Sourceval pop_unsigned_minimum : 'map t -> ('map key_value_pair * 'map t) option

pop_unsigned_minimum m returns None if is_empty m, or Some(key,value,m') where (key,value) = unsigned_min_binding m and m' = remove m key. Uses the unsigned order on KEY.to_int. O(log(n)) complexity.

Sourceval pop_unsigned_maximum : 'map t -> ('map key_value_pair * 'map t) option

pop_unsigned_maximum m returns None if is_empty m, or Some(key,value,m') where (key,value) = unsigned_max_binding m and m' = remove m key. Uses the unsigned order on KEY.to_int. O(log(n)) complexity.

Sourceval insert : 'a key -> (('a, 'map) value option -> ('a, 'map) value) -> 'map t -> 'map t

insert key f map modifies or insert an element of the map; f takes None if the value was not previously bound, and Some old where old is the previously bound value otherwise. The function preserves physical equality when possible. O(log(n)) complexity. Preserves physical equality if the new value is physically equal to the old.

Sourceval update : 'a key -> (('a, 'map) value option -> ('a, 'map) value option) -> 'map t -> 'map t

update key f map modifies, insert, or remove an element from the map; f takes None if the value was not previously bound, and Some old where old is the previously bound value otherwise. The function preserves physical equality when possible. It returns None if the element should be removed O(log(n)) complexity. Preserves physical equality if the new value is physically equal to the old.

Sourceval add : 'key key -> ('key, 'map) value -> 'map t -> 'map t

Unconditionally adds a value in the map (independently from whether the old value existed). O(log(n)) complexity. Preserves physical equality if the new value is physically equal to the old.

Iterators

Sourceval split : 'key key -> 'map t -> 'map t * ('key, 'map) value option * 'map t

split key map splits the map into:

  • submap of map whose keys are smaller than key
  • value associated to key (if present)
  • submap of map whose keys are bigger than key

Where the order is given by the unsigned order on KEY.to_int.

Sourcetype 'map polyiter = {
  1. f : 'a. 'a key -> ('a, 'map) value -> unit;
}
Sourceval iter : 'map polyiter -> 'map t -> unit

iter f m calls f.f on all bindings of m, in the unsigned order on KEY.to_int

Sourcetype ('acc, 'map) polyfold = {
  1. f : 'a. 'a key -> ('a, 'map) value -> 'acc -> 'acc;
}
Sourceval fold : ('acc, 'map) polyfold -> 'map t -> 'acc -> 'acc

fold f m acc returns f.f key_n value_n (... (f.f key_1 value_1 acc)) where (key_1, value_1) ... (key_n, value_n) are the bindings of m, in the unsigned order on KEY.to_int.

Sourcetype ('acc, 'map) polyfold2 = {
  1. f : 'a. 'a key -> ('a, 'map) value -> ('a, 'map) value -> 'acc -> 'acc;
}
Sourceval fold_on_nonequal_inter : ('acc, 'map) polyfold2 -> 'map t -> 'map t -> 'acc -> 'acc

fold_on_nonequal_inter f m1 m2 acc returns f.f key_n value1_n value2n (... (f.f key_1 value1_1 value2_1 acc)) where (key_1, value1_1, value2_1) ... (key_n, value1_n, value2_n) are the bindings that exist in both maps (m1 ∩ m2) whose values are physically different. Calls to f.f are performed in the unsigned order of KEY.to_int.

Sourcetype ('acc, 'map) polyfold2_union = {
  1. f : 'a. 'a key -> ('a, 'map) value option -> ('a, 'map) value option -> 'acc -> 'acc;
}
Sourceval fold_on_nonequal_union : ('acc, 'map) polyfold2_union -> 'map t -> 'map t -> 'acc -> 'acc

fold_on_nonequal_union f m1 m2 acc returns f.f key_n value1_n value2n (... (f.f key_1 value1_1 value2_1 acc)) where (key_1, value1_1, value2_1) ... (key_n, value1_n, value2_n) are the bindings that exists in either map (m1 ∪ m2) whose values are physically different. Calls to f.f are performed in the unsigned order of KEY.to_int.

Sourcetype 'map polypredicate = {
  1. f : 'a. 'a key -> ('a, 'map) value -> bool;
}
Sourceval filter : 'map polypredicate -> 'map t -> 'map t

filter f m returns the submap of m containing the bindings k->v such that f.f k v = true. f.f is called in the unsigned order of KEY.to_int

Sourceval for_all : 'map polypredicate -> 'map t -> bool

for_all f m checks that f holds on all bindings of m. Short-circuiting.

In the following, the *no_share function allows taking arguments of different types (but cannot share subtrees of the map), while the default functions attempt to preserve and benefit from sharing the subtrees (using physical equality to detect sharing).

Sourcetype ('map1, 'map2) polymap = {
  1. f : 'a. ('a, 'map1) value -> ('a, 'map2) value;
}
Sourceval map : ('map, 'map) polymap -> 'map t -> 'map t
Sourceval map_no_share : ('map1, 'map2) polymap -> 'map1 t -> 'map2 t

map f m and map_no_share f m replace all bindings (k,v) by (k, f.f v). Bindings are examined in the unsigned order of KEY.to_int.

Sourcetype ('map1, 'map2) polymapi = {
  1. f : 'a. 'a key -> ('a, 'map1) value -> ('a, 'map2) value;
}
Sourceval mapi : ('map, 'map) polymapi -> 'map t -> 'map t
Sourceval mapi_no_share : ('map1, 'map2) polymapi -> 'map1 t -> 'map2 t

mapi f m and mapi_no_share f m replace all bindings (k,v) by (k, f.f k v). Bindings are examined in the unsigned order of KEY.to_int.

Sourcetype ('map1, 'map2) polyfilter_map = {
  1. f : 'a. 'a key -> ('a, 'map1) value -> ('a, 'map2) value option;
}
Sourceval filter_map : ('map, 'map) polyfilter_map -> 'map t -> 'map t
Sourceval filter_map_no_share : ('map1, 'map2) polyfilter_map -> 'map1 t -> 'map2 t

filter_map m f and filter_map_no_share m f remove the bindings (k,v) for which f.f k v is None, and replaces the bindings (k,v) for which f.f k v is Some v' by (k,v'). Bindings are examined in the unsigned order of KEY.to_int.

Sourcetype 'map polypretty = {
  1. f : 'a. Format.formatter -> 'a key -> ('a, 'map) value -> unit;
}
Sourceval pretty : ?pp_sep:(Format.formatter -> unit -> unit) -> 'map polypretty -> Format.formatter -> 'map t -> unit

Pretty-prints a map using the given formatter. pp_sep is called once between each binding, it defaults to Format.pp_print_cut. Bindings are printed in the unsigned order of KEY.to_int

Functions on pairs of maps

This section regroups functions that act on pairs of maps.

These functions are where Patricia trees offer substantial speedup compared to Stdlib's Maps:

  • We can often avoid exploring physically equal subtrees (for equality tests, inclusion tests, union, intersection, difference). This yields important performance gains when combining maps that derive from a common ancestor or when using Hash-consed maps and sets maps which have a lot of elements in common.
  • We can also avoid visiting a subtree when merging with Empty (for union, intersection and difference).

To do so safely, we have specialized versions of these functions that assume properties of the function parameter (reflexive relation, idempotent operation, etc.)

When we cannot enjoy these properties, our functions explicitly say so (with a nonreflexive or nonidempotent prefix). The names are a bit long, but having these names avoids using an ineffective code by default, by forcing to know and choose between the fast and slow version.

In general, the fast versions of these function will be on O(log n + d) where n is the size of the maps being joined and d the size of their difference (number of keys bound in both maps to non-physically equal values). The slow version is O(n).

Many of these are high-order functions, taking as argument a function f that operates on elements. Due to restrictions with higher-order polymorphism, we need to wrap the function f in a record, which has a single field f. These is what the polyXXX types are for.

Comparing two maps

Functions for equality, inclusion, and test for disjointness.

Sourcetype ('map1, 'map2) polysame_domain_for_all2 = {
  1. f : 'a. 'a key -> ('a, 'map1) value -> ('a, 'map2) value -> bool;
}
Sourceval reflexive_same_domain_for_all2 : ('map, 'map) polysame_domain_for_all2 -> 'map t -> 'map t -> bool

reflexive_same_domain_for_all2 f m1 m2 is true if and only if

  • m1 and m2 have the same domain (set of keys)
  • for all bindings (k, v1) in m1 and (k, v2) in m2, f.f k v1 v2 holds

Assumes f.f is reflexive, i.e. f.f k v v = true to skip calls to equal subtrees. Calls f.f in ascending unsigned order of KEY.to_int. Exits early if the domains mismatch or if f.f returns false.

It is useful to implement equality on maps:

  # let equal m1 m2 = MyMap.reflexive_same_domain_for_all2
    { f = fun _ v1 v2 -> MyValue.equal v1 v2}
    m1 m2;;
  val equal : 'a MyMap.t -> 'a MyMap.t -> bool = <fun>
Sourceval nonreflexive_same_domain_for_all2 : ('map1, 'map2) polysame_domain_for_all2 -> 'map1 t -> 'map2 t -> bool

nonreflexive_same_domain_for_all2 f m1 m2 is the same as reflexive_same_domain_for_all2, but doesn't assume f.f is reflexive. It thus calls f.f on every binding, in ascending unsigned order of KEY.to_int. Exits early if the domains mismatch or if f.f returns false.

Sourceval reflexive_subset_domain_for_all2 : ('map, 'map) polysame_domain_for_all2 -> 'map t -> 'map t -> bool

reflexive_subset_domain_for_all2 f m1 m2 is true if and only if

  • m1's domain is a subset of m2's. (all keys defined in m1 are also defined in m2)
  • for all bindings (k, v1) in m1 and (k, v2) in m2, f.f k v1 v2 holds

Assumes f.f is reflexive, i.e. f.f k v v = true to skip calls to equal subtrees. Calls f.f in ascending unsigned order of KEY.to_int. Exits early if the domains mismatch.

It is useful to implement inclusion test on maps:

  # let is_submap m1 m2 = MyMap.reflexive_subset_domain_for_all2
    { f = fun _ v1 v2 -> MyValue.equal v1 v2}
    m1 m2;;
  val is_submap : 'a MyMap.t -> 'a MyMap.t -> bool = <fun>
Sourcetype 'map polycompare = {
  1. f : 'a. 'a key -> ('a, 'map) value -> ('a, 'map) value -> int;
}
Sourceval reflexive_compare : 'a polycompare -> 'a t -> 'a t -> int

reflexive_compare f m1 m2 is an order relation on maps. m1 and m2 are equal (return 0) if they have the same domain and for all bindings (k,v) in m1, (k,v') in m2, we have f v v' = 0.

m1 is considered striclty smaller than m2 (return a negative integer) when the first difference (lowest key in the unsigned order of KEY.to_int) is either a shared binding (k,v) in m1, (k,v') in m2 with f v v' < 0, or a binding that only occurs in m2.

Assumes that f v v = 0.

  • since v0.11.0
Sourceval disjoint : 'a t -> 'a t -> bool

disjoint m1 m2 is true iff m1 and m2 have disjoint domains

Combining two maps

We provide many functions that operate on pairs of maps for computing intersection, union, difference... Here is a short summary of what each of one returns when applied to two maps m1 and m2. Here y is physically the same value in m1 and m2.

Keys

a

b

c

d

m1

x

y

z

m2

y

u

v

idempotent_union f m1 m2

x

y

f c z u

v

slow_merge f m1 m2[1][2]

f a x _

f b y y

f c z u

f d _ v

idempotent_inter f m1 m2

y

f c z u

idempotent_inter_filter f m1 m2[1]

y

f c z u

nonidempotent_inter_no_share f m1 m2

f b y y

f c z u

difference f m1 m2[1]

x

f c z u

symmetric_difference f m1 m2[1]

x

f c z u

v

[1]: Here f returns an optional value, returning None removes the binding.

[2]: Here the function f actually takes option as arguments, omitted for brevity. _ is None.

Sourcetype ('map1, 'map2, 'map3) polyunion = {
  1. f : 'a. 'a key -> ('a, 'map1) value -> ('a, 'map2) value -> ('a, 'map3) value;
}
Sourceval idempotent_union : ('a, 'a, 'a) polyunion -> 'a t -> 'a t -> 'a t

idempotent_union f map1 map2 returns a map whose keys is the union of the keys of map1 and map2. f.f is used to combine the values of keys mapped in both maps.

Assumes f.f idempotent (i.e. f key value value == value) f.f is called in the unsigned order of KEY.to_int. f.f is never called on physically equal values. Preserves physical equality as much as possible. Complexity is O(log(n)*Delta) where Delta is the number of different keys between map1 and map2.

Sourcetype ('map1, 'map2, 'map3) polyinter = {
  1. f : 'a. 'a key -> ('a, 'map1) value -> ('a, 'map2) value -> ('a, 'map3) value;
}
Sourceval idempotent_inter : ('a, 'a, 'a) polyinter -> 'a t -> 'a t -> 'a t

idempotent_inter f map1 map2 returns a map whose keys is the intersection of the keys of map1 and map2. f.f is used to combine the values a key is mapped in both maps.

Assumes f.f idempotent (i.e. f.f key value value == value) f.f is called in the unsigned order of KEY.to_int. f.f is never called on physically equal values. Preserves physical equality as much as possible. Complexity is O(log(n)*Delta) where Delta is the number of different keys between map1 and map2.

Sourceval nonidempotent_inter_no_share : ('a, 'b, 'c) polyinter -> 'a t -> 'b t -> 'c t

nonidempotent_inter_no_share f map1 map2 is the same as idempotent_inter but doesn't preverse physical equality, doesn't assume f.f is idempotent, and can change the type of values. f.f is called on every shared binding. f.f is called in increasing unsigned order of KEY.to_int. O(n) complexity

Sourcetype ('map1, 'map2, 'map3) polyinterfilter = {
  1. f : 'a. 'a key -> ('a, 'map1) value -> ('a, 'map2) value -> ('a, 'map3) value option;
}
Sourceval idempotent_inter_filter : ('a, 'a, 'a) polyinterfilter -> 'a t -> 'a t -> 'a t

idempotent_inter_filter f map1 map2 is the same as idempotent_inter but f.f can return None to remove a binding from the resutling map.

Sourcetype ('map1, 'map2, 'map3) polymerge = {
  1. f : 'a. 'a key -> ('a, 'map1) value option -> ('a, 'map2) value option -> ('a, 'map3) value option;
}
Sourceval slow_merge : ('map1, 'map2, 'map3) polymerge -> 'map1 t -> 'map2 t -> 'map3 t

This is the same as Stdlib.Map.S.merge

Sourcetype ('a, 'b) polydifference = ('a, 'b, 'a) polyinterfilter
Sourceval symmetric_difference : ('a, 'a) polydifference -> 'a t -> 'a t -> 'a t

symmetric_difference f map1 map2 returns a map comprising of the bindings of map1 that aren't in map2, and the bindings of map2 that aren't in map1.

Bindings that are both in map1 and map2, but with non-physically equal values are passed to f.f. If f.f returns Some v then v is used as the new value, otherwise the binding is dropped.

Assumes f.f is none on equal values (i.e. f.f key value value == None) f.f is called in increasing unsigned order of KEY.to_int. f.f is never called on physically equal values.

Complexity is O(log n + d) where n is the size of the maps, and d the size of the difference.

  • since v0.11.0
Sourceval difference : ('a, 'a) polydifference -> 'a t -> 'a t -> 'a t

difference f map1 map2 returns the map containing the bindings of map1 that aren't in map2. For keys present in both maps but with different values, f.f is called. If it returns Some v, then binding k,v is kept, else k is dropped.

Assumes f.f is None on the diagonal: f.f k v v = None. f.f is called in the unsigned order of KEY.to_int. f.f is never called on physically equal values.

  • since v0.11.0

Min/max of intersection

Sourcetype ('a, 'b) key_value_value =
  1. | KeyValueValue : 'k key * ('k, 'a) value * ('k, 'b) value -> ('a, 'b) key_value_value

Existential wrapper for a key with two values

  • since v0.11.0
Sourceval min_binding_inter : 'a t -> 'b t -> ('a, 'b) key_value_value option

min_binding_inter m1 m2 is the minimal binding of the intersection. I.E. the KeyValueValue(k,v1,v2) such that (k,v1) is in m1, (k,v2) is in m2, and k is minimal using the unsigned order on keys.

Returns None if and only if the intersection is empty.

It is rougthly equivalent to calling unsigned_min_binding on nonindempotent_inter_no_share f m1 m2, but can be faster.

  • since v0.11.0
Sourceval max_binding_inter : 'a t -> 'b t -> ('a, 'b) key_value_value option

max_binding_inter m1 m2 is the same as min_binding_inter, but returns the maximum key instead of the minimum.

  • since v0.11.0

Conversion functions

Sourceval to_seq : 'a t -> 'a key_value_pair Seq.t

to_seq m iterates the whole map, in increasing unsigned order of KEY.to_int

Sourceval to_rev_seq : 'a t -> 'a key_value_pair Seq.t

to_rev_seq m iterates the whole map, in decreasing unsigned order of KEY.to_int

Sourceval add_seq : 'a key_value_pair Seq.t -> 'a t -> 'a t

add_seq s m adds all bindings of the sequence s to m in order.

Sourceval of_seq : 'a key_value_pair Seq.t -> 'a t

of_seq s creates a new map from the bindings of s. If a key is bound multiple times in s, the latest binding is kept

Sourceval of_list : 'a key_value_pair list -> 'a t

of_list l creates a new map from the bindings of l. If a key is bound multiple times in l, the latest binding is kept

Sourceval to_list : 'a t -> 'a key_value_pair list

to_list m returns the bindings of m as a list, in increasing unsigned order of KEY.to_int

Sourcemodule WithForeign (Map2 : NODE_WITH_FIND with type 'a key = 'a key) : sig ... end

Operation with maps/set of different types. Map2 must use the same KEY.to_int function.