package term-indexing

  1. Overview
  2. Docs

Term indexing

This library implements some tools to manipulate and index first-order terms. This page introduces the library with some basic examples. If you're ready to dive in the API, look at Term_indexing, especially the functor Term_indexing.Make.

Hash-consed first-order terms

A first-order term is either

  • a variable (represented as an integer), or
  • a primitive symbol applied to a list of sub-terms whose length correspond to the arity of the symbol.

Primitive symbols and their arities are specified by a signature.

For instance, arithmetic expressions such as x_0 + x_1 * x_1 can be represented as terms constructed with primitives for addition, multiplication, etc. Let us define the corresponding signature.

open Term_indexing

(* The type of primitive symbols *)
type prim = Add | Mul | Neg | Float of float

(* [Prim] implements [Intf.Signature].
   We use the [Stdlib]'s polymorphic comparison and hash operators for simplicity.  *)
module Prim : Intf.Signature with type t = prim = struct
  type t = prim

  let compare (x : t) (y : t) = Stdlib.compare x y

  let equal (x : t) (y : t) = Stdlib.( = ) x y

  let hash = Hashtbl.hash

  let pp fmtr = function
    | Add -> Format.fprintf fmtr "Add"
    | Mul -> Format.fprintf fmtr "Mul"
    | Neg -> Format.fprintf fmtr "Neg"
    | Float f -> Format.fprintf fmtr "%.1f" f

  (* Each primitive is associated to an arity, which maps each constructor to its number
     of expected arguments;
     e.g. addition is a binary operation, negation is a unary operation and
     constants are 0-ary.  *)
  let arity = function Add | Mul -> 2 | Neg -> 1 | Float _ -> 0
end

The functor Term_indexing.Make packs all features of the library under a single functor taking an argument of type Intf.Signature.

module Pack = Term_indexing.Make (Prim)
open Pack

The module Term contained in Pack provides operations to create and manipulate hash-consed terms over the given signature (see Intf.Term). Hash-consing is a technique that ensures that terms are allocated at most once: it is guaranteed that structurally equal terms are physically equal. Terms are constructed using the functions Term.prim for primitive applications and Term.var for variables.

Let us define some convenient wrappers to create terms. Note that the correctness of arities is dynamically checked by Term.prim.

let add x y = Term.prim Add [| x; y |]

let mul x y = Term.prim Mul [| x; y |]

let neg x = Term.prim Neg [| x |]

let float f = Term.prim (Prim.Float f) [||]

let var s = Term.var s

The mathematical expression x_0 + x_1 * x_1 can be represented by the term

let t = add (var 0) (mul (var 1) (var 1))

Folding over terms

Term implements functions to fold over terms and perform basic rewriting operations. Rewriting consists in replacing a subterm at a given Path.t with anoter term. For instance, for t = add (var 0) (mul (var 1) (var 1)):

  • the subterm at p = Path.root is t itself
  • the subterm at p = at_index 0 Path.root is var 0
  • the subterm at p = at_index 1 Path.root is mul (var 1) (var 1)
  • the subterm at p = at_index 0 (at_index 1 Path.root) is (var 1)

The function Term.fold allows to fold over all subterms of t along with their paths in t. Let's try:

let all_subterms = Term.fold (fun subt p acc -> (p, subt) :: acc) [] t

let () =
  List.iter
    (fun (p, subt) -> Format.printf "%a -> %a@." Path.pp p Term.pp subt)
    all_subterms
1 -> 1 -> * -> V(1)
0 -> 1 -> * -> V(1)
1 -> * -> Mul(V(1), V(1))
0 -> * -> V(0)
* -> Add(V(0), Mul(V(1), V(1)))

Rewriting

We will illustrate rewriting by implementing some toy constant folding. The Pattern module provides facilities to search for subterms having some particular shape.

We then define some patterns corresponding to terms that can be folded.

(* A pattern matching any float constant. [prim_pred] is a generic predicate on primitives. *)
let float_patt =
  Pattern.(prim_pred (function Float _ -> true | _ -> false) list_empty)

(* A pattern matching an addition of float constants. *)
let add_patt = Pattern.(prim Prim.Add (float_patt @. float_patt @. list_empty))

(* A pattern matching a multiplication of float constants. *)
let mul_patt = Pattern.(prim Prim.Add (float_patt @. float_patt @. list_empty))

(* A pattern matching negation of a float constant. *)
let neg_patt = Pattern.(prim Prim.Neg (float_patt @. list_empty))

Upon detecting such subterms, we will need to reduce them. The following illustrates how to do so using Term.destruct, which performs pattern matching on terms. It takes the following arguments:

  • a function to be called if the term is a primitive, to which the primitive and subterms are passed
  • a function to be called if the term is a variable, to which the variable is passed.
  • the term to be analyzed

get_float extracts the floating point value out of a Float term, or returns None if not possible.

let get_float (term : Term.t) : float option =
  Term.destruct
    (fun prim _ -> match prim with Prim.Float f -> Some f | _ -> None)
    (fun _ -> None)
    term

reduce performs a step of constant folding if possible.

let reduce (term : Term.t) : Term.t option =
  Term.destruct
    (fun prim operands ->
      match (prim, operands) with
      | ((Add | Mul), [| l; r |]) ->
          Option.bind (get_float l) @@ fun l ->
          Option.bind (get_float r) @@ fun r ->
          Option.some
            (match prim with
            | Add -> float (l +. r)
            | Mul -> float (l *. r)
            | _ -> assert false)
      | (Neg, [| x |]) ->
          Option.bind (get_float x) @@ fun x -> Option.some (float (-.x))
      | _ -> Option.none)
    (fun _ -> Option.none)
    term

Constant folding iteratively looks for subterms to simplify until none is left. Pattern.first_matches searches the term for an occurrence of a subterm matching any of the patterns in the provided list. If a pattern is found, we perform the rewrite, print the outcome and continue.

let rec rewrite_until_fixpoint term =
  let matches = Pattern.first_match [add_patt; mul_patt; neg_patt] term in
  match matches with
  | [] -> term
  | path :: _ ->
      let rewritten =
        Term.subst ~term ~path (fun e ->
            match reduce e with
            | Some reduced -> reduced
            | None -> failwith "can't happen")
      in
      Format.printf "%a -> %a@." Term.pp term Term.pp rewritten ;
      rewrite_until_fixpoint rewritten

Let's try this out on some dummy term.

let expression = add (float 1.0) (add (float 2.0) (mul (float 3.0) (float 4.0)))

let normalized = rewrite_until_fixpoint expression

The sequence of rewrites is:

Add(1.0, Add(2.0, Mul(3.0, 4.0))) -> Add(1.0, Add(2.0, 12.0))
Add(1.0, Add(2.0, 12.0)) -> Add(1.0, 14.0)
Add(1.0, 14.0) -> 15.0

Substitutions

Variables denote placeholders for terms that may replace them. This mechanism is mediated through substitutions, which are finitely supported functions from variables to terms. The following is a substitution mapping

  • the variable 0 to the term float 0.0
  • the variable 1 to the term neg (float 42.0)
  • the variable 2 to the term float 2.0
let subst =
  [(0, float 0.0); (1, neg (float 42.0)); (2, float 2.0)]
  |> List.to_seq |> Subst.of_seq

The terms associated to each variable in the domain of a substitution can be obtained through Subst.eval (or Subst.eval_exn for the raising variant).

let () =
  assert (Option.equal Term.equal (Subst.eval 0 subst) (Some (float 0.0)))

let () = assert (Option.equal Term.equal (Subst.eval 3 subst) None)

One can also apply a substitution to the variables contained in a term using Subst.lift.

let term = add (var 1) (mul (var 2) (var 2))

let substituted = Subst.lift subst term

The value substituted is equal to:

Add(Neg(42.0), Mul(2.0, 2.0))

Intermezzo: the refinement preorder on terms

Applying a substitution to a term intuitively "refines" it. More formally, one can define a preorder \le on terms where a term t_1 \le t_2 if there exists a substitution \sigma such that t_1 = \sigma(t_2). The maximal (equivalence class of) elements of this preorder are variables, and the minimal elements are ground terms (i.e. terms without variables). (This is a preorder and not a partial order because terms equal modulo variable renaming generalize each other).

Here is an increasing sequence of terms in the preorder \le:

  • the term t1 = add (float 2.0) (float 2.0) is ground
  • the term t1 refines t2 = add (var 1) (var 1) via the substitution 1 \mapsto float 2.0
  • the term t2 refines t3 = add (var 1) (var 2) via the substitution 2 \mapsto var 1
  • the term t3 refines var 3 via the substitution 3 \mapsto t3

Unification

Some pairs of terms t_1, t_2 admit a common refinement. Formally, a unifier of t_1 and t_2 is a substitution which equates the two terms. A unification problem is a conjunction of equations between terms and a solution is a substitution which when applied to all terms satisfies all equations.

The library provides a module to compute such solutions. Unification proceeds on a state that allows to accumulate equations. Let us create an empty state.

let uf_state = Subst.Unification.empty ()

One can unify terms using Subst.Unification.unify. This function returns None when no unifier can be found, or an updated state in the other case. At any point, we can get a solution from the state using Subst.Unification.subst which returns a substitution.

let t1 = add (mul (float 1.0) (float 2.0)) (var 1)

let t2 = add (var 2) (mul (float 3.0) (float 4.0))

let () =
  match Subst.Unification.unify t1 t2 uf_state with
  | None -> failwith "unification failed"
  | Some uf_state' ->
      let subst = Subst.Unification.subst uf_state' in
      Format.printf "%a@." Subst.pp subst
V(1) -> Mul(3.0, 4.0); V(2) -> Mul(1.0, 2.0)

Indexing

One sometimes need to associate data to terms. In those case, one may simply use a hash table. However, the preorder structure on terms suggests the possibility to perform richer queries:

  • finding all term-value pairs unifiable with the query term
  • finding all term-value pairs generalizing the query term
  • finding all term-value pairs specializing the query term

The module Index allows to associate terms to data and perform these queries. In the example, we use dummy integers as data.

An empty index is created using Index.create.

let index = Index.create ()

Keys-value mappings are inserted using Index.insert. Inserting a value at a given key overwrites the previous value, if any. One can also use Index.update to access the previous value.

let keys =
  [ float 42.0;
    add (float 1.0) (float 2.0);
    add (var 1) (mul (float 2.0) (float 3.0));
    mul (float 1.0) (mul (var 2) (float 4.0));
    neg (neg (add (float 1.0) (var 3)));
    neg (neg (float 1.0));
    neg (float 1.0) ]

let () = List.iteri (fun key -> Index.insert key i index) keys

The worst-case complexity of insertion is linear. In practice complexity depends heavily on term distribution. One can iterate on all terms stored in the index using Index.iter

let () =
  Index.iter
    (fun key v -> Format.printf "%a -> %d@." Term.pp key v)
    index

Note that the iteration order is not the insertion one:

Add(Mul(V(1), 2.0), 2.0) -> 1
Add(V(1), Mul(2.0, 3.0)) -> 6
Mul(1.0, Mul(V(2), 4.0)) -> 2
Neg(Neg(Add(1.0, V(3)))) -> 3
Neg(Neg(1.0)) -> 4
Neg(1.0) -> 5

One can ask for all terms unifiable with a query term, using Index.iter_unifiable:

let query = add (mul (float 3.0) (var 0)) (var 2)

let () =
  Index.iter_unifiable
    (fun key v -> Format.printf "%a -> %d@." Term.pp key)
    index
    query
Add(Mul(V(1), 2.0), 2.0)
Add(V(1), Mul(2.0, 3.0))

Note that there are no terms in keys that specialize nor generalize query. However we can try to find all terms specializing neg (var 0):

Neg(Neg(Add(1.0, V(3))))
Neg(Neg(1.0))
Neg(1.0)

Similarly, we may look for all terms generalizing neg (neg (add (float 1.0) (float 2.0))). There's a single one:

Neg(Neg(Add(1.0, V(3))))

If you need maximum performance, it is recommended to use the transient versions of these iterators, which expose directly the underlying term representation and do not perform the conversion to Term.t. These are

Have a look at the functions manipulating Index.internal_term. In some situations, one can expect one order of magnitude better performance compared to the non-transient iterators. However, do keep in mind that the lifetime of these internal terms ends when the closure passed to the iterator returns!

OCaml

Innovation. Community. Security.