package search

A functor for building a TF-IDF search index over different types of documents.
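For reference, the textbook TF-IDF weight of a term $t$ in a document $d$, in a collection of $N$ documents, is given below. Here $\mathrm{tf}(t, d)$ is the number of occurrences of $t$ in $d$ and $\mathrm{df}(t)$ is the number of documents containing $t$. This page does not state the exact weighting the library uses, so treat this as the standard form rather than its implementation.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log \frac{N}{\mathrm{df}(t)}
```

A query is then typically scored against $d$ by summing $\mathrm{tfidf}(t, d)$ over the query's terms $t$.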

Parameters

module Uid : Uid

Signature

type t
type key = Uid.t
module Witness : sig ... end
module Uid : sig ... end
type 'v uid = 'v Uid.witness

A value of type 'v uid can be used to uniquely identify documents of type 'v.
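The page shows that 'v uid is 'v Uid.witness but not how witnesses are built. One common OCaml technique, the same idea behind Stdlib.Type.Id in OCaml 5.1, is sketched below: each witness carries a fresh extension constructor, and matching one constructor against another yields a proof that two types are equal. The names make_uid, same_type, ID and Refl are hypothetical, not this library's API.

```ocaml
(* A value Refl : (a, b) eq can only exist when a and b are the same type. *)
type (_, _) eq = Refl : ('a, 'a) eq

(* Extensible GADT: every witness adds its own constructor to it. *)
type _ id = ..

module type ID = sig
  type t
  type _ id += Id : t id
end

(* Hypothetical stand-in for ['v uid]: a first-class module holding a fresh constructor. *)
type 'v uid = (module ID with type t = 'v)

(* Each call creates a brand-new constructor, so each uid is unique. *)
let make_uid (type v) () : v uid =
  (module struct
    type t = v
    type _ id += Id : t id
  end)

(* Two uids match only if they came from the same [make_uid] call;
   a successful match also proves their types are equal. *)
let same_type (type a b) ((module A) : a uid) ((module B) : b uid) :
    (a, b) eq option =
  match A.Id with
  | B.Id -> Some Refl
  | _ -> None
```

Because the equality proof comes from matching constructors rather than comparing values, no unsafe casts are needed to recover a document's type later.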

type binding =
  | KV : ('v uid * 'v) -> binding
    (* A binding is returned when searching in a heterogeneous search index. *)
type doc = binding

Documents are bindings.
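Since KV hides 'v, bindings of different document types can sit in one result list, and matching on the uid recovers the concrete type. The sketch below assumes, for brevity, a closed GADT with two hypothetical document types (Title and Year) in place of the library's Uid.witness.

```ocaml
(* Closed GADT standing in for ['v uid]; both document types are hypothetical. *)
type _ uid =
  | Title : string uid  (* documents that are plain titles *)
  | Year : int uid      (* documents that are years *)

(* Same shape as the binding type above: 'v is existential. *)
type binding = KV : ('v uid * 'v) -> binding

(* One list, two document types. *)
let results : binding list =
  [ KV (Title, "Report"); KV (Year, 2024); KV (Title, "Notes") ]

(* Matching on the uid refines the hidden type of the payload. *)
let describe : binding -> string = function
  | KV (Title, title) -> "title: " ^ title
  | KV (Year, year) -> "year: " ^ string_of_int year
```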

val index : t -> uid:'doc uid -> token:string -> doc -> unit

index t ~uid ~token doc indexes the document doc in t under the string token, with uid as its unique identifier.
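At its core, indexing under a token amounts to an inverted index: a table from tokens to the documents filed under them. The sketch below is a minimal, hypothetical version built on Stdlib Hashtbl; it drops the uid argument and is not the library's internal representation.

```ocaml
(* Hypothetical inverted index: each token maps to every document filed under it. *)
type 'doc t = (string, 'doc) Hashtbl.t

let create () : 'doc t = Hashtbl.create 16

(* Mirrors the shape of [index t ~uid ~token doc], minus the uid. *)
let index (t : 'doc t) ~(token : string) (doc : 'doc) : unit =
  Hashtbl.add t token doc

(* [Hashtbl.find_all] returns every document for the token, most recent first. *)
let lookup (t : 'doc t) (token : string) : 'doc list = Hashtbl.find_all t token
```

Hashtbl.add keeps earlier bindings for the same key, which is what lets one token point at several documents.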

val add_document : t -> 'doc uid -> key -> 'doc -> unit

add_document t uid key doc adds the document doc, identified by key, to the index t.
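Because key is Uid.t, the functor parameter decides how documents are keyed. The sketch below shows that idea with a hypothetical UID module type (the real Uid signature is not shown on this page); documents are plain strings, the 'doc uid argument is omitted, and words are split on spaces and lowercased as an assumed tokenisation.

```ocaml
(* Hypothetical parameter signature; the library's actual [Uid] may differ. *)
module type UID = sig
  type t
  val equal : t -> t -> bool
  val hash : t -> int
end

module Make (Uid : UID) = struct
  module Docs = Hashtbl.Make (Uid)

  type key = Uid.t  (* as in the signature above *)

  (* Documents by key, plus a token table pointing back at keys. *)
  type t = { docs : string Docs.t; tokens : (string, key) Hashtbl.t }

  let create () = { docs = Docs.create 16; tokens = Hashtbl.create 16 }

  (* Store the document under its key, then index each lowercased word. *)
  let add_document t (key : key) (doc : string) =
    Docs.replace t.docs key doc;
    String.split_on_char ' ' doc
    |> List.filter (fun w -> w <> "")
    |> List.iter (fun w -> Hashtbl.add t.tokens (String.lowercase_ascii w) key)

  (* Resolve the keys found for a word back to their documents. *)
  let find t word =
    Hashtbl.find_all t.tokens (String.lowercase_ascii word)
    |> List.filter_map (fun key -> Docs.find_opt t.docs key)
end
```

This sketch does not remove stale tokens when a key is re-added; a real index would need to.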

val apply : 'v uid -> default:'a -> ('v -> 'a) -> doc -> 'a

apply uid ~default fn doc applies fn to the value wrapped in doc if uid identifies that value's type as 'v; otherwise it returns default.

val apply_exn : 'v uid -> ('v -> 'a) -> doc -> 'a

Like apply, but without a default value: it raises Invalid_argument _ when uid does not match the type of the value in doc.
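Both functions can be pictured as "compare uids, and use the resulting type-equality proof to call fn". The sketch below assumes a closed GADT with two hypothetical document types in place of Uid.witness; same_type and Refl are illustrative names, and only the shapes of apply and apply_exn follow the signatures above.

```ocaml
(* Proof that two types are equal. *)
type (_, _) eq = Refl : ('a, 'a) eq

(* Closed GADT standing in for ['v uid]; hypothetical document types. *)
type _ uid = Title : string uid | Year : int uid

type binding = KV : ('v uid * 'v) -> binding

(* Return a proof when both uids denote the same document type. *)
let same_type : type a b. a uid -> b uid -> (a, b) eq option =
 fun a b ->
  match a, b with
  | Title, Title -> Some Refl
  | Year, Year -> Some Refl
  | _ -> None

(* Run [fn] on the payload if the uids agree, else fall back to [default]. *)
let apply : type v a. v uid -> default:a -> (v -> a) -> binding -> a =
 fun uid ~default fn (KV (uid', v)) ->
  match same_type uid' uid with
  | Some Refl -> fn v
  | None -> default

(* As [apply], but a mismatch raises [Invalid_argument]. *)
let apply_exn : type v a. v uid -> (v -> a) -> binding -> a =
 fun uid fn (KV (uid', v)) ->
  match same_type uid' uid with
  | Some Refl -> fn v
  | None -> invalid_arg "apply_exn: uid does not match the document"
```

Pattern matching on Some Refl is what lets the compiler accept fn v, since it learns that the hidden payload type equals v.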

val add_index : t -> 'doc uid -> ('doc -> string) -> unit

add_index t uid f adds a new index for documents identified by uid, using f to extract the text to index, and then re-indexes every document.

val add_indexes : t -> 'doc uid -> ('doc -> string) list -> unit

Same as add_index, but adds several indexes at once so that re-indexing happens only once, after all of them have been added.
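The benefit of add_indexes is that a full re-index runs once per call rather than once per extractor. The sketch below illustrates that with a hypothetical single-type store that counts re-indexing passes; the record layout, reindex and reindex_passes are assumptions made for illustration, and text is split on spaces.

```ocaml
(* Hypothetical store; each extractor turns a document into indexable text. *)
type 'doc t = {
  mutable docs : 'doc list;
  mutable extractors : ('doc -> string) list;
  tokens : (string, 'doc) Hashtbl.t;
  mutable reindex_passes : int;  (* number of full re-indexing runs *)
}

let create () =
  { docs = []; extractors = []; tokens = Hashtbl.create 16; reindex_passes = 0 }

(* Rebuild the token table from scratch with every registered extractor. *)
let reindex t =
  Hashtbl.reset t.tokens;
  t.reindex_passes <- t.reindex_passes + 1;
  List.iter
    (fun doc ->
      List.iter
        (fun extract ->
          String.split_on_char ' ' (extract doc)
          |> List.iter (fun w -> if w <> "" then Hashtbl.add t.tokens w doc))
        t.extractors)
    t.docs

(* Register all extractors first, then re-index exactly once. *)
let add_indexes t fs =
  t.extractors <- t.extractors @ fs;
  reindex t

(* A single index is just a batch of one. *)
let add_index t f = add_indexes t [ f ]
```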

search t k searches the index t for the query k, returning the matching bindings.
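This page shows neither the signature of search nor how results are ranked. As a self-contained illustration of TF-IDF ranking, the sketch below scores each document by summing tf × log(N / df) over the query's terms and returns matches best first. tokenise, tf, idf and this search are hypothetical helpers, and the library's own weighting may differ.

```ocaml
(* Hypothetical tokeniser: lowercase, split on spaces, drop empties. *)
let tokenise s =
  String.split_on_char ' ' (String.lowercase_ascii s) |> List.filter (( <> ) "")

(* Term frequency: occurrences of [term] in a tokenised document. *)
let tf term words = List.length (List.filter (String.equal term) words)

(* Textbook idf: log (N / df); terms in no document score 0. *)
let idf term docs =
  let n = float_of_int (List.length docs) in
  let df = List.length (List.filter (List.mem term) docs) in
  if df = 0 then 0.0 else log (n /. float_of_int df)

(* Score every document against the query; keep positive scores, best first. *)
let search (corpus : string list) (query : string) : (string * float) list =
  let docs = List.map tokenise corpus in
  let terms = tokenise query in
  List.map2
    (fun text words ->
      let score =
        List.fold_left
          (fun acc term -> acc +. (float_of_int (tf term words) *. idf term docs))
          0.0 terms
      in
      (text, score))
    corpus docs
  |> List.filter (fun (_, s) -> s > 0.0)
  |> List.stable_sort (fun (_, a) (_, b) -> Float.compare b a)
```

With this weighting a term that occurs in every document gets idf log 1 = 0, so it contributes nothing to the ranking.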

val empty : ?santiser:(string -> string) -> ?strategy:(string -> string list) -> ?tokeniser:(string -> string list) -> unit -> t

Create a new empty search index.

  • parameter sanitiser

    Run on each token to normalise it; by default this is String.lowercase_ascii.

  • parameter strategy

    The indexing strategy; by default this is a prefixing strategy, so that abc is indexed under a, ab and abc.

  • parameter tokeniser

    Turns your documents into tokens.
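The three optional parameters can be read as a small pipeline from text to index keys. The sketch below assumes the order tokenise, then sanitise each token, then expand it with the strategy; keys, prefixes and tokenise are hypothetical helpers, and the space-splitting tokeniser is an assumption since the library's default tokeniser is not described here. Note that the signature of empty above spells its first label santiser.

```ocaml
(* Prefix strategy as described above: "abc" -> ["a"; "ab"; "abc"]. *)
let prefixes (token : string) : string list =
  List.init (String.length token) (fun i -> String.sub token 0 (i + 1))

(* Hypothetical tokeniser: split on spaces and drop empty strings. *)
let tokenise (text : string) : string list =
  String.split_on_char ' ' text |> List.filter (fun s -> s <> "")

(* Tokenise, sanitise each token, then expand each one with the strategy. *)
let keys ?(sanitiser = String.lowercase_ascii) ?(strategy = prefixes)
    ?(tokeniser = tokenise) (text : string) : string list =
  tokeniser text |> List.map sanitiser |> List.concat_map strategy
```

Prefix expansion is what makes a partial query such as "oc" match a document containing "OCaml", at the cost of storing one key per prefix.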

val pp : Stdlib.Format.formatter -> t -> unit

Dumps the search index.
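A pp function plugs into Format's %a directive. The sketch below dumps a hypothetical index represented as a Hashtbl from tokens to document names, one sorted token per line; the output layout is an assumption, not the library's format.

```ocaml
(* Print each distinct token with the documents filed under it. *)
let pp (ppf : Format.formatter) (t : (string, string) Hashtbl.t) : unit =
  (* Collect every distinct token once, in sorted order. *)
  let tokens =
    Hashtbl.fold (fun tok _ acc -> tok :: acc) t [] |> List.sort_uniq String.compare
  in
  List.iter
    (fun tok ->
      Format.fprintf ppf "%s -> [%s]@\n" tok
        (String.concat "; " (Hashtbl.find_all t tok)))
    tokens
```

Usage: Format.printf "%a" pp index, or Format.asprintf "%a" pp index to obtain a string.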
