An module for fuzzy searching the first match in a sequence
This module is intended to be used for searching as well as an example of properly using the low-level functions in the WuManber
module.
Make (P) (M)
creates a set of functions for finding the first match in a sequnce and reporting the number of errors and the position at which the match ends. See the functor documentation for more details.
Since this module is supposed to serve as an example of working with the low level api, we provide the code for the module as follows.
Annotated Source code for module
open WuManber
module Make (P : Patterns.PatternWithElemEquality) (M : Matcher.Matcher with type pattern := P.t and type elem := P.elem) = struct
We are creating a functor that ranges over patterns and matchers for those patterns.
A patterns contains characters, and a matcher takes a character c
and outputs a bit-vector indicating which characters of the pattern match c
.
module WM = WuManber
We will use the low level functions initial_bvs
and next_bvs
from the WuManber.WuManber
module.
We think of initial_bvs
as an initial state of an automaton, and next_bvs
makes the automaton take steps to the next state using outputs of matchers as inputs. The WuManber.BitOps.match_error
and WuManber.BitOps.is_match
functions are used to check if the automaton is in a final state, i.e. if there is a fuzzy match with the pattern.
let first_match ~pattern ~k (s : P.elem Seq.t) =
let pattern_length = P.length pattern in
let matcher = new M.matcher pattern in
let rec find count bvs s =
match BitOps.match_error ~pattern_length bvs with
| Some n -> Some (count, n, bvs)
| None ->
begin match s () with
| Seq.Cons (c, s) -> find (count + 1) (WM.next_bvs ~mismatch:(matcher#mismatch c) bvs) s
| Seq.Nil -> None
end
in
find 0 (WM.initial_bvs ~k) s
The above function creates a matcher from the ~pattern
, and uses it to go through the sequence s
, one character at a time, until a match in found.
We next implement the right leaning variant of the function.
module WMR = RightLeaningWuManber
The WuManber.RightLeaningWuManber
module contains the low level functions initial_bvs
, next_bvs
, and feed_sentinel
. The functions and notions of automaton are similar to the regular version of the algorithm.
But this version of the algorithm does not match delete edits at the end of the pattern, and so matches at the end of the text won't be reported.
To recover the deletes edits at the very end of the text, we feed sentinel characters into the automaton using the feed_sentinel
function. We need feed a maximum of k
sentinel characters into the automaton to check for matches.
let first_right_leaning_match ~pattern ~k (s : P.elem Seq.t) =
let pattern_length = P.length pattern in
let matcher = new M.matcher pattern in
let rec find_sentinel count bvs n =
if n = 0 then
None
else
let bvs = WMR.feed_sentinel bvs in
match BitOps.match_error ~pattern_length bvs with
| Some n -> Some (count, n, bvs)
| None ->
find_sentinel count bvs (n - 1)
in
let rec find count bvs s =
match BitOps.match_error ~pattern_length bvs with
| Some n -> Some (count, n, bvs)
| None ->
begin match s () with
| Seq.Cons (c, s) -> find (count + 1) (WMR.next_bvs ~mismatch:(matcher#mismatch c) bvs) s
| Seq.Nil -> find_sentinel count bvs k
end
in
find 0 (WMR.initial_bvs ~k) s
The above function creates a matcher from the ~pattern
, and uses it to go through the sequence s
, one character at a time, until a match in found. If the end of text is reached, k
sentinel characters are fed into the automaton to search for a match at the end of the text.
let report = function
| None -> "Could not find pattern in text"
| Some (c, e, _) -> Printf.sprintf "Pattern matched with %d errors at character %d of text" e c
The above function is a utility function which creates a texual description of the output of the first match functions.
end