package orsetto


Regular expression parsing, search and matching with Unicode text.

Overview

This module implements Unicode regular expression parsing, search and matching in pure Objective Caml. The implementation claims support for Requirements Level 1 (Basic Unicode Support) of Unicode Technical Standard #18 (Unicode Regular Expressions), with the following differences:

  1. No support for line boundaries.
  2. No support for word boundaries.
  3. No support for case insensitive matching.
  4. Additional support for the Block enumerated property.

At present, there is no support for the Script_Extensions property.

Modules
module DFA : sig ... end

Deterministic finite automata for Unicode code points.

type t

The type of a compiled regular expression.

val of_text : Ucs_text.t -> t

Use of_text s to make a regular expression denoted by s. Raises Invalid_argument if s does not denote a valid regular expression.

val of_uchars : Uchar.t Seq.t -> t

Use of_uchars s to make a regular expression denoted by the Unicode code points in s. Raises Invalid_argument if the code points do not denote a valid regular expression.
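
Because both constructors raise on malformed input, code that compiles patterns from untrusted sources may prefer an option-returning wrapper. The following is a minimal sketch: compile_opt and toy_of_string are hypothetical names, and toy_of_string is only a stand-in for of_text so that the block runs without orsetto installed.

```ocaml
(* Hypothetical helper: turn any constructor that raises Invalid_argument
   into one that returns an option. *)
let compile_opt (compile : 'a -> 'b) (src : 'a) : 'b option =
  try Some (compile src) with Invalid_argument _ -> None

(* Toy stand-in for of_text, NOT orsetto's parser: it accepts a string only
   when its parentheses are balanced, and returns the string unchanged. *)
let toy_of_string (s : string) : string =
  let depth = ref 0 in
  String.iter
    (fun c ->
      if c = '(' then incr depth
      else if c = ')' then begin
        decr depth;
        if !depth < 0 then invalid_arg "toy_of_string: unbalanced ')'"
      end)
    s;
  if !depth <> 0 then invalid_arg "toy_of_string: unbalanced '('";
  s
```

With the real module, the same wrapper would presumably be applied as compile_opt of_text pattern, since of_text has the shape Ucs_text.t -> t.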

val of_dfa_term : DFA.term -> t

Use of_dfa_term s to make a regular expression that recognizes the language denoted by the term s.

val test : t -> Ucs_text.t -> bool

Use test r t to test whether the text t matches the regular expression r.

val contains : t -> Ucs_text.t -> bool

Use contains r t to test whether r recognizes any substring of t.
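
The difference between test and contains can be illustrated with a toy model. This sketch assumes that test requires the whole text to match, while contains accepts a match anywhere inside it. The "regular expression" here is just a literal string, and toy_test and toy_contains are hypothetical names, not part of orsetto.

```ocaml
(* Toy model, NOT orsetto's implementation: the pattern is a literal string,
   which is enough to contrast whole-text and substring matching. *)
let toy_test (pat : string) (text : string) : bool = String.equal pat text

let toy_contains (pat : string) (text : string) : bool =
  let n = String.length pat and m = String.length text in
  (* Try every start offset at which the pattern could still fit. *)
  let rec scan i = i + n <= m && (String.sub text i n = pat || scan (i + 1)) in
  scan 0
```

Under this model, the pattern "cat" passes toy_contains on "concatenate" but fails toy_test on it.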

Use search r s to search with r in a confluently persistent sequence s for the first accepted subsequence. Returns None if s does not contain a matching subsequence. Otherwise, returns Some (start, limit), where start is the index at which the first matching subsequence begins, and limit is the index just past the end of the longest matching subsequence.
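
The (start, limit) result describes a half-open range, so the match has length limit - start. The sketch below shows how such a result might be consumed. It uses a toy search over an ordinary string with the fixed pattern "one or more ASCII digits"; toy_search_digits and matched_substring are hypothetical names and do not reflect orsetto's implementation.

```ocaml
(* Toy stand-in for a search with the pattern "one or more ASCII digits".
   NOT orsetto's implementation; it only mirrors the documented result
   shape: Some (start, limit) for the first match, or None. *)
let is_digit c = c >= '0' && c <= '9'

let toy_search_digits (s : string) : (int * int) option =
  let n = String.length s in
  (* Find the index at which the first match begins. *)
  let rec find_start i =
    if i >= n then None
    else if is_digit s.[i] then Some i
    else find_start (i + 1)
  in
  (* Extend the match as far as possible to get the exclusive limit. *)
  let rec find_limit j =
    if j < n && is_digit s.[j] then find_limit (j + 1) else j
  in
  match find_start 0 with
  | None -> None
  | Some start -> Some (start, find_limit start)

(* The half-open range [start, limit) has length limit - start. *)
let matched_substring (s : string) ((start, limit) : int * int) : string =
  String.sub s start (limit - start)
```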

val split : t -> Ucs_text.t -> Ucs_text.t Seq.t

Use split r s to split s into a sequence of slices comprising the substrings in s that are separated by disjoint substrings matching r, which are found by searching from left to right. If r does not match any substring in s, then a sequence containing just s is returned, even if s is an empty slice.
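
The splitting rule can be sketched with a toy model over ordinary strings, using the fixed separator pattern "one or more commas". toy_split_commas is a hypothetical name, not orsetto's implementation. How the real split treats leading or trailing separators is not stated above, so the sketch makes no claim about those cases.

```ocaml
(* Toy stand-in for split with the separator pattern "one or more commas".
   NOT orsetto's implementation. *)
let toy_split_commas (s : string) : string Seq.t =
  let n = String.length s in
  (* The current piece starts at [from]; [i] scans forward for a separator. *)
  let rec next from i () =
    if i >= n then Seq.Cons (String.sub s from (n - from), Seq.empty)
    else if s.[i] = ',' then begin
      (* Skip the whole separator run so that the matches stay disjoint. *)
      let j = ref i in
      while !j < n && s.[!j] = ',' do incr j done;
      Seq.Cons (String.sub s from (i - from), next !j !j)
    end
    else next from (i + 1) ()
  in
  next 0 0
```

As in the description above, an input with no separator yields a one-element sequence holding the input itself, including when the input is empty.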

val quote : Ucs_text.t -> Ucs_text.t

Use quote s to make a copy of s by converting all the special characters into escape sequences.

val unquote : Ucs_text.t -> Ucs_text.t

Use unquote s to make a copy of s by converting all the escape sequences into ordinary characters.
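
A common use of quote is to turn arbitrary input into a pattern that matches it literally, with unquote as the inverse. The sketch below is a toy model over ordinary strings. The set of special characters and the backslash escape syntax are assumptions made for illustration, and toy_quote and toy_unquote are hypothetical names; orsetto's actual escape rules may differ.

```ocaml
(* Toy model, NOT orsetto's implementation. The set of special characters
   and the backslash escape syntax are assumptions for illustration. *)
let is_special c = String.contains "\\.*+?|()[]{}" c

let toy_quote (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      (* Prefix each special character with a backslash. *)
      if is_special c then Buffer.add_char b '\\';
      Buffer.add_char b c)
    s;
  Buffer.contents b

let toy_unquote (s : string) : string =
  let n = String.length s in
  let b = Buffer.create n in
  let rec go i =
    if i < n then
      if s.[i] = '\\' && i + 1 < n then begin
        (* Drop the backslash and keep the escaped character. *)
        Buffer.add_char b s.[i + 1];
        go (i + 2)
      end
      else begin
        Buffer.add_char b s.[i];
        go (i + 1)
      end
  in
  go 0;
  Buffer.contents b
```

In this model, toy_unquote (toy_quote s) returns s for every string s.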