package orsetto

Regular expression parsing, search and matching.

Overview

This module implements simple regular expression parsing, search and matching in pure Objective Caml for 8-bit extended ASCII text.

Use any of the following constructions in regular expressions:

  • \n Matches LF ("newline") character.
  • \t Matches TAB character.
  • \r Matches RETURN character.
  • \a Matches an alphabetical character.
  • \d Matches a decimal digit character.
  • \i Matches an alphanumerical character.
  • \s Matches a TAB, LF, VT, FF, CR or SPACE (whitespace) character.
  • \w Matches a character other than a whitespace character.
  • \xNN Matches the character with hexadecimal code NN.
  • \DDD Matches the character with decimal code DDD, where DDD is a three digit number between 000 and 255.
  • \c_ Matches the control character corresponding to the subsequent printable character, e.g. \cA is CONTROL-A, and \c[ is ESCAPE.
  • . Matches any character except newline.
  • * (postfix) Matches the preceding expression, zero, one or several times in sequence.
  • + (postfix) Matches the preceding expression, one or several times in sequence.
  • ? (postfix) Matches the preceding expression once or not at all.
  • [..] Character set. Ranges are denoted with '-', as in [a-z]. An initial '^', as in [^0-9], complements the set. Special characters in the character set syntax may be included in the set by escaping them with a backtick, e.g. [`^```]] is a set containing three characters: the caret, the backtick and the right bracket.
  • (..|..) Alternatives. Matches one of the expressions between the parentheses, which are separated by vertical bar characters.
  • \_ Escaped special character. The special characters are '\\', '.', '*', '+', '?', '(', '|', ')', '['.
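
The character-class escapes listed above can be read as predicates on single characters. The following is a minimal sketch using only the standard library; all function names are hypothetical and are not part of this module. It assumes that \a covers ASCII letters only (the module itself works on 8-bit extended ASCII and may accept more), and that \c_ uses the conventional mapping that flips bit 6 of the character code.

```ocaml
(* Hypothetical helpers; none of these names come from the library. *)

(* \a : alphabetical character (ASCII letters only in this sketch). *)
let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

(* \d : decimal digit. *)
let is_digit c = c >= '0' && c <= '9'

(* \i : alphanumerical character. *)
let is_alnum c = is_alpha c || is_digit c

(* \s : TAB, LF, VT, FF, CR or SPACE. *)
let is_space c =
  match c with
  | '\t' | '\n' | '\011' | '\012' | '\r' | ' ' -> true
  | _ -> false

(* \w : any character other than a whitespace character. *)
let is_nonspace c = not (is_space c)

(* \c_ : control character for a printable character, assuming the
   conventional mapping that flips bit 6 of the character code. *)
let control_of c = Char.chr (Char.code c lxor 0x40)
```

Under these assumptions, control_of 'A' is the character with code 1 (CONTROL-A) and control_of '[' is the character with code 27 (ESCAPE), which agrees with the examples given for \c_.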
Interface
module DFA : sig ... end

Deterministic finite automata on octet character symbols.

type t

The type of a compiled regular expression.

val of_string : string -> t

Use of_string s to make a regular expression denoted by s. Raises Invalid_argument if s does not denote a valid regular expression.

val of_chars : char Seq.t -> t

Use of_chars s to make a regular expression denoted by the characters in s. Raises Invalid_argument if the characters do not denote a valid regular expression.

val of_dfa_term : DFA.term -> t

Use of_dfa_term s to make a regular expression for recognizing the language term s.

val test : t -> string -> bool

Use test r s to test whether r recognizes s. Returns true if no character in s is rejected and the DFA reaches at least one final state after consuming all of s; otherwise returns false.
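
The acceptance rule described for test can be illustrated with a small hand-built automaton. This is only a sketch under stated assumptions: the record type dfa, its fields, and the function accepts are hypothetical and do not reflect the DFA module's actual representation. The automaton below recognizes the expression [0-9]+.

```ocaml
(* A hand-built automaton for the expression [0-9]+, used only to
   illustrate the acceptance rule. The record type and field names are
   hypothetical, not the library's DFA module. *)
type dfa = {
  start : int;
  final : int -> bool;
  step : int -> char -> int option;  (* None means the character is rejected *)
}

let digits = {
  start = 0;
  final = (fun q -> q = 1);
  step = (fun _ c -> if c >= '0' && c <= '9' then Some 1 else None);
}

(* Walk every character of s; fail on the first rejected character, and
   otherwise accept only if the walk ends in a final state. *)
let accepts m s =
  let n = String.length s in
  let rec loop q i =
    if i = n then m.final q
    else
      match m.step q s.[i] with
      | None -> false
      | Some q' -> loop q' (i + 1)
  in
  loop m.start 0
```

With this sketch, accepts digits "2024" holds, while the empty string fails because the start state is not final, and "20a4" fails because 'a' is rejected.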

val contains : t -> string -> bool

Use contains r s to test whether r recognizes any substring of s.

Use search r s to search with r in a confluently persistent sequence s for the first accepted subsequence. Returns None if s does not contain a matching subsequence. Otherwise, returns Some (start, limit), where start is the index at which the first matching subsequence begins and limit is the index just past the end of the longest match beginning there.
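
Because limit is the index after the end of the match, the pair (start, limit) describes a half-open interval of length limit - start. As a minimal sketch, assuming the searched text is also available as an ordinary string, a hypothetical helper (not part of this module) could recover the matched text like this:

```ocaml
(* Hypothetical helper: turn a (start, limit) result into the matched text.
   The interval is half-open, so the length is limit - start. *)
let matched_text s = function
  | None -> None
  | Some (start, limit) -> Some (String.sub s start (limit - start))
```

For example, a result of Some (3, 6) on the string "abc123def" corresponds to the substring "123".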

val split : t -> string Cf_slice.t -> string Cf_slice.t Seq.t

Use split r s to split s into a sequence of slices comprising the substrings in s that are separated by disjoint substrings matching r, which are found by searching from left to right. If r does not match any substring in s, then a sequence containing just s is returned, even if s is an empty slice.

val quote : string -> string

Use quote s to make a copy of s by converting all the special characters into escape sequences.

val unquote : string -> string

Use unquote s to make a copy of s by converting all the escape sequences into ordinary characters.
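
The relationship between quote and unquote can be sketched with the standard library alone. This is not the module's implementation: it assumes that quoting means prefixing each special character from the overview list with a backslash, and that unquoting reverses only that. The names is_special, quote_sketch and unquote_sketch are hypothetical, and the real functions may handle further escapes such as \n or \xNN.

```ocaml
(* The special characters listed in the overview. *)
let is_special c =
  match c with
  | '\\' | '.' | '*' | '+' | '?' | '(' | '|' | ')' | '[' -> true
  | _ -> false

(* Prefix every special character with a backslash (assumed behavior). *)
let quote_sketch s =
  let b = Buffer.create (String.length s * 2) in
  String.iter
    (fun c ->
      if is_special c then Buffer.add_char b '\\';
      Buffer.add_char b c)
    s;
  Buffer.contents b

(* Drop a backslash when it precedes a special character; copy every
   other character unchanged. *)
let unquote_sketch s =
  let n = String.length s in
  let b = Buffer.create n in
  let rec loop i =
    if i < n then
      if s.[i] = '\\' && i + 1 < n && is_special s.[i + 1] then begin
        Buffer.add_char b s.[i + 1];
        loop (i + 2)
      end else begin
        Buffer.add_char b s.[i];
        loop (i + 1)
      end
  in
  loop 0;
  Buffer.contents b
```

Under these assumptions, unquote_sketch (quote_sketch s) returns s for any string s, and a string with no special characters passes through quote_sketch unchanged.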