package re

type t

Regular expression

type re

Compiled regular expression

module Group : sig ... end

Manipulate matching groups.

type groups = Group.t
  • deprecated Use Group.t

Compilation and execution of a regular expression

val compile : t -> re

Compile a regular expression into an executable version that can be used to match strings, e.g. with exec.

val exec : ?pos:int -> ?len:int -> re -> string -> Group.t

exec re str searches str for a match of the compiled expression re, and returns the matched groups if any.

More specifically, when a match exists, exec returns a match that starts at the earliest position possible. If multiple such matches are possible, the one specified by the match semantics described below is returned.

  • parameter pos

    optional beginning of the string (default 0)

  • parameter len

    length of the substring of str that can be matched (default -1, meaning to the end of the string)

  • raises Not_found

    if the regular expression can't be found in str

    Note that exec re str ~pos ~len is not equivalent to exec re (String.sub str pos len). This transformation changes the meaning of some constructs (bos, eos, whole_string and leol), and zero-width assertions like bow or eow look at characters before pos and after pos + len.
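As a rough illustration of compile and exec, the sketch below builds a small pattern with two groups and reads them back with Group.get. The pattern, the input string and the names range_re, lo, hi and found_in are made up for this example; it assumes the re library is installed and linked.

```ocaml
(* Assumes the [re] library is available to the build. *)

(* One or more digits, a dash, one or more digits; each number is a group. *)
let range_re =
  Re.compile
    (Re.seq
       [ Re.group (Re.rep1 Re.digit); Re.char '-'; Re.group (Re.rep1 Re.digit) ])

(* [exec] returns the earliest match; group 0 is the whole match. *)
let groups = Re.exec range_re "range 10-20 end"
let whole = Re.Group.get groups 0 (* "10-20" *)
let lo = Re.Group.get groups 1 (* "10" *)
let hi = Re.Group.get groups 2 (* "20" *)

(* [exec] raises [Not_found] when the expression cannot be found. *)
let found_in s =
  match Re.exec range_re s with
  | _ -> true
  | exception Not_found -> false
```

Catching Not_found is only needed with exec; exec_opt and execp report a missing match through their return value.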

val exec_opt : ?pos:int -> ?len:int -> re -> string -> Group.t option

Similar to exec, but returns an option instead of using an exception.

val execp : ?pos:int -> ?len:int -> re -> string -> bool

Similar to exec, but returns true if the expression matches, and false if it doesn't. This function is more efficient than calling exec or exec_opt and ignoring the returned group.
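A minimal sketch of the two exception-free variants, assuming the re library is available. The helper names first_word and has_word are hypothetical and only serve the example.

```ocaml
(* A run of letters. *)
let word_re = Re.compile (Re.rep1 Re.alpha)

(* [exec_opt] gives [None] instead of raising [Not_found]. *)
let first_word s =
  Option.map (fun g -> Re.Group.get g 0) (Re.exec_opt word_re s)

(* [execp] only answers whether a match exists. *)
let has_word s = Re.execp word_re s
```

When only a yes/no answer is needed, has_word is the cheaper form, since no group information has to be computed.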

val exec_partial : ?pos:int -> ?len:int -> re -> string -> [ `Full | `Partial | `Mismatch ]

More detailed version of execp. `Full is equivalent to true, while `Mismatch and `Partial are equivalent to false, but `Partial indicates the input string could be extended to create a match.
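The three outcomes can be sketched as follows, assuming the re library is available. The expression is anchored with bos so that a non-matching prefix cannot simply be skipped by the search; the function name classify and the returned labels are illustrative only.

```ocaml
(* "hello" at the very beginning of the string. *)
let hello = Re.compile (Re.seq [ Re.bos; Re.str "hello" ])

(* Turn the polymorphic variant into a printable label. *)
let classify s =
  match Re.exec_partial hello s with
  | `Full -> "full"
  | `Partial -> "partial"
  | `Mismatch -> "mismatch"
```

With this anchored expression, an input such as "he" is a prefix of a possible match, while an input starting with any other character cannot be extended into one.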

module Mark : sig ... end

Marks

High Level Operations

type split_token = [
  | `Text of string  (* Text between delimiters *)
  | `Delim of Group.t  (* Delimiter *)
]
val all : ?pos:int -> ?len:int -> re -> string -> Group.t list

Repeatedly calls exec on the given string, within the range given by pos and len, and returns the list of matches found.

type 'a gen = unit -> 'a option
val all_gen : ?pos:int -> ?len:int -> re -> string -> Group.t gen
  • deprecated Use Seq.all
val all_seq : ?pos:int -> ?len:int -> re -> string -> Group.t Seq.t
  • deprecated Use Seq.all
val matches : ?pos:int -> ?len:int -> re -> string -> string list

Same as all, but extracts the matched substring rather than returning the whole group. In other words, it returns the list of matched strings.
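The difference between all and matches can be sketched like this, assuming the re library is available. The input string and the names number, numbers and offsets are chosen for the example.

```ocaml
(* A run of digits. *)
let number = Re.compile (Re.rep1 Re.digit)

(* [matches] returns the matched substrings directly. *)
let numbers = Re.matches number "a1 b22 c333"
(* ["1"; "22"; "333"] *)

(* [all] returns the groups, from which offsets can be read. *)
let offsets =
  List.map (fun g -> Re.Group.offset g 0) (Re.all number "a1 b22 c333")
(* [(1, 2); (4, 6); (8, 11)] *)
```

Group.offset returns the start position and the position just after the end of the group.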

val matches_gen : ?pos:int -> ?len:int -> re -> string -> string gen
  • deprecated Use Seq.matches
val matches_seq : ?pos:int -> ?len:int -> re -> string -> string Seq.t
  • deprecated Use Seq.matches
val split : ?pos:int -> ?len:int -> re -> string -> string list

split re s splits s into chunks separated by re. It yields the chunks themselves, not the separator. For instance this can be used with a whitespace-matching re such as "[\t ]+".
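A minimal sketch of split with the whitespace expression mentioned above, built here with set and rep1 rather than parsed from a string. It assumes the re library is available; the names blanks and fields are illustrative.

```ocaml
(* One or more tabs or spaces, i.e. the combinator form of "[\t ]+". *)
let blanks = Re.compile (Re.rep1 (Re.set "\t "))

(* Only the chunks between separators are returned. *)
let fields = Re.split blanks "a b\t c"
(* ["a"; "b"; "c"] *)
```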

val split_gen : ?pos:int -> ?len:int -> re -> string -> string gen
  • deprecated Use Seq.split
val split_seq : ?pos:int -> ?len:int -> re -> string -> string Seq.t
  • deprecated Use Seq.split
val split_full : ?pos:int -> ?len:int -> re -> string -> split_token list

split_full re s splits s into chunks separated by re. It yields the chunks along with the separators. For instance this can be used with a whitespace-matching re such as "[\t ]+".
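The token list produced by split_full can be inspected by pattern matching on split_token. The sketch below assumes the re library is available; the "T:" and "D:" labels and the names comma, tokens and rendered are made up for the example.

```ocaml
let comma = Re.compile (Re.char ',')

(* Both the text chunks and the delimiters are kept, in order. *)
let tokens = Re.split_full comma "a,b"

(* Render each token as a labelled string. *)
let rendered =
  List.map
    (function
      | `Text s -> "T:" ^ s
      | `Delim g -> "D:" ^ Re.Group.get g 0)
    tokens
(* ["T:a"; "D:,"; "T:b"] *)
```

Since a delimiter is carried as a Group.t, its offsets and submatches remain accessible.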

val split_full_gen : ?pos:int -> ?len:int -> re -> string -> split_token gen
  • deprecated Use Seq.split_full
val split_full_seq : ?pos:int -> ?len:int -> re -> string -> split_token Seq.t
  • deprecated Use Seq.split_full
module Seq : sig ... end
val replace : ?pos:int -> ?len:int -> ?all:bool -> re -> f:(Group.t -> string) -> string -> string

replace ~all re ~f s iterates on s, and replaces every occurrence of re with f g, where g is the group of the current match. If all = false, then only the first occurrence of re is replaced.

val replace_string : ?pos:int -> ?len:int -> ?all:bool -> re -> by:string -> string -> string

replace_string ~all re ~by s iterates on s, and replaces every occurrence of re with by. If all = false, then only the first occurrence of re is replaced.
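A short sketch of both replacement functions, assuming the re library is available. The input string and the names digits, doubled, masked and first_only are illustrative.

```ocaml
let digits = Re.compile (Re.rep1 Re.digit)

(* [replace] computes each replacement from the matched group. *)
let doubled =
  Re.replace digits
    ~f:(fun g -> string_of_int (2 * int_of_string (Re.Group.get g 0)))
    "3 apples, 10 pears"
(* "6 apples, 20 pears" *)

(* [replace_string] substitutes a fixed string. *)
let masked = Re.replace_string digits ~by:"N" "3 apples, 10 pears"
(* "N apples, N pears" *)

(* With [~all:false], only the first occurrence is replaced. *)
let first_only =
  Re.replace_string ~all:false digits ~by:"N" "3 apples, 10 pears"
(* "N apples, 10 pears" *)
```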

String expressions (literal match)

val str : string -> t
val char : char -> t

Basic operations on regular expressions

val alt : t list -> t

Alternative.

alt [] is equivalent to empty.

By default, the leftmost match is preferred (see match semantics below).

val seq : t list -> t

Sequence

val empty : t

Match nothing

val epsilon : t

Empty word

val rep : t -> t

0 or more matches

val rep1 : t -> t

1 or more matches

val repn : t -> int -> int option -> t

repn re i j matches re at least i times and at most j times, bounds included. j = None means no upper bound.

val opt : t -> t

0 or 1 matches
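The basic combinators compose as ordinary values. The following sketch, which assumes the re library is available, builds a hypothetical decimal-number pattern and a bounded repetition; all names are chosen for the example.

```ocaml
(* An optional sign, digits, and an optional fractional part. *)
let decimal =
  Re.seq
    [ Re.opt (Re.char '-')
    ; Re.rep1 Re.digit
    ; Re.opt (Re.seq [ Re.char '.'; Re.rep1 Re.digit ])
    ]

(* [whole_string] makes the pattern cover the entire input. *)
let decimal_re = Re.compile (Re.whole_string decimal)
let is_decimal s = Re.execp decimal_re s

(* Between two and three 'a', bounds included. *)
let two_or_three_a =
  Re.compile (Re.whole_string (Re.repn (Re.char 'a') 2 (Some 3)))

(* [empty] matches nothing, while [epsilon] matches the empty word. *)
let empty_matches = Re.execp (Re.compile Re.empty) "abc" (* false *)
let epsilon_matches = Re.execp (Re.compile Re.epsilon) "abc" (* true *)
```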

String, line, word

We define a word as a sequence of latin1 letters, digits and underscore.

val bol : t

Beginning of line

val eol : t

End of line

val bow : t

Beginning of word

val eow : t

End of word

val bos : t

Beginning of string. This differs from start because it matches the beginning of the input string even when using ~pos arguments:

let b = execp (compile (seq [ bos; str "a" ])) "aa" ~pos:1 in
assert (not b)
val eos : t

End of string. This is different from stop in the way described in bos.

val leol : t

Last end of line or end of string

val start : t

Initial position. This differs from bos because it takes into account the ~pos arguments:

let b = execp (compile (seq [ start; str "a" ])) "aa" ~pos:1 in
assert b
val stop : t

Final position. This is different from eos in the way described in start.

val word : t -> t

Word

val not_boundary : t

Not at a word boundary

val whole_string : t -> t

Only matches the whole string, i.e. fun t -> seq [ bos; t; eos ].
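A small sketch of the word and string assertions, assuming the re library is available. It also illustrates that zero-width assertions look at the character before ~pos, so matching with ~pos differs from matching on a substring. The inputs and names are made up for the example.

```ocaml
(* "cat" as a standalone word. *)
let cat_word = Re.compile (Re.word (Re.str "cat"))
let standalone = Re.execp cat_word "a cat sat" (* true *)
let embedded = Re.execp cat_word "concatenate" (* false *)

(* "cat" as the entire input. *)
let cat_only = Re.compile (Re.whole_string (Re.str "cat"))
let exact = Re.execp cat_only "cat" (* true *)
let longer = Re.execp cat_only "cats" (* false *)

(* "at" at the beginning of a word. *)
let at_word_start = Re.compile (Re.seq [ Re.bow; Re.str "at" ])

(* With [~pos:1] the preceding 'c' is still visible, so [bow] fails. *)
let with_pos = Re.execp ~pos:1 at_word_start "cat" (* false *)

(* On the substring "at", nothing precedes the 'a', so [bow] holds. *)
let with_sub = Re.execp at_word_start (String.sub "cat" 1 2) (* true *)
```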

Match semantics

A regular expression frequently matches a string in multiple ways. For instance exec (compile (opt (str "a"))) "ab" can match "" or "a". Match semantics can be modified with the functions below, allowing one to choose which of these is preferable.

By default, the leftmost branch of alternations is preferred, and repetitions are greedy.

Note that the existence of matches cannot be changed by specifying match semantics. seq [ bos; str "a"; non_greedy (opt (str "b")); eos ] will match when applied to "ab". However if seq [ bos; str "a"; non_greedy (opt (str "b")) ] is applied to "ab", it will match "a" rather than "ab".

Also note that multiple match semantics can conflict. In this case, the one executed earlier takes precedence. For instance, any match of shortest (seq [ bos; group (rep (str "a")); group (rep (str "a")); eos ]) will always have an empty first group. Conversely, if we use longest instead of shortest, the second group will always be empty.

val longest : t -> t

Longest match semantics. That is, matches will match as many bytes as possible. If multiple choices match the maximum number of bytes, the one respecting the inner match semantics is preferred.

val shortest : t -> t

Same as longest, but matching the least number of bytes.

val first : t -> t

First match semantics for alternations (not repetitions). That is, matches will prefer the leftmost branch of the alternation that matches the text.

val greedy : t -> t

Greedy matches for repetitions (opt, rep, rep1, repn): they will match as many times as possible.

val non_greedy : t -> t

Non-greedy matches for repetitions (opt, rep, rep1, repn): they will match as few times as possible.
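The effect of these modifiers can be sketched by comparing what group 0 captures. The helper matched and the inputs are hypothetical; the sketch assumes the re library is available.

```ocaml
(* Return the text of the whole match (group 0). *)
let matched re s = Re.Group.get (Re.exec (Re.compile re) s) 0

(* Repetitions are greedy by default. *)
let greedy_match =
  matched (Re.seq [ Re.char '<'; Re.rep Re.any; Re.char '>' ]) "<a><b>"
(* "<a><b>" *)

(* [non_greedy] stops at the first closing bracket. *)
let lazy_match =
  matched
    (Re.seq [ Re.char '<'; Re.non_greedy (Re.rep Re.any); Re.char '>' ])
    "<a><b>"
(* "<a>" *)

(* The leftmost branch of an alternation is preferred by default. *)
let first_alt = matched (Re.alt [ Re.str "a"; Re.str "ab" ]) "ab"
(* "a" *)

(* [longest] picks the branch matching the most bytes. *)
let longest_alt =
  matched (Re.longest (Re.alt [ Re.str "a"; Re.str "ab" ])) "ab"
(* "ab" *)
```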

Groups (or submatches)

val group : t -> t

Delimit a group. The group is considered as matching if it is used at least once (it may be used multiple times if it is nested inside rep for instance). If it is used multiple times, the last match is what gets captured.

val no_group : t -> t

Remove all groups
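A minimal sketch of a group used several times inside a repetition, and of no_group removing it. It assumes the re library is available and that Group.all returns one entry per group, including group 0; the names are illustrative.

```ocaml
(* A group around each letter of a repetition. *)
let letters = Re.rep1 (Re.group (Re.rg 'a' 'z'))

(* The group is used three times; the last use is what gets captured. *)
let with_group = Re.exec (Re.compile letters) "abc"
let last_letter = Re.Group.get with_group 1 (* "c" *)
let count_with = Array.length (Re.Group.all with_group) (* 2 *)

(* After [no_group], only the whole match (group 0) remains. *)
let without_group = Re.exec (Re.compile (Re.no_group letters)) "abc"
let count_without = Array.length (Re.Group.all without_group) (* 1 *)
```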

val nest : t -> t

When matching against nest e, only the group matching in the last match of e will be considered as matching.

For instance:

let re = compile (rep1 (nest (alt [ group (str "a"); str "b" ]))) in
(* bound to [g] rather than [group], so that the [group] combinator is not shadowed *)
let g = exec re "ab" in
assert (Group.get_opt g 1 = None);

(* same thing but without [nest] *)
let re = compile (rep1 (alt [ group (str "a"); str "b" ])) in
let g = exec re "ab" in
assert (Group.get_opt g 1 = Some "a")
val mark : t -> Mark.t * t

Mark a regexp. The returned Mark.t can then be used to know if this regexp was used.
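Marks are a light way to find out which branch of an alternation took part in a match. The sketch below assumes the re library is available; the names number_mark, letters_mark and token are hypothetical.

```ocaml
(* Mark each branch; [mark] returns the mark and the marked expression. *)
let number_mark, number = Re.mark (Re.rep1 Re.digit)
let letters_mark, letters = Re.mark (Re.rep1 Re.alpha)

let token = Re.compile (Re.alt [ number; letters ])

(* After a match, [Mark.test] tells whether a marked branch was used. *)
let g = Re.exec token "42"
let used_number = Re.Mark.test g number_mark (* true *)
let used_letters = Re.Mark.test g letters_mark (* false *)
```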

Character sets

val set : string -> t

Any character of the string

val rg : char -> char -> t

Character ranges

val inter : t list -> t

Intersection of character sets

val diff : t -> t -> t

Difference of character sets

val compl : t list -> t

Complement of union
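Character sets can be combined before being used in larger expressions. A minimal sketch, assuming the re library is available; the sets and inputs are chosen for the example.

```ocaml
let vowel = Re.set "aeiou"
let lower_letter = Re.rg 'a' 'z'

(* Lowercase letters that are not vowels. *)
let consonant = Re.diff lower_letter vowel
let consonant_runs = Re.matches (Re.compile (Re.rep1 consonant)) "hello world"
(* ["h"; "ll"; "w"; "rld"] *)

(* Characters in both ranges, i.e. 'h' to 'm'. *)
let h_to_m = Re.inter [ Re.rg 'a' 'm'; Re.rg 'h' 'z' ]
let h_to_m_runs = Re.matches (Re.compile (Re.rep1 h_to_m)) "ghmn"
(* ["hm"] *)

(* Any character that is not a digit. *)
let no_digits = Re.compile (Re.whole_string (Re.rep1 (Re.compl [ Re.digit ])))
```

These operations expect character sets as arguments; they are not meant for arbitrary expressions such as sequences.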

Predefined character sets

val any : t

Any character

val notnl : t

Any character but a newline

val alnum : t
val wordc : t
val alpha : t
val ascii : t
val blank : t
val cntrl : t
val digit : t
val graph : t
val lower : t
val print : t
val punct : t
val space : t
val upper : t
val xdigit : t

Case modifiers

val case : t -> t

Case sensitive matching. Note that this works on latin1, not ascii and not utf8.

val no_case : t -> t

Case insensitive matching. Note that this works on latin1, not ascii and not utf8.
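A short sketch of the case modifiers, assuming the re library is available. The second expression nests case inside no_case to restore case sensitivity for part of the pattern; the names are illustrative.

```ocaml
let exact = Re.compile (Re.str "hello")
let relaxed = Re.compile (Re.no_case (Re.str "hello"))

let exact_hit = Re.execp exact "HeLLo" (* false *)
let relaxed_hit = Re.execp relaxed "HeLLo" (* true *)

(* "ab" in any case, followed by "CD" in upper case only. *)
let mixed =
  Re.compile
    (Re.whole_string (Re.no_case (Re.seq [ Re.str "ab"; Re.case (Re.str "CD") ])))
```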

Internal debugging

val pp : Format.formatter -> t -> unit
val pp_re : Format.formatter -> re -> unit
val print_re : Format.formatter -> re -> unit

Alias for pp_re. Deprecated

module View : sig ... end

Experimental functions

val witness : t -> string

witness r generates a string s such that execp (compile r) s is true.

Be warned that this function is buggy because it ignores zero-width assertions like beginning of words. As a result it can generate incorrect results.
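The property stated above can be checked on an expression without zero-width assertions. This sketch assumes the re library is available; the exact string returned by witness is not specified here, only that it matches.

```ocaml
(* No assertions such as [bow] or [bos] are used in this expression. *)
let r = Re.seq [ Re.str "ab"; Re.rep1 Re.digit ]

(* Some string accepted by [r]. *)
let sample = Re.witness r

(* The generated string is expected to match [r]. *)
let sample_matches = Re.execp (Re.compile r) sample
```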

Deprecated functions

type substrings = Group.t

Alias for Group.t. Deprecated

  • deprecated Use Group.t
val get : Group.t -> int -> string

Same as Group.get. Deprecated

  • deprecated Use Group.get
val get_ofs : Group.t -> int -> int * int

Same as Group.offset. Deprecated

  • deprecated Use Group.offset
val get_all : Group.t -> string array

Same as Group.all. Deprecated

  • deprecated Use Group.all
val get_all_ofs : Group.t -> (int * int) array

Same as Group.all_offset. Deprecated

  • deprecated Use Group.all_offset
val test : Group.t -> int -> bool

Same as Group.test. Deprecated

  • deprecated Use Group.test
type markid = Mark.t

Alias for Mark.t. Deprecated

  • deprecated Use Mark.
val marked : Group.t -> Mark.t -> bool

Same as Mark.test. Deprecated

  • deprecated Use Mark.test
val mark_set : Group.t -> Mark.Set.t

Same as Mark.all. Deprecated

  • deprecated Use Mark.all
module Emacs : sig ... end

Emacs-style regular expressions

module Glob : sig ... end

Shell-style regular expressions

module Perl : sig ... end

Perl-style regular expressions

module Pcre : sig ... end
module Posix : sig ... end

References:

module Str : sig ... end

Module Str: regular expressions and high-level string processing
