Compile a regular expression into an executable version that can be used to match strings, e.g. with exec.
val exec : ?pos:int -> ?len:int -> re -> string -> Group.t
exec re str searches str for a match of the compiled expression re, and returns the matched groups if any.
More specifically, when a match exists, exec returns a match that starts at the earliest position possible. If multiple such matches are possible, the one specified by the match semantics described below is returned.
parameter pos
optional beginning of the string (default 0)
parameter len
length of the substring of str that can be matched (default -1, meaning to the end of the string)
Note that exec re str ~pos ~len is not equivalent to exec re (String.sub str pos len). This transformation changes the meaning of some constructs (bos, eos, whole_string and leol), and zero-width assertions like bow or eow look at characters before pos and after pos + len.
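The difference can be seen with bos. The following is a minimal sketch, assuming the re library is installed and that bos keeps referring to the start of the original string when ~pos is given, as the note above describes; the names anchored_b, in_place and on_copy are illustrative only.

```ocaml
(* A pattern that only matches "b" at the very beginning of the string. *)
let anchored_b = Re.compile (Re.seq [ Re.bos; Re.str "b" ])

(* Search "ab" starting at offset 1: [bos] still denotes offset 0 of "ab",
   so the anchored pattern is not expected to match. *)
let in_place = Re.execp ~pos:1 anchored_b "ab"

(* Search a fresh copy of the substring: "b" now starts the string. *)
let on_copy = Re.execp anchored_b (String.sub "ab" 1 1)

let () =
  assert (not in_place);
  assert on_copy
```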
val exec_opt : ?pos:int -> ?len:int -> re -> string -> Group.t option
Similar to exec, but returns an option instead of using an exception.
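As a rough illustration of the two styles, the sketch below extracts the first run of digits from a string. It assumes that exec signals a failed search by raising Not_found (the text above only says "an exception"); first_number and first_number_exn are hypothetical helper names.

```ocaml
let digits = Re.compile (Re.rep1 Re.digit)

(* Option-returning variant: no exception handling needed. *)
let first_number s =
  match Re.exec_opt digits s with
  | Some g -> Some (Re.Group.get g 0) (* group 0 is the whole match *)
  | None -> None

(* Exception-based variant; assumed to raise [Not_found] when nothing matches. *)
let first_number_exn s = Re.Group.get (Re.exec digits s) 0

let () =
  assert (first_number "abc 42 def" = Some "42");
  assert (first_number "no digits here" = None);
  assert (first_number_exn "abc 42 def" = "42");
  assert (
    match first_number_exn "no digits here" with
    | _ -> false
    | exception Not_found -> true)
```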
val execp : ?pos:int -> ?len:int -> re -> string -> bool
Similar to exec, but returns true if the expression matches, and false if it doesn't. This function is more efficient than calling exec or exec_opt and ignoring the returned group.
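When only a yes/no answer is needed, execp can be used directly as a predicate. A small sketch, with contains_needle as a made-up helper name:

```ocaml
let needle = Re.compile (Re.str "needle")

(* True when "needle" occurs anywhere in [s]. *)
let contains_needle s = Re.execp needle s

let () =
  assert (contains_needle "haystack with a needle inside");
  assert (not (contains_needle "plain haystack"))
```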
More detailed version of execp. `Full is equivalent to true, while `Mismatch and `Partial are equivalent to false, but `Partial indicates that the input string could be extended to create a match.
val all : ?pos:int -> ?len:int -> re -> string -> Group.t list
Repeatedly calls exec on the given string, within the range given by pos and len, and returns the group of each match.
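Because all returns whole groups, positional information stays available. The sketch below collects the offsets of every run of digits; it assumes Group.offset returns a (start, stop) pair with an exclusive stop, and spans is an illustrative name.

```ocaml
let digits = Re.compile (Re.rep1 Re.digit)

(* (start, stop) offsets of each match, with [stop] exclusive. *)
let spans s = List.map (fun g -> Re.Group.offset g 0) (Re.all digits s)

let () = assert (spans "a1b22c333" = [ (1, 2); (3, 5); (6, 9) ])
```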
type 'a gen = unit -> 'a option
val all_gen : ?pos:int -> ?len:int -> re -> string -> Group.t gen
deprecated Use Seq.all
val all_seq : ?pos:int -> ?len:int -> re -> string -> Group.t Seq.t
deprecated Use Seq.all
val matches : ?pos:int -> ?len:int -> re -> string -> string list
Same as all, but returns the matched substrings rather than the whole groups.
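For example, a simple word extractor might look like the following sketch (words is a hypothetical helper; Re.alpha is used as the letter class):

```ocaml
let word = Re.compile (Re.rep1 Re.alpha)

(* Every maximal run of letters in [s], in order of appearance. *)
let words s = Re.matches word s

let () = assert (words "to be, or not" = [ "to"; "be"; "or"; "not" ])
```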
val matches_gen : ?pos:int -> ?len:int -> re -> string -> string gen
deprecated Use Seq.matches
val matches_seq : ?pos:int -> ?len:int -> re -> string -> string Seq.t
deprecated Use Seq.matches
val split : ?pos:int -> ?len:int -> re -> string -> string list
split re s splits s into chunks separated by re. It yields the chunks themselves, not the separator. For instance this can be used with a whitespace-matching re such as "[\t ]+".
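The whitespace case mentioned above could be written with combinators as in this sketch, where Re.rep1 (Re.set "\t ") stands in for the pattern "[\t ]+" and fields is an illustrative name. The input deliberately has no leading or trailing separators, since how those are treated is not described here.

```ocaml
(* One or more tabs or spaces. *)
let blanks = Re.compile (Re.rep1 (Re.set "\t "))

let fields s = Re.split blanks s

let () =
  assert (fields "one two\tthree  four" = [ "one"; "two"; "three"; "four" ])
```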
val split_gen : ?pos:int -> ?len:int -> re -> string -> string gen
deprecated Use Seq.split
val split_seq : ?pos:int -> ?len:int -> re -> string -> string Seq.t
deprecated Use Seq.split
val split_full : ?pos:int -> ?len:int -> re -> string -> split_token list
split_full re s splits s into chunks separated by re. It yields the chunks along with the separators. For instance this can be used with a whitespace-matching re such as "[\t ]+".
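A sketch of consuming the tokens, assuming split_token is the polymorphic variant [ `Text of string | `Delim of Group.t ]; describe and tokens are made-up names.

```ocaml
let blanks = Re.compile (Re.rep1 (Re.set "\t "))

(* Render each token as a tagged string so the result is easy to inspect. *)
let describe = function
  | `Text s -> "text:" ^ s
  | `Delim g -> "delim:" ^ Re.Group.get g 0

let tokens s = List.map describe (Re.split_full blanks s)

let () =
  assert (tokens "one \ttwo" = [ "text:one"; "delim: \t"; "text:two" ])
```

Keeping the separators makes it possible to rebuild the original string or to normalise only the delimiters.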
val split_full_gen : ?pos:int -> ?len:int -> re -> string -> split_token gen
deprecated Use Seq.split_full
val split_full_seq : ?pos:int -> ?len:int -> re -> string -> split_token Seq.t
val replace :
  ?pos:int -> ?len:int -> ?all:bool -> re -> f:(Group.t -> string) -> string ->
  string
replace ~all re ~f s iterates on s, and replaces every occurrence of re with f g, where g is the group of the current match. If all = false, then only the first occurrence of re is replaced.
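For instance, the following sketch upper-cases words, first everywhere and then only for the first occurrence; upcase is a hypothetical callback.

```ocaml
let word = Re.compile (Re.rep1 Re.alpha)

(* The callback receives the group of the current match. *)
let upcase g = String.uppercase_ascii (Re.Group.get g 0)

let () =
  assert (Re.replace word ~f:upcase "hello world" = "HELLO WORLD");
  assert (Re.replace ~all:false word ~f:upcase "hello world" = "HELLO world")
```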
val replace_string :
  ?pos:int -> ?len:int -> ?all:bool -> re -> by:string -> string ->
  string
replace_string ~all re ~by s iterates on s, and replaces every occurrence of re with by. If all = false, then only the first occurrence of re is replaced.
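A short sketch of the constant-replacement case, swapping commas for semicolons:

```ocaml
let comma = Re.compile (Re.char ',')

let () =
  (* Default: every occurrence is replaced. *)
  assert (Re.replace_string comma ~by:";" "a,b,c" = "a;b;c");
  (* With [~all:false], only the first one is. *)
  assert (Re.replace_string ~all:false comma ~by:";" "a,b,c" = "a;b,c")
```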
Only matches the whole string, i.e. fun t -> seq [ bos; t; eos ].
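As a quick check of this behaviour, the sketch below applies whole_string to a literal; exactly_ab is an illustrative name.

```ocaml
let exactly_ab = Re.compile (Re.whole_string (Re.str "ab"))

let () =
  assert (Re.execp exactly_ab "ab");
  (* Extra characters on either side prevent the match. *)
  assert (not (Re.execp exactly_ab "abc"));
  assert (not (Re.execp exactly_ab "xab"))
```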
Match semantics
A regular expression frequently matches a string in multiple ways. For instance exec (compile (opt (str "a"))) "ab" can match "" or "a". Match semantics can be modified with the functions below, allowing one to choose which of these is preferable.
By default, the leftmost branch of alternations is preferred, and repetitions are greedy.
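The defaults can be observed with a small helper that returns the text of the first match. This is only a sketch; matched is a hypothetical helper, not part of the library.

```ocaml
(* Text of the first match of [re] in [s]. *)
let matched re s = Re.Group.get (Re.exec (Re.compile re) s) 0

let () =
  (* Greedy by default: the optional "a" is taken. *)
  assert (matched Re.(opt (str "a")) "ab" = "a");
  (* Non-greedy: the empty match is preferred. *)
  assert (matched Re.(non_greedy (opt (str "a"))) "ab" = "");
  (* Leftmost alternative wins, even though "ab" would also match. *)
  assert (matched Re.(alt [ str "a"; str "ab" ]) "ab" = "a")
```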
Note that the existence of matches cannot be changed by specifying match semantics. seq [ bos; str "a"; non_greedy (opt (str "b")); eos ] will match when applied to "ab". However if seq [ bos; str "a"; non_greedy (opt (str "b")) ] is applied to "ab", it will match "a" rather than "ab".
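The two expressions from this paragraph can be checked directly. In the sketch below, matched and lazy_b are illustrative names.

```ocaml
let matched re s = Re.Group.get (Re.exec (Re.compile re) s) 0

let lazy_b = Re.(non_greedy (opt (str "b")))

let () =
  (* [eos] forces the optional "b" to be consumed despite [non_greedy]. *)
  assert (matched Re.(seq [ bos; str "a"; lazy_b; eos ]) "ab" = "ab");
  (* Without [eos], the non-greedy option stops as early as it can. *)
  assert (matched Re.(seq [ bos; str "a"; lazy_b ]) "ab" = "a")
```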
Also note that multiple match semantics can conflict. In this case, the one executed earlier takes precedence. For instance, any match of shortest (seq [ bos; group (rep (str "a")); group (rep (str "a")); eos ]) will always have an empty first group. Conversely, if we use longest instead of shortest, the second group will always be empty.
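The following sketch spells out that example on the input "aaa". It assumes that an empty group is reported by Group.get as the empty string; two_groups and groups are made-up names.

```ocaml
let two_groups =
  Re.(seq [ bos; group (rep (str "a")); group (rep (str "a")); eos ])

(* Contents of groups 1 and 2 for the first match of [re] in [s]. *)
let groups re s =
  let g = Re.exec (Re.compile re) s in
  (Re.Group.get g 1, Re.Group.get g 2)

let () =
  (* The first repetition is decided first, so it gets the preferred length. *)
  assert (groups (Re.shortest two_groups) "aaa" = ("", "aaa"));
  assert (groups (Re.longest two_groups) "aaa" = ("aaa", ""))
```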
Longest match semantics. That is, matches will match as many bytes as possible. If multiple choices match the maximum number of bytes, the one respecting the inner match semantics is preferred.
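A sketch contrasting longest with the default semantics on an alternation whose branches are prefixes of one another (matched and prefixes are illustrative names):

```ocaml
let matched re s = Re.Group.get (Re.exec (Re.compile re) s) 0

let prefixes = Re.(alt [ str "x"; str "xy"; str "xyz" ])

let () =
  (* Default semantics: the first alternative that matches is used. *)
  assert (matched prefixes "xyz" = "x");
  (* Longest semantics: the alternative covering the most bytes is used. *)
  assert (matched (Re.longest prefixes) "xyz" = "xyz")
```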
Delimit a group. The group is considered as matching if it is used at least once (it may be used multiple times if it is nested inside rep, for instance). If it is used multiple times, the last match is what gets captured.
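To illustrate the "last match" rule, the sketch below places a group inside a repetition; digits is an illustrative name.

```ocaml
(* Group 1 is used once per digit consumed by the repetition. *)
let digits = Re.compile (Re.rep1 (Re.group Re.digit))

let () =
  let g = Re.exec digits "123" in
  assert (Re.Group.get g 0 = "123");
  (* Only the final use of the group is kept. *)
  assert (Re.Group.get g 1 = "3")
```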
When matching against nest e, only the group matching in the last match of e will be considered as matching.
For instance:
let () =
  let open Re in
  (* with [nest]: the last iteration matched "b", so group 1 is unset *)
  let re = compile (rep1 (nest (alt [ group (str "a"); str "b" ]))) in
  let group = exec re "ab" in
  assert (Group.get_opt group 1 = None);
  (* same thing but without [nest] *)
  let re = compile (rep1 (alt [ group (str "a"); str "b" ])) in
  let group = exec re "ab" in
  assert (Group.get_opt group 1 = Some "a")