Pcre2
Perl Compatibility Regular Expressions for OCaml
7.5.1
type error =
  | Partial  (* String only matched the pattern partially *)
  | BadPattern of string * int  (* BadPattern (msg, pos): regular expression is malformed. The reason is in msg, the position of the error in the pattern in pos. *)
  | BadUTF  (* UTF string being matched is invalid *)
  | BadUTFOffset  (* Gets raised when a UTF string being matched with offset is invalid. *)
  | MatchLimit  (* Maximum allowed number of match attempts with backtracking or recursion is reached during matching. ALL FUNCTIONS CALLING THE MATCHING ENGINE MAY RAISE IT!!! *)
  | DepthLimit
  | WorkspaceSize  (* Raised by pcre2_dfa_match when the provided workspace array is too small. See documentation on pcre2_dfa_match for details on workspace array sizing. *)
  | InternalError of string  (* InternalError msg: C-library exhibits unknown/undefined behaviour. The reason is in msg. *)
Backtrack used in callout functions to force backtracking.
Regexp_or (pat, error) gets raised for sub-pattern pat by regexp_or if it failed to compile.
Internal representation of compilation flags
Internal representation of runtime flags
and cflag = [
  | `ALLOW_EMPTY_CLASS  (* Allow empty classes *)
  | `ALT_BSUX  (* Alternative handling of \u, \U, and \x *)
  | `ALT_CIRCUMFLEX  (* Alternative handling of ^ in multiline mode *)
  | `ALT_VERBNAMES  (* Process backslashes in verb names *)
  | `ANCHORED  (* Pattern matches only at start of string *)
  | `AUTO_CALLOUT  (* Automatically inserts callouts with id 255 before each pattern item *)
  | `CASELESS  (* Case insensitive matching *)
  | `DOLLAR_ENDONLY  (* '$' in pattern matches only at end of string *)
  | `DOTALL  (* '.' matches all characters (newlines, too) *)
  | `DUPNAMES  (* Allow duplicate names for subpatterns *)
  | `ENDANCHORED  (* Pattern can match only at end of subject *)
  | `EXTENDED  (* Ignores whitespace and PERL-comments. Behaves like the '/x'-option in PERL *)
  | `EXTENDED_MORE
  | `FIRSTLINE  (* Unanchored patterns must match before/at first NL *)
  | `LITERAL  (* Pattern characters are all literal *)
  | `MATCH_INVALID_UTF  (* Enable support for matching invalid UTF *)
  | `MATCH_UNSET_BACKREF  (* Match unset backreferences *)
  | `MULTILINE  (* '^' and '$' match before/after newlines, not just at the beginning/end of a string *)
  | `NEVER_BACKSLASH_C  (* Lock out the use of \C in patterns *)
  | `NEVER_UCP  (* Lock out UCP, e.g. via (\*UCP) *)
  | `NEVER_UTF  (* Lock out UTF, e.g. via (\*UTF) *)
  | `NO_AUTO_CAPTURE  (* Disables the use of numbered capturing parentheses *)
  | `NO_AUTO_POSSESS  (* Disable auto-possessification *)
  | `NO_DOTSTAR_ANCHOR  (* Disable automatic anchoring for .* *)
  | `NO_START_OPTIMIZE  (* Disable match-time start optimizations *)
  | `NO_UTF_CHECK  (* Do not check the pattern for UTF validity (only relevant if UTF is set). WARNING: with this flag enabled, invalid UTF strings may cause a crash, loop, or give incorrect results *)
  | `UCP  (* Use Unicode properties for \d, \w, etc. *)
  | `UNGREEDY  (* Quantifiers not greedy anymore, only if followed by '?' *)
  | `USE_OFFSET_LIMIT  (* Enable offset limit for unanchored matching *)
  | `UTF  (* Treat pattern and subjects as UTF strings *)
]
Compilation flags
cflags cflag_list converts a list of compilation flags to their internal representation.
cflag_list cflags converts internal representation of compilation flags to a list.
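The two conversions can be illustrated with a short sketch. It packs a flag list into its internal representation once, passes it to regexp through ?iflags, and converts it back. The pattern and subject strings are invented for the example.

```ocaml
(* Convert a flag list once and keep the internal representation. *)
let iflags = Pcre2.cflags [ `CASELESS; `MULTILINE ]

(* The internal form is passed via ?iflags instead of ?flags. *)
let rex = Pcre2.regexp ~iflags "^hello"

(* Converting back yields a list that contains the original flags. *)
let flags_back = Pcre2.cflag_list iflags

(* With `MULTILINE, '^' also matches after the newline;
   with `CASELESS, "HELLO" matches "hello". *)
let matched = Pcre2.pmatch ~rex "first line\nHELLO world"
```

Precomputing the internal representation is mainly useful when the same flag set is applied to many patterns.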
type rflag = [
  | `ANCHORED  (* Match only at the first position *)
  | `COPY_MATCHED_SUBJECT  (* On success, make a private subject copy *)
  | `DFA_RESTART  (* Causes matching to proceed presuming the subject string is further to one partially matched previously using the same int-array working set. May only be used with pcre2_dfa_match or unsafe_pcre2_dfa_match, and should always be paired with `PARTIAL. *)
  | `DFA_SHORTEST  (* Return only the shortest match *)
  | `ENDANCHORED  (* Pattern can match only at end of subject *)
  | `NOTBOL  (* Beginning of string is not treated as beginning of line *)
  | `NOTEOL  (* End of string is not treated as end of line *)
  | `NOTEMPTY  (* An empty string is not a valid match *)
  | `NOTEMPTY_ATSTART  (* An empty string at the start of the subject is not a valid match *)
  | `NO_JIT  (* Do not use JIT matching *)
  | `NO_UTF_CHECK  (* Do not check the subject for UTF validity (only relevant if PCRE2_UTF was set at compile time) *)
  | `PARTIAL_HARD  (* Throw Pcre2.Partial for a partial match even if there is a full match *)
  | `PARTIAL_SOFT  (* Throw Pcre2.Partial for a partial match if no full matches are found *)
]
Runtime flags
rflags rflag_list converts a list of runtime flags to their internal representation.
rflag_list rflags converts internal representation of runtime flags to a list.
Version information
Version of the PCRE2-C-library
Indicates whether unicode support is enabled
Character used as newline
Number of bytes used for internal linkage of regular expressions
Default limit for calls to internal matching function
Default limit for depth of nested backtracking
Indicates use of stack recursion in matching function
type firstcodeunit_info = [
  | `Char of char  (* Fixed first character *)
  | `Start_only  (* Pattern matches at beginning and end of newlines *)
  | `ANCHORED  (* Pattern is anchored *)
]
Information on matching of "first chars" in patterns
Compiled regular expressions
firstcodeunit regexp
get_match_limit rex
get_depth_limit rex
Alternative set of char tables for pattern matching
val regexp :
?limit:int ->
?depth_limit:int ->
?iflags:icflag ->
?flags:cflag list ->
?chtables:chtables ->
string ->
regexp
regexp ?limit ?depth_limit ?iflags ?flags ?chtables pattern compiles pattern with flags when given, with iflags otherwise, and with char tables chtables. If limit is specified, this sets a limit to the amount of recursion and backtracking (only lower than the builtin default!). If this limit is exceeded, MatchLimit will be raised during matching.
For detailed documentation on how you can specify PERL-style regular expressions (= patterns), please consult the PCRE2-documentation ("man pcre2pattern") or PERL-manuals.
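A minimal sketch of compiling a pattern once and reusing it, together with one way to report a malformed pattern. It assumes that errors are raised as an exception Pcre2.Error carrying the error type listed earlier; that declaration is not shown on this page. The helper names mentions_version and compile_checked are hypothetical.

```ocaml
(* Compile once, then reuse the compiled expression for several subjects. *)
let rex = Pcre2.regexp ~flags:[ `CASELESS ] "ocaml\\s+\\d+"

(* Hypothetical helper: true if the subject mentions "ocaml <number>". *)
let mentions_version s = Pcre2.pmatch ~rex s

(* Hypothetical helper: compile a user-supplied pattern and turn a
   malformed pattern into a readable message.
   Assumption: errors arrive as [Pcre2.Error of Pcre2.error]. *)
let compile_checked pat =
  try Ok (Pcre2.regexp pat)
  with Pcre2.Error (Pcre2.BadPattern (msg, pos)) ->
    Error (Printf.sprintf "bad pattern at offset %d: %s" pos msg)
```

Returning a result keeps the BadPattern case local to the compilation step, while other errors such as MatchLimit still propagate from the matching functions.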
val regexp_or :
?limit:int ->
?depth_limit:int ->
?iflags:icflag ->
?flags:cflag list ->
?chtables:chtables ->
string list ->
regexp
regexp_or ?limit ?depth_limit ?iflags ?flags ?chtables patterns like regexp, but combines patterns as alternatives (or-patterns) into one regular expression.
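As an illustration, the sketch below builds one expression from several alternatives and shows how a failing sub-pattern could be reported through Regexp_or. The keyword list and the helper names has_pet and try_or are made up for the example.

```ocaml
(* Hypothetical keyword list; each entry becomes one alternative. *)
let keyword_rex = Pcre2.regexp_or [ "\\bcat\\b"; "\\bdog\\b"; "\\bbird\\b" ]

let has_pet s = Pcre2.pmatch ~rex:keyword_rex s

(* A malformed alternative is reported together with the offending
   sub-pattern, as described for the Regexp_or exception. *)
let try_or pats =
  try Ok (Pcre2.regexp_or pats)
  with Pcre2.Regexp_or (pat, _err) -> Error pat
```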
quote str
Information on substrings after pattern matching
get_subject substrings
num_of_subs substrings
get_substring substrings n
get_substring_ofs substrings n
get_substrings ?full_match substrings
get_opt_substrings ?full_match substrings
get_named_substring rex name substrings
get_named_substring_ofs rex name substrings
type callout_data = {
  callout_number : int;  (* Callout number *)
  substrings : substrings;  (* Substrings matched so far *)
  start_match : int;  (* Subject start offset of current match attempt *)
  current_position : int;  (* Subject offset of current match pointer *)
  capture_top : int;  (* Number of the highest captured substring so far *)
  capture_last : int;  (* Number of the most recently captured substring *)
  pattern_position : int;  (* Offset of next match item in pattern string *)
  next_item_length : int;  (* Length of next match item in pattern string *)
}
Type of callout functions
Callouts are referred to in patterns as "(?Cn)" where "n" is a callout_number ranging from 0 to 255. Substrings captured so far are accessible as usual via substrings. You will have to consider capture_top and capture_last to know about the current state of valid substrings.
By raising exception Backtrack within a callout function, the user can force the pattern matching engine to backtrack to other possible solutions. Other exceptions will terminate matching immediately and return control to OCaml.
val pcre2_match :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
string ->
int array
pcre2_match ?iflags ?flags ?rex ?pat ?pos ?callout subj
val pcre2_dfa_match :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
?workspace:int array ->
string ->
int array
pcre2_dfa_match ?iflags ?flags ?rex ?pat ?pos ?callout ?workspace subj invokes the "alternative" DFA matching function.
Note that the returned array of offsets is quite different from the one returned by pcre2_match et al. The motivating use case for the DFA match function is to be able to restart a partial match with N additional input segments. Because the match function/workspace does not store segments seen previously, the offsets returned when a match completes will refer only to the matching portion of the last subject string provided. Thus, returned offsets from this function should not be used to support extracting captured submatches. If you need to capture submatches from a series of inputs incrementally matched with this function, you'll need to concatenate those inputs that yield a successful match here and re-run the same pattern against that single subject string.
Aside from an absolute minimum of 20, PCRE does not provide any guidance regarding the size of workspace array needed by any given pattern. Therefore, it is wise to appropriately handle the possible WorkspaceSize error. If raised, you can allocate a new, larger workspace array and begin the DFA matching process again.
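The retry strategy described above might look like the following sketch. It assumes that the WorkspaceSize error arrives wrapped in an exception Pcre2.Error (not declared on this page). The helper name dfa_match_growing, the doubling policy, and the max_size cap are choices made for this example, not part of the library.

```ocaml
(* Retry a DFA match with a doubled workspace until it fits.
   [max_size] is an arbitrary cap chosen for this sketch; once it is
   reached, the WorkspaceSize error is left to propagate. *)
let dfa_match_growing ?(max_size = 1 lsl 16) ~rex subj =
  let rec attempt size =
    let workspace = Array.make size 0 in
    try Pcre2.pcre2_dfa_match ~rex ~workspace subj
    with Pcre2.Error Pcre2.WorkspaceSize when size < max_size ->
      attempt (size * 2)
  in
  (* 20 is the absolute minimum mentioned above. *)
  attempt 20
```

Starting from the minimum keeps small patterns cheap; a caller that knows its patterns are large could start from a bigger initial size instead.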
val exec :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
string ->
substrings
exec ?iflags ?flags ?rex ?pat ?pos ?callout subj
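A small sketch of how exec combines with the substring accessors listed earlier. The date pattern and subject are invented. It relies on the library signalling a failed match with Not_found, which this page does not spell out.

```ocaml
let rex = Pcre2.regexp "(\\d{4})-(\\d{2})-(\\d{2})"
let subject = "released on 2024-03-15, patched later"

let subs = Pcre2.exec ~rex subject

(* Index 0 is the whole match, 1..n are the capture groups. *)
let whole = Pcre2.get_substring subs 0
let year = Pcre2.get_substring subs 1

(* Start and end offsets of the whole match within the subject. *)
let start_ofs, end_ofs = Pcre2.get_substring_ofs subs 0

(* Assumption: a failed match raises Not_found. *)
let no_match =
  try Some (Pcre2.exec ~rex "no date here") with Not_found -> None
```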
val exec_all :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
string ->
substrings array
exec_all ?iflags ?flags ?rex ?pat ?pos ?callout subj
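For illustration, exec_all can be used to collect every match in a subject. The helper name numbers is hypothetical, and the sketch assumes that a subject without any match raises Not_found rather than returning an empty array.

```ocaml
(* Hypothetical helper: all runs of digits in [s], as integers. *)
let numbers s =
  try
    Pcre2.exec_all ~pat:"\\d+" s
    |> Array.map (fun subs -> int_of_string (Pcre2.get_substring subs 0))
    |> Array.to_list
  with Not_found -> []
```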
val next_match :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
substrings ->
substrings
next_match ?iflags ?flags ?rex ?pat ?pos ?callout substrs
val extract :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?full_match:bool ->
?callout:callout ->
string ->
string array
extract ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj
val extract_opt :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?full_match:bool ->
?callout:callout ->
string ->
string option array
extract_opt ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj
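The sketch below contrasts extract and extract_opt on a made-up key=value pattern. It assumes that full_match defaults to true, so that index 0 holds the whole match unless ~full_match:false is passed.

```ocaml
let rex = Pcre2.regexp "(\\w+)=(\\d+)"

(* Whole match at index 0, followed by the groups
   (assuming full_match defaults to true). *)
let with_full = Pcre2.extract ~rex "width=80"

(* Only the capture groups. *)
let groups_only = Pcre2.extract ~rex ~full_match:false "width=80"

(* Same layout, but each entry is an option. *)
let as_options = Pcre2.extract_opt ~rex ~full_match:false "width=80"
```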
val extract_all :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?full_match:bool ->
?callout:callout ->
string ->
string array array
extract_all ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj
val extract_all_opt :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?full_match:bool ->
?callout:callout ->
string ->
string option array array
extract_all_opt ?iflags ?flags ?rex ?pat ?pos ?full_match ?callout subj
val pmatch :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
string ->
bool
pmatch ?iflags ?flags ?rex ?pat ?pos ?callout subj
Information on substitution patterns
subst str converts the string str representing a substitution pattern to the internal representation
The contents of the substitution string str can be normal text mixed with any of the following (mostly as in PERL):
0-9+" from an immediately following other number.
val replace :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?itempl:substitution ->
?templ:string ->
?callout:callout ->
string ->
string
replace ?iflags ?flags ?rex ?pat ?pos ?itempl ?templ ?callout subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the substitution string templ when given, itempl otherwise. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
val qreplace :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?templ:string ->
?callout:callout ->
string ->
string
qreplace ?iflags ?flags ?rex ?pat ?pos ?templ ?callout subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the string templ. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
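The difference between replace and qreplace can be sketched as follows. The example assumes PERL-style numbered group references ($1, $2) in substitution strings, which is how the library behaves to the best of our knowledge. The pattern and subjects are invented.

```ocaml
let rex = Pcre2.regexp "(\\w+)@(\\w+)"

(* [replace] interprets the template: $1 and $2 refer to the groups
   (assumed PERL-style group references). *)
let swapped = Pcre2.replace ~rex ~templ:"$2 at $1" "alice@example bob@test"

(* [qreplace] inserts the template text as-is. *)
let literal = Pcre2.qreplace ~rex ~templ:"$2 at $1" "alice@example"
```

qreplace is the safer choice when the replacement text comes from user input and must not be interpreted.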
val substitute_substrings :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
subst:(substrings -> string) ->
string ->
string
substitute_substrings ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the substrings of the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
val substitute :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
subst:(string -> string) ->
string ->
string
substitute ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces all substrings of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
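A short sketch of the two function-based substitution variants, using invented subjects. substitute passes the matched text to the callback, whereas substitute_substrings passes the substrings value so that individual groups can be inspected.

```ocaml
(* [substitute]: the callback receives the matched text. *)
let doubled =
  Pcre2.substitute ~pat:"\\d+"
    ~subst:(fun s -> string_of_int (2 * int_of_string s))
    "3 apples and 12 pears"

(* [substitute_substrings]: the callback receives the match
   information and can pick out individual groups. *)
let reordered =
  Pcre2.substitute_substrings ~pat:"(\\d+)/(\\d+)"
    ~subst:(fun subs ->
      Pcre2.get_substring subs 2 ^ "/" ^ Pcre2.get_substring subs 1)
    "ratio 3/4"
```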
val replace_first :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?itempl:substitution ->
?templ:string ->
?callout:callout ->
string ->
string
replace_first ?iflags ?flags ?rex ?pat ?pos ?itempl ?templ ?callout subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the substitution string templ when given, itempl otherwise. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
val qreplace_first :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?templ:string ->
?callout:callout ->
string ->
string
qreplace_first ?iflags ?flags ?rex ?pat ?pos ?templ ?callout subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the string templ. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
val substitute_substrings_first :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
subst:(substrings -> string) ->
string ->
string
substitute_substrings_first ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the substrings of the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
val substitute_first :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?callout:callout ->
subst:(string -> string) ->
string ->
string
substitute_first ?iflags ?flags ?rex ?pat ?pos ?callout ~subst subj replaces the first substring of subj matching pattern pat when given, regular expression rex otherwise, starting at position pos with the result of function subst applied to the match. Uses flags when given, the precompiled iflags otherwise. Callouts are handled by callout.
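The "_first" variants differ from their counterparts only in stopping after the first match. A minimal comparison on an invented subject:

```ocaml
(* Every "-" is replaced. *)
let all_replaced = Pcre2.qreplace ~pat:"-" ~templ:"_" "a-b-c"

(* Only the first "-" is replaced. *)
let first_replaced = Pcre2.qreplace_first ~pat:"-" ~templ:"_" "a-b-c"

(* The function-based variants follow the same rule. *)
let first_upper =
  Pcre2.substitute_first ~pat:"[a-z]"
    ~subst:String.uppercase_ascii "a-b-c"
```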
val split :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?max:int ->
?callout:callout ->
string ->
string list
split ?iflags ?flags ?rex ?pat ?pos ?max ?callout subj splits subj into a list of at most max strings, using as delimiter pattern pat when given, regular expression rex otherwise, starting at position pos. Uses flags when given, the precompiled iflags otherwise. If max is zero, trailing empty fields are stripped. If it is negative, it is treated as arbitrarily large. If neither pat nor rex are specified, leading whitespace will be stripped! Should behave exactly as in PERL. Callouts are handled by callout.
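The effect of max can be sketched with a few invented inputs. The expectations follow the PERL semantics described above; the delimiter patterns deliberately contain no capturing groups.

```ocaml
(* Delimiter with optional surrounding whitespace. *)
let fields = Pcre2.split ~pat:"\\s*,\\s*" "red , green,blue"

(* At most two fields: the remainder stays in the last one. *)
let limited = Pcre2.split ~pat:"," ~max:2 "a,b,c"

(* max = 0: trailing empty fields are stripped. *)
let trimmed = Pcre2.split ~pat:"," ~max:0 "a,b,,"
```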
val asplit :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?max:int ->
?callout:callout ->
string ->
string array
asplit ?iflags ?flags ?rex ?pat ?pos ?max ?callout subj same as Pcre2.split but returns an array instead of a list.
Result of a Pcre2.full_split
val full_split :
?iflags:irflag ->
?flags:rflag list ->
?rex:regexp ->
?pat:string ->
?pos:int ->
?max:int ->
?callout:callout ->
string ->
split_result list
full_split ?iflags ?flags ?rex ?pat ?pos ?max ?callout subj splits subj into a list of at most max elements of type "split_result", using as delimiter pattern pat when given, regular expression rex otherwise, starting at position pos. Uses flags when given, the precompiled iflags otherwise. If max is zero, trailing empty fields are stripped. If it is negative, it is treated as arbitrarily large. Should behave exactly as in PERL. Callouts are handled by callout.
foreach_line ?ic f applies f to each line in inchannel ic until the end-of-file is reached.
foreach_file filenames f opens each file in the list filenames for input and applies f to each filename and the corresponding channel. Channels are closed after each operation (even when exceptions occur - they get reraised afterwards!).
val unsafe_pcre2_match :
irflag ->
regexp ->
pos:int ->
subj_start:int ->
subj:string ->
int array ->
callout option ->
unit
unsafe_pcre2_match flags rex ~pos ~subj_start ~subj offset_vector callout. You should read the C-source to know what happens. If you do not understand it - don't use this function!
make_ovector regexp calculates the tuple (subgroups2, ovector) which is the number of subgroup offsets and the offset array.
val unsafe_pcre2_dfa_match :
irflag ->
regexp ->
pos:int ->
subj_start:int ->
subj:string ->
int array ->
callout option ->
workspace:int array ->
unit
unsafe_pcre2_dfa_match flags rex ~pos ~subj_start ~subj offset_vector callout ~workspace. You should read the C-source to know what happens. If you do not understand it - don't use this function!