package tyre

Typed Regular Expressions

Sources

1.0.tar.gz
sha256=63ca1915da896640534b5cf928d220198709ec74b899d55b830fb0ceccebd633
sha512=536440d090046569449c7752315d568b3447e84c8c0e555a35a20a504a96a538ed9bc4e8e5f78e5860744ba863023331aa0a9893bf2028ae280cad678ec8d59c

Module Tyre

Typed regular expressions

Tyre is a set of combinators to build type-safe regular expressions, allowing automatic extraction and modification of matched groups.

Tyre is bi-directional: a typed regular expression can be used both for matching and for evaluation. Multiple tyregexes can be combined to do routing, in a manner similar to switches or pattern matching. Typed regular expressions are exactly as expressive as regular expressions from re (and are, as such, regular expressions, not PCREs). Performance should be exactly the same.

# let dim = Tyre.( str"dim:" *> int <&> str"x" *> int ) ;;
val dim : (int * int) Tyre.t

# let dim_re = Tyre.compile dim ;;
val dim_re : (int * int) Tyre.re

# Tyre.exec dim_re "dim:3x4" ;;
- : (int * int, (int * int) Tyre.error) result = Result.Ok (3, 4)

# Tyre.eval dim (2, 5) ;;
- : string = "dim:2x5"

ppx_tyre lets you use the usual regular expression syntax, if preferred:

# let dim = [%tyre "dim:(?&int)x(?&int)"] ;;
val dim : (int * int) Tyre.t
type non_evaluable = [
  | `NE
  | `E
]
type evaluable = [
  | `E
]
type (+'evaluable, 'a) t

A typed regular expression.

The type variable 'a is the type of the value returned when the typed regular expression (tyregex) is executed.

For example tyre : (_, int) t can be used to return an int. In the rest of the documentation, we will use «tyre» to designate a value of type t.

A value of type (non_evaluable, 'a) t can only be used with functions that match a string; it cannot be used to produce an example string. Because it is only usable for matching, it is called a pattern.

A value of type (evaluable, 'a) t can be used with the eval function to return a string s such that exec (compile tyre) s = Ok v. We call such values expressions; they have no dedicated type alias because every expression is also a pattern.

type 'a pattern = (non_evaluable, 'a) t

A regexp only usable for matching.

val lift : ('a -> string) -> 'a pattern -> ('e, 'a) t

lift f p makes the pattern p evaluable by providing a conversion function.

Correctness is not checked: if the conversion function produces a string that does not match the regex, eval will return that string as is.
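As a minimal sketch of lift, assuming the tyre library is loaded (for instance with #require "tyre" in the toplevel): a pattern built with map cannot be evaluated, and lift restores that direction by supplying a printer. The names count_a_pat and count_a are hypothetical and only serve the illustration.

```ocaml
(* map produces a pattern: it can match, but eval cannot invert String.length. *)
let count_a_pat : int Tyre.pattern =
  Tyre.map String.length (Tyre.pcre "a+")

(* lift supplies the missing direction: build a string from the value. *)
let count_a = Tyre.lift (fun n -> String.make n 'a') count_a_pat

(* Evaluation goes through the function given to lift. *)
let printed = Tyre.eval count_a 3

(* Matching still goes through the original pattern. *)
let parsed = Tyre.exec (Tyre.compile count_a) "aaaa"
```

Here the printer is chosen so that its output always matches a+ for n > 0; nothing in the library enforces this.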

val liftpp : (Format.formatter -> 'a -> unit) -> ('b, 'a) t -> ('c, 'a) t

liftpp is equivalent to lift, but uses Stdlib.Format for better performance.

val unlift : (evaluable, 'a) t -> 'a pattern

unlift e turns the expression e into a pattern. Equivalent to (e :> _ pattern).

Combinators

val pcre : string -> (_, string) t

pcre s is a tyregex that matches s, a regular expression written in PCRE syntax, and returns the corresponding string. Groups in s are ignored.

val regex : Re.t -> (_, string) t

regex re is a tyregex that matches re and returns the corresponding string. Groups inside re are erased.

val matched_string : (_, 'a) t -> (_, string) t

matched_string t matches the same string as t, but returns the matched text, discarding the result computed by t.

val conv : ('a -> 'b) -> ('b -> 'a) -> ('e, 'a) t -> ('e, 'b) t

conv to_ from_ tyre matches the same text as tyre, but converts back and forth to a different data type.

to_ is allowed to raise an exception exn. In this case, exec will return Error (`ConverterFailure exn).

For example, this is the implementation of pos_int:

let pos_int =
  Tyre.conv
    int_of_string string_of_int
    (Tyre.regex (Re.rep1 Re.digit))
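
A further sketch of conv with a user-defined type, assuming the tyre library is loaded. The type level and the value of the same name are hypothetical; to_ converts the matched string into the variant and from_ goes the other way, so the result can be used for both matching and evaluation.

```ocaml
(* Hypothetical type used only for this illustration. *)
type level = Low | High

let level =
  Tyre.conv
    (function "low" -> Low | _ -> High)        (* to_: string -> level *)
    (function Low -> "low" | High -> "high")   (* from_: level -> string *)
    (Tyre.pcre "low|high")

(* Matching direction: string to value. *)
let parsed = Tyre.exec (Tyre.compile level) "high"

(* Evaluation direction: value to string. *)
let printed = Tyre.eval level Low
```
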
val map : ('a -> 'b) -> (_, 'a) t -> 'b pattern

map f tyre is a regexp that matches tyre and returns f v. It cannot be used for evaluating.

val app : ('e, 'a -> 'b) t -> ('e, 'a) t -> 'b pattern

app f v matches seq f v and returns the application of the value returned by f to the value returned by v.

val const : 'a -> ('e, unit) t -> ('e, 'a) t

const v tyre matches tyre but has value v. It is a simplification of conv for unit regular expressions.

val discard : (_, 'a) t -> unit pattern

discard tyre matches tyre but has value ().

val opt : ('e, 'a) t -> ('e, 'a option) t

opt tyre matches either tyre or the empty string. Similar to Re.opt.

val either : ('e, 'a) t -> ('e, 'b) t -> ('e, ('a, 'b) Either.t) t

either tyreL tyreR matches either tyreL (and will then return Left v) or tyreR (and will then return Right v).

val alt : (_, 'a) t -> (_, 'a) t -> 'a pattern

alt l r matches either l or r and returns the value of the one that matched.

It is not compatible with eval; use either or alt_eval instead.

The reason is that when evaluating alt l r with a value v, eval has no way to know whether v could have been returned by l or by r.

val alt_eval : ('a -> [ `Left | `Right ]) -> ('e, 'a) t -> ('e, 'a) t -> ('e, 'a) t

alt_eval from_ l r is alt l r but uses from_ when eval is called on it. from_ v should indicate whether v is compatible with l or with r.
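
The two evaluable forms of alternation can be sketched as follows, assuming the tyre library is loaded. The names id_or_name and switch are hypothetical. either keeps both branch types apart in an Either.t, while alt_eval keeps a single type and relies on from_ to choose the branch when evaluating.

```ocaml
(* either records which branch matched, so eval knows which side to print. *)
let id_or_name = Tyre.either Tyre.pos_int (Tyre.pcre "[a-z]+")

let parsed_id = Tyre.exec (Tyre.compile id_or_name) "42"
let printed_name = Tyre.eval id_or_name (Either.Right "bob")

(* alt_eval: both branches return a bool; from_ picks the branch for eval. *)
let switch =
  Tyre.alt_eval
    (fun on -> if on then `Left else `Right)
    (Tyre.const true (Tyre.str "on"))
    (Tyre.const false (Tyre.str "off"))

let parsed_switch = Tyre.exec (Tyre.compile switch) "off"
let printed_switch = Tyre.eval switch true
```
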

Repetitions

val rep : ('e, 'a) t -> ('e, 'a Seq.t) t

rep tyre matches tyre zero or more times. Similar to Re.rep.

For matching, rep tyre first matches the whole repetition, then tyre is used to walk the matched part and extract the values.

val rep1 : ('e, 'a) t -> ('e, 'a * 'a Seq.t) t

rep1 tyre is seq tyre (rep tyre). Similar to Re.rep1.

Sequences

val seq : ('e, 'a) t -> ('e, 'b) t -> ('e, 'a * 'b) t

seq tyre1 tyre2 matches tyre1 then tyre2 and returns both values.

val prefix : (_, _) t -> ('e, 'a) t -> ('e, 'a) t

prefix tyre_i tyre matches tyre_i, ignores the result, and then matches tyre and returns its result. Converters in tyre_i are never called.

val suffix : ('e, 'a) t -> (_, _) t -> ('e, 'a) t

suffix tyre tyre_i is the same as prefix, but reversed: it matches tyre, then tyre_i, and returns the result of tyre.
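
A small sketch of prefix and suffix, assuming the tyre library is loaded. The name setting and the "timeout=...ms" format are hypothetical. Only the integer in the middle contributes a value; the surrounding literals are matched and then ignored, and are printed back when evaluating.

```ocaml
(* Keep the integer, ignore the literal text on both sides. *)
let setting =
  Tyre.suffix
    (Tyre.prefix (Tyre.str "timeout=") Tyre.pos_int)
    (Tyre.str "ms")

let parsed = Tyre.exec (Tyre.compile setting) "timeout=250ms"
let printed = Tyre.eval setting 30
```

The same tyregex can be written with the infix operators as Tyre.(str "timeout=" *> pos_int <* str "ms").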

Let operators

val (let+) : ('e, 'a) t -> ('a -> 'b) -> 'b pattern

let+ x = y in z is map (fun x -> z) y.

val (and+) : ('e, 'a) t -> ('e, 'b) t -> ('e, 'a * 'b) t

(and+) x y is seq x y.

Be warned that this is not an applicative functor: let+ x = t1 and+ y = t2 in z is not the same as let+ y = t2 and+ x = t1 in z.
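
The let operators can be sketched as follows, assuming the tyre library is loaded and an OCaml version with binding operators (4.08 or later). The name area is hypothetical. Since let+ is map, the result is a pattern and cannot be passed to eval.

```ocaml
(* Parse "dim:WxH" and return the product of both integers. *)
let area : int Tyre.pattern =
  let open Tyre in
  let+ w = str "dim:" *> int
  and+ h = str "x" *> int in
  w * h

let parsed = Tyre.exec (Tyre.compile area) "dim:3x4"
```

The bindings are matched in the order they are written, which is why swapping the two branches changes the tyregex.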

Infix operators

val (<||>) : ('e, 'a) t -> ('e, 'b) t -> ('e, ('a, 'b) Either.t) t

t <||> t' is either t t'.

val (<|>) : (_, 'a) t -> (_, 'a) t -> 'a pattern

t <|> t' is alt t t'. It is not compatible with eval; use either instead if you need to call eval.

val (<&>) : ('e, 'a) t -> ('e, 'b) t -> ('e, 'a * 'b) t

t <&> t' is seq t t'.

val (*>) : (_, _) t -> ('e, 'a) t -> ('e, 'a) t

ti *> t is prefix ti t.

val (<*) : ('e, 'a) t -> (_, _) t -> ('e, 'a) t

t <* ti is suffix t ti.

module Infix : sig ... end

Useful combinators

val str : string -> (_, unit) t

str s matches s and evaluates to s.

val char : char -> (_, unit) t

char c matches c and evaluates to c.

val blanks : (_, unit) t

blanks matches Re.(rep blank) and doesn't return anything.

val int : (_, int) t

int matches -?[0-9]+ and returns the matched integer.

Integers that do not fit in an int will fail.

val pos_int : (_, int) t

pos_int matches [0-9]+ and returns the matched positive integer.

Integers that do not fit in an int will fail.

val float : (_, float) t

float matches -?[0-9]+(\.[0-9]*)? and returns the matched floating point number.

Floating point numbers that do not fit in a float return infinity or neg_infinity.

val bool : (_, bool) t

bool matches true|false and returns the matched boolean.

val list : ('e, 'a) t -> ('e, 'a list) t

list e is similar to rep e, but returns a list.

val terminated_list : sep:('e, _) t -> ('e, 'a) t -> ('e, 'a list) t

terminated_list ~sep tyre is list (tyre <* sep).

val separated_list : sep:(_, _) t -> ('e, 'a) t -> ('e, 'a list) t

separated_list ~sep tyre is equivalent to opt (tyre <&> list (sep *> tyre)).
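
As a sketch of the list combinators, assuming the tyre library is loaded, a comma-separated list of integers can be both parsed and printed. The name csv_ints is hypothetical.

```ocaml
(* Integers separated by commas, e.g. "1,2,3". *)
let csv_ints = Tyre.separated_list ~sep:(Tyre.char ',') Tyre.int

let parsed = Tyre.exec (Tyre.compile csv_ints) "1,2,3"
let printed = Tyre.eval csv_ints [4; 5]
```

The separator contributes no value: it is ignored when matching and reinserted between elements when evaluating.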

module Charset : sig ... end
val charset : Charset.t -> (_, char) t

charset cs is a (_, char) t that matches any character in cs.

val rep_charset : Charset.t -> (_, string) t

rep_charset cs matches the same text as rep (charset cs), but directly returns a string instead of a char Seq.t.

Predefined character sets as char expressions

val any : (_, char) t

Any character, including newline.

val rep_any : (_, string) t

rep_any matches the same strings as rep any but returns the matched string instead of a sequence of chars.

val notnl : (_, char) t

Any character except a newline.

val wordc : (_, char) t
val alpha : (_, char) t
val alnum : (_, char) t
val ascii : (_, char) t
val blank : (_, char) t
val cntrl : (_, char) t
val digit : (_, char) t

See Charset.digit.

Combinators for ints and floats are provided (int, pos_int, float); prefer them over digit for parsing numbers.

val graph : (_, char) t
val lower : (_, char) t
val print : (_, char) t
val punct : (_, char) t
val space : (_, char) t
val upper : (_, char) t
val xdigit : (_, char) t

Other combinators

See Re for details on the semantics of those combinators.

val start : (_, unit) t
val stop : (_, unit) t
val word : ('e, 'a) t -> ('e, 'a) t
val whole_string : ('e, 'a) t -> ('e, 'a) t
val longest : ('e, 'a) t -> ('e, 'a) t
val shortest : ('e, 'a) t -> ('e, 'a) t
val first : ('e, 'a) t -> ('e, 'a) t
val greedy : ('e, 'a) t -> ('e, 'a) t
val non_greedy : ('e, 'a) t -> ('e, 'a) t
val nest : ('e, 'a) t -> ('e, 'a) t

Matching

type 'a re

A compiled tyregex.

val compile : (_, 'a) t -> 'a re

compile tyre is the compiled tyregex representing tyre.

type 'a error = [
  | `NoMatch of 'a re * string
  | `ConverterFailure of exn
]
val pp_error : Format.formatter -> _ error -> unit
val exec : ?pos:int -> ?len:int -> 'a re -> string -> ('a, 'a error) result

exec ctyre s matches the string s using the compiled tyregex ctyre and returns the extracted value.

Returns Error (`NoMatch (ctyre, s)) if ctyre doesn't match s. Returns Error (`ConverterFailure exn) if a converter failed with the exception exn.

  • parameter pos

    Optional beginning of the string (default 0)

  • parameter len

    Length of the substring of s that can be matched (default to the end of the string)
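
A minimal sketch of handling both error cases of exec, assuming the tyre library is loaded. The function describe is hypothetical; dim is the tyregex from the introduction.

```ocaml
let dim = Tyre.(str "dim:" *> int <&> str "x" *> int)
let dim_re = Tyre.compile dim

(* Turn the result of exec into a short human-readable string. *)
let describe s =
  match Tyre.exec dim_re s with
  | Ok (w, h) -> Printf.sprintf "%dx%d" w h
  | Error (`NoMatch _) -> "no match"
  | Error (`ConverterFailure exn) ->
    "converter failed: " ^ Printexc.to_string exn
```

Compiling once and reusing dim_re avoids rebuilding the underlying regular expression on every call.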

val execp : ?pos:int -> ?len:int -> 'a re -> string -> bool

execp ctyre s returns true if ctyre matches s. Converters are never called.

  • parameter pos

    Optional beginning of the string (default 0)

  • parameter len

    Length of the substring of s that can be matched (default to the end of the string)

  • since 0.1.1
val replace : ?pos:int -> ?len:int -> ?all:bool -> 'a re -> ('a -> string) -> string -> (string, [> `ConverterFailure of exn ]) result

replace r f s returns s where every match of r has been replaced by f v where v is the value associated with r. If all is set to false, it only replaces the first match.
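
As a sketch of replace, assuming the tyre library is loaded, every positive integer in a string can be doubled. The names pos_int_re and double_numbers are hypothetical.

```ocaml
let pos_int_re = Tyre.compile Tyre.pos_int

(* Replace each matched integer n by the decimal representation of 2 * n. *)
let double_numbers s =
  Tyre.replace pos_int_re (fun n -> string_of_int (2 * n)) s
```

The replacement function receives the typed value, here an int, rather than the matched substring.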

Repeated Matching

val all : ?pos:int -> ?len:int -> 'a re -> string -> ('a list, 'a error) result

all ctyre s calls exec repeatedly and returns the list of all the matches.

val all_seq : ?pos:int -> ?len:int -> 'a re -> string -> 'a Seq.t

all_seq ctyre s is all ctyre s but returns a Seq.t instead. Matches are enumerated lazily.

Exceptions raised by converters are not caught.
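
The two repeated-matching functions can be sketched as follows, assuming the tyre library is loaded. The names ints, found and first_match are hypothetical. Note that all_seq returns the values directly, so a failing converter raises instead of producing an Error.

```ocaml
let ints = Tyre.compile Tyre.int

(* Collect every integer found in the string. *)
let found = Tyre.all ints "10 20 30"

(* Force only the first element of the lazy sequence of matches. *)
let first_match s =
  match Tyre.all_seq ints s () with
  | Seq.Cons (n, _) -> Some n
  | Seq.Nil -> None
```
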

Routing

type +'a route =
  | Route : (_, 'x) t * ('x -> 'a) -> 'a route

A route is a pair of a tyregex and a handler. When the tyregex is matched, the function is called with the result of the matching.

val (-->) : (_, 'x) t -> ('x -> 'a) -> 'a route

tyre --> f is Route (tyre, f).

val route : 'a route list -> 'a re

route [ tyre1 --> f1 ; tyre2 --> f2 ] produces a compiled tyregex such that, if tyre1 matches, f1 is called, and so on.

The compiled tyregex should be used with exec.
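
A small routing sketch, assuming the tyre library is loaded. The paths, the handlers, and the names router and dispatch are hypothetical. Each route may extract a value of a different type, as long as all handlers return the same type.

```ocaml
let router =
  Tyre.route [
    (* The first route extracts an int, the second a string. *)
    Tyre.( (str "/user/" *> pos_int) --> (fun id -> Printf.sprintf "user %d" id) );
    Tyre.( (str "/tag/" *> pcre "[a-z]+") --> (fun tag -> "tag " ^ tag) );
  ]

(* Run the router and fall back to a default response on any error. *)
let dispatch path =
  match Tyre.exec router path with
  | Ok response -> response
  | Error _ -> "not found"
```
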

Evaluating

val eval : (evaluable, 'a) t -> 'a -> string

eval tyre v returns a string s such that exec (compile tyre) s = Ok v.

Note that such a string s is not unique. eval usually returns a very simple witness.

val evalpp : (evaluable, 'a) t -> Format.formatter -> 'a -> unit

evalpp tyre ppf v is equivalent to Format.fprintf ppf "%s" (eval tyre v), but more efficient.

It is generally used with "%a":

let my_pp = Tyre.evalpp tyre in
Format.printf "%a@." my_pp v
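
The round trip between evaluation and matching can be sketched as follows, assuming the tyre library is loaded and reusing the dim tyregex from the introduction. The names via_eval, via_evalpp and back are hypothetical.

```ocaml
let dim = Tyre.(str "dim:" *> int <&> str "x" *> int)

(* Both calls print the same value; evalpp writes through a formatter. *)
let via_eval = Tyre.eval dim (2, 5)
let via_evalpp = Format.asprintf "%a" (Tyre.evalpp dim) (2, 5)

(* Matching the evaluated string gives the original value back. *)
let back = Tyre.exec (Tyre.compile dim) via_eval
```
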

Pretty printing

val pp : Format.formatter -> (_, 'a) t -> unit
val pp_re : Format.formatter -> 'a re -> unit