package ppx_regexp
Sources
sha256=25083bc47c6ca224b52d958e3272c938c1115895446ed526ca330f03a2d50ca8
sha512=e9e8888b8f4cf4f7b2aab38af8e835f716a5b973b7a48ae329daafcc80b705bc8f839f6f76364699804903cc7f9ae6d5d69d9cf0f257c007558ff9f0fbf6d357
Description
This syntax extension turns
match%pcre x with
| {|re1|} -> e1
...
| {|reN|} -> eN
| _ -> e0
into suitable invocations of the ocaml-re library. The patterns are plain
strings of the form accepted by Re_pcre, except groups can be bound to
variables using the syntax (?<var>...). The type of var will be
string if a match of the group is guaranteed given a match of the
whole pattern, and string option if the variable is bound to or nested
below an optionally matched group.
Published: 09 Jun 2022
README
Two PPXes for Working with Regular Expressions
This repo provides two PPXes providing regular expression-based routing:
- ppx_regexp maps to re with the conventional last-match extraction into string and string option.
- ppx_tyre maps to Tyre providing typed extraction into options, lists, tuples, objects, and polymorphic variants.
Another difference is that ppx_regexp works directly on strings, essentially hiding the library calls, while ppx_tyre provides Tyre.t and Tyre.route, which can be composed and applied using the Tyre library.
ppx_regexp - Regular Expression Matching with OCaml Patterns
This syntax extension turns
function%pcre
| {|re1|} -> e1
...
| {|reN|} -> eN
| _ -> e0
into suitable invocations of the Re library, and similarly for match%pcre. The patterns are plain strings of the form accepted by Re_pcre, with the following additions:
- (?<var>...) defines a group and binds whatever it matches as var. The type of var will be string if the match is guaranteed given that the whole pattern matches, and string option if the variable is bound to or nested below an optionally matched group.
- ?<var> at the start of a pattern binds group 0 as var : string. This may not be the full string if the pattern is unanchored.
A variable is allowed for the universal case and is bound to the matched string. A regular alias is currently not allowed for patterns, since it is not obvious whether it should bind the full string or group 0.
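As a minimal sketch of the typing rule for named groups, the function below parses a host[:port] string. The name parse_endpoint and the pattern are hypothetical, chosen only for illustration, and the code assumes it is preprocessed with ppx_regexp and linked with re. Since host sits outside any optional group it should be bound as string, while port is nested below an optional group and should be bound as string option.

```ocaml
(* Hypothetical example, not taken from the ppx_regexp sources.
   Preprocess with ppx_regexp; link with re. *)
let parse_endpoint =
  (function%pcre
   | {|^(?<host>[a-z0-9.-]+)(:(?<port>[0-9]+))?$|} ->
      (* host : string, port : string option *)
      Some (host, Option.map int_of_string port)
   | _ ->
      None)
```

Under these assumptions, parse_endpoint "example.org" would yield a pair whose second component is None, because the optional group around port did not take part in the match.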
Example
The following prints out times and hosts for SMTP connections to the Postfix daemon:
(* Link with re, re.pcre, lwt, lwt.unix.
   Preprocess with ppx_regexp.
   Adjust to your OS. *)

open Lwt.Infix

let check_line =
  (function%pcre
   | {|(?<t>.*:\d\d) .* postfix/smtpd\[[0-9]+\]: connect from (?<host>[a-z0-9.-]+)|} ->
      Lwt_io.printlf "%s %s" t host
   | _ ->
      Lwt.return_unit)

let () = Lwt_main.run begin
  Lwt_io.printl "SMTP connections from:" >>= fun () ->
  Lwt_stream.iter_s check_line (Lwt_io.lines_of_file "/var/log/syslog")
end

ppx_tyre - Syntax Support for Tyre Routes
Typed regular expressions
This PPX compiles
[%tyre {|re|}]
into 'a Tyre.t.
For instance, we can define a pattern that recognizes strings of the form "dim:3x5" like so:
# open Tyre ;;
# let dim = [%tyre "dim:(?&int)x(?&int)"] ;;
val dim : (int * int) Tyre.t
The syntax (?&id) calls a typed regular expression named id of type 'a Tyre.t, such as Tyre.int.
For convenience, you can also use named capture groups to name the captured elements.
# let dim = [%tyre "dim:(?<x>(?&int))x(?&y:int)"] ;;
val dim : < x : int; y : int > Tyre.t
Names given using the syntax (?<foo>re) will be used for the fields of the results. (?&y:int) is a shortcut for (?<y>(?&int)). This can also be used for alternatives, for instance:
# let id_or_name = [%tyre "id:(?&id:int)|name:(?<name>[[:alnum:]]+)"] ;;
val id_or_name : [ `id of int | `name of string ] Tyre.t
Expressions of type Tyre.t can then be composed as part of bigger regular expressions, or compiled with Tyre.compile. See tyre's documentation for details.
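To illustrate the compile step, here is a small sketch that compiles the dim pattern from earlier in this section and runs it on a string. The helper name area is hypothetical, and the code assumes preprocessing with ppx_tyre and linking with tyre. It relies on Tyre.exec returning a result value, as described in Tyre's documentation.

```ocaml
(* Hypothetical usage sketch. Preprocess with ppx_tyre; link with tyre. *)
open Tyre

(* Same pattern as in the text: two typed int captures give (int * int) Tyre.t. *)
let dim = [%tyre "dim:(?&int)x(?&int)"]

(* Compile once, then reuse the compiled expression. *)
let dim_re = Tyre.compile dim

(* Returns the product of the two dimensions, or None when the input does not match. *)
let area s =
  match Tyre.exec dim_re s with
  | Ok (w, h) -> Some (w * h)
  | Error _ -> None
```

Compiling once and reusing dim_re avoids rebuilding the underlying regular expression on every call.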
Routes
ppx_tyre can also be used for routing, in the style of ppx_regexp:
function%tyre
| {|re1|} -> e1
...
| {|reN|} -> eN
is turned into a 'a Tyre.route, where re, re1, ... are regular expressions using the same syntax as above. "re" as v is treated like (?<v>re) and "re1" | "re2" is turned into a regular expression alternative.
Once routes are defined, matching is done with Tyre.exec.
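The following is a rough sketch of a route with two cases, each binding variables through top-level named captures. The names describe and run are hypothetical, and the sketch assumes that the value produced by function%tyre can be passed directly to Tyre.exec, as the sentence above suggests. It has not been checked against a specific ppx_tyre release.

```ocaml
(* Hypothetical sketch. Preprocess with ppx_tyre; link with tyre. *)
open Tyre

(* Each case binds its variables with the (?&v:qname) shortcut. *)
let describe =
  (function%tyre
   | {|dim:(?&w:int)x(?&h:int)|} -> Printf.sprintf "area %d" (w * h)
   | {|id:(?&n:int)|} -> Printf.sprintf "id %d" n)

(* Runs the route on a string, falling back to a fixed message when no case matches. *)
let run s =
  match Tyre.exec describe s with
  | Ok msg -> msg
  | Error _ -> "no route"
```

All cases must produce the same type (string here), since the route as a whole has a single result type.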
Details
The syntax follows Perl's syntax:
- re? extracts an option of what re extracts.
- re+, re*, re{n,m} extracts a list of what re extracts.
- (?&qname) refers to any identifier bound to a typed regular expression of type 'a Tyre.t.
- Normal parens are non-capturing.
- There are two ways to capture:
  - Anonymous capture (+re)
  - Named capture (?<v>re)
- One or more (?<v>re) at the top level can be used to bind variables instead of as ....
- One or more (?<v>re) in a sequence extracts an object where each method v is bound to what re extracts.
- An alternative with one (?<v>re) per branch extracts a polymorphic variant where each constructor `v receives what re extracts as its argument.
- (?&v:qname) is a shortcut for (?<v>(?&qname)).
Limitations
No Pattern Guards
Pattern guards are not supported. This is because all match cases are combined into a single regular expression, so if one of the patterns succeeds, the match is committed before the guard condition can be checked.
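One possible way to live with this limitation is to move the condition into the body of the case. The sketch below uses a hypothetical function classify and assumes preprocessing with ppx_regexp and linking with re.

```ocaml
(* Hypothetical sketch. Preprocess with ppx_regexp; link with re. *)
let classify =
  (function%pcre
   | {|^(?<n>[0-9]+)$|} ->
      (* A guard such as `when int_of_string n > 100` is not available,
         so the test is done inside the body instead. *)
      if int_of_string n > 100 then "big number" else "small number"
   | _ ->
      "not a number")
```

Unlike a real guard, a failed condition here cannot fall through to later cases, so the body has to handle both outcomes itself.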
No Exhaustiveness Check
The syntax extension will always warn if no catch-all case is provided. No exhaustiveness check is attempted. Doing it right would require reimplementing full regular expression parsing and an algorithm which would ideally produce a counter-example.
Bug Reports
The processor is currently new and not well tested. Please break it and file bug reports in the GitHub issue tracker. Any exception raised by generated code except for Match_failure is a bug.