Match on Groups in Regular Expressions using ppx_regexp

Task

Use a regular expression to parse a date in YYYY-MM-DD format.

Opam Packages Used

  • ppx_regexp Tested with version: 0.5.1 — Used libraries: ppx_regexp
  • re Tested with version: 1.12.0 — Used libraries: re

Code

Extracting components from a date string

  • We use match%pcre to pattern match a string against a regular expression
  • The regex pattern is written as a quoted string literal, {|...|}; a named delimiter such as {re|...|re} works equally well, since the identifier (here re) has no special meaning
  • Named capture groups are created using the (?<name>...) syntax, and each name becomes a variable bound in the matching branch
  • \d means "match a digit", {4} means "exactly 4 times"

let () =
  match%pcre "Date: 1972-01-23  " with
  (* The outer group binds the whole date; the inner groups bind its parts *)
  | {|(?<date>(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d))|} ->
      Printf.printf "Date found: (%s)\n" date;
      Printf.printf "Year: (%s)\n" year;
      Printf.printf "Month: (%s)\n" month;
      Printf.printf "Day: (%s)\n" day
  | _ -> print_string "Date not found\n"

Discussion

The re library supports several regular expression syntaxes, including Perl-style, POSIX extended, Emacs-style, and shell-style globs, and it can match several patterns concurrently.
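As a rough illustration of the multiple syntaxes, the sketch below writes the same date pattern once in Perl syntax and once in POSIX extended syntax, using re directly without the PPX. The names perl_date and posix_date are made up for this example, and the POSIX spelling of the pattern is our own translation rather than part of the recipe.

```ocaml
(* The same date pattern in two of the syntaxes that re can parse.
   perl_date and posix_date are illustrative names, not part of the recipe. *)
let perl_date = Re.compile (Re.Perl.re {|\d{4}-\d\d-\d\d|})
let posix_date = Re.compile (Re.Posix.re "[0-9]{4}-[0-9]{2}-[0-9]{2}")

let () =
  let s = "Date: 1972-01-23  " in
  (* Re.execp searches the string and only reports whether a match exists *)
  Printf.printf "Perl syntax matches: %b\n" (Re.execp perl_date s);
  Printf.printf "POSIX syntax matches: %b\n" (Re.execp posix_date s)
```

Both parsers produce the same kind of Re.t value, so everything after Re.compile is independent of the syntax the pattern was written in.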

The ppx_regexp package provides a preprocessor extension (PPX) that introduces syntactic sugar (e.g. match%pcre) for the PCRE syntax.
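To show what the sugar saves, here is a hand-written sketch of the same extraction using re without the PPX. It is not the code that ppx_regexp generates, only a comparable version written by hand. It uses numbered groups instead of named ones, and parse_date is a hypothetical helper introduced for this example.

```ocaml
(* Hand-written counterpart of the match%pcre example, using re directly.
   parse_date is a hypothetical helper, not part of re or ppx_regexp. *)
let date_re = Re.compile (Re.Perl.re {|(\d{4})-(\d\d)-(\d\d)|})

(* Return the year, month and day as strings, or None if no date is found *)
let parse_date s =
  match Re.exec_opt date_re s with
  | Some g -> Some (Re.Group.get g 1, Re.Group.get g 2, Re.Group.get g 3)
  | None -> None

let () =
  match parse_date "Date: 1972-01-23  " with
  | Some (year, month, day) ->
      Printf.printf "Year: (%s)\nMonth: (%s)\nDay: (%s)\n" year month day
  | None -> print_string "Date not found\n"
```

With match%pcre, the compile step, the group lookups and the option handling are all implied by the pattern and its named groups.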

To work with this package, we recommend consulting the PCRE syntax documentation or any PCRE cheat sheet.
