package SZXX

Module Xml.SAX

Advanced parsing utilities: custom parser options and tools to stream huge documents

type node =
  | Prologue of DOM.attr_list
  | Element_open of {
      tag : Base.string;
      attrs : DOM.attr_list;
    }
  | Element_close of Base.string
  | Text of Base.string
  | Cdata of Base.string
  | Nothing
  | Many of node Base.list
val sexp_of_node : node -> Sexplib0.Sexp.t
val compare_node : node -> node -> Base.int
val equal_node : node -> node -> Base.bool
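
Since Many wraps a list of nodes, a consumer that wants to handle nodes one at a time generally has to unwrap it first. The following is a minimal sketch of that step, not library code: it declares a local copy of the node type so that it compiles without SZXX, it models DOM.attr_list as (string * string) list (an assumption), and flatten is a hypothetical helper name.

```ocaml
(* Local mirror of the documented [node] type so the sketch compiles on its own.
   Assumption: DOM.attr_list is modelled here as (string * string) list. *)
type attr_list = (string * string) list

type node =
  | Prologue of attr_list
  | Element_open of { tag : string; attrs : attr_list }
  | Element_close of string
  | Text of string
  | Cdata of string
  | Nothing
  | Many of node list

(* Hypothetical helper: expand (possibly nested) [Many] values into a flat
   list and drop [Nothing], preserving the original order. *)
let rec flatten (n : node) : node list =
  match n with
  | Many nodes -> List.concat_map flatten nodes
  | Nothing -> []
  | other -> [ other ]

let () =
  let batch =
    Many
      [ Element_open { tag = "a"; attrs = [ ("href", "x") ] };
        Nothing;
        Many [ Text "hi"; Element_close "a" ] ]
  in
  (* Prints the number of nodes left after flattening. *)
  Printf.printf "%d\n" (List.length (flatten batch))
```

Dropping Nothing here is a design choice of the sketch; a consumer that needs to see every parser result can return [ Nothing ] from that branch instead.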
type parser_options = {
  accept_html_boolean_attributes : Base.bool;
    (* Invalid XML but valid HTML: <div attr1="foo" attr2>. With accept_html_boolean_attributes set to true, attr2 will be "attr2". *)
  accept_unquoted_attributes : Base.bool;
    (* Invalid XML but valid HTML: <div attr1="foo" attr2=bar>. With accept_unquoted_attributes set to true, attr2 will be "bar". *)
  accept_single_quoted_attributes : Base.bool;
    (* Valid HTML, but not accepted by this parser by default: <div attr1="foo" attr2='bar'>. With accept_single_quoted_attributes set to true, attr2 will be "bar". *)
  batch_size : Base.int;
    (* Performance optimization (default: 20). When batch_size is greater than 1, the parser prefers to return Many list, where list has length batch_size. *)
}
val sexp_of_parser_options : parser_options -> Sexplib0.Sexp.t
val compare_parser_options : parser_options -> parser_options -> Base.int
val equal_parser_options : parser_options -> parser_options -> Base.bool
val default_parser_options : parser_options

Defaults: accept_html_boolean_attributes is true, every other boolean option is false, and batch_size is 20.
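
Custom options are most easily derived from the defaults with OCaml's record-update syntax. The sketch below illustrates this under stated assumptions: it mirrors the parser_options record locally so that it compiles without SZXX, its default_parser_options is a stand-in built from the defaults documented on this page, and the names lenient_html_options and unbatched_options are hypothetical.

```ocaml
(* Local mirror of the documented [parser_options] record so the sketch
   compiles on its own. *)
type parser_options = {
  accept_html_boolean_attributes : bool;
  accept_unquoted_attributes : bool;
  accept_single_quoted_attributes : bool;
  batch_size : int;
}

(* Assumed stand-in for the library's [default_parser_options], using the
   defaults stated in the documentation. *)
let default_parser_options =
  {
    accept_html_boolean_attributes = true;
    accept_unquoted_attributes = false;
    accept_single_quoted_attributes = false;
    batch_size = 20;
  }

(* Hypothetical preset: also accept unquoted and single-quoted attributes. *)
let lenient_html_options =
  {
    default_parser_options with
    accept_unquoted_attributes = true;
    accept_single_quoted_attributes = true;
  }

(* Hypothetical preset: a batch size of 1; every other field keeps its
   default value. *)
let unbatched_options = { default_parser_options with batch_size = 1 }

let () =
  Printf.printf "lenient batch_size = %d, unbatched batch_size = %d\n"
    lenient_html_options.batch_size unbatched_options.batch_size
```

With the real library, a record built this way would be passed to make_parser, which returns a node Angstrom.t. Starting from default_parser_options rather than writing the whole record keeps the code valid if fields are added later.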

val make_parser : parser_options -> node Angstrom.t
val parser : node Angstrom.t

IO-agnostic Angstrom.t XML parser.

It is not fully spec-compliant: it does not attempt to validate character encoding or reject all incorrect documents. It does not process references. It does not automatically unescape XML escape sequences, but SZXX.Xml.DOM.unescape is provided to do so.

See README.md for examples on how to use it.

module Expert : sig ... end

For those who want finer-grained control and prefer to parse (using Angstrom) and fold (using this module) by hand.