Module SZXX.Xml
Advanced parsing utilities: custom parser options and tools to stream huge documents
type document = {
  decl_attrs : DOM.attr_list;  (* The declaration attributes, e.g. version and encoding *)
  top : DOM.element;  (* The top element of the document *)
}

val parse_document :
  ?parser:SAX.node Angstrom.t ->
  ?strict:Base.bool ->
  Feed.t ->
  (document, Base.string) Base.Result.t

Progressively parse a fully formed, fully escaped XML document. Parsing starts as soon as input arrives; the whole input does not need to be read first.
parser: Override the default parser. Make your own parser with SZXX.Xml.SAX.make_parser or pass SZXX.Xml.html_parser.
strict: Default: true. When false, unclosed elements are treated as self-closing elements, HTML-style. For example, a <br> without a matching </br> is treated as a self-closing <br />.
feed: A producer of raw input data. Create a feed by using the SZXX.Feed module.
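To make the strict flag concrete, here is a toy, self-contained OCaml model of the tree-building step. The token, node and frame types and the build and pop_into_parent functions are hypothetical stand-ins, not SZXX's SAX.node or DOM.element, and this is not how SZXX implements parsing. In non-strict mode an element that is never closed becomes an empty element, and whatever followed its opening tag becomes its sibling. Rejecting a closing tag that matches no open element, in both modes, is an assumption of this sketch.

```ocaml
(* Toy model of strict vs non-strict tree building. The types and functions
   below are hypothetical illustrations, not SZXX's API or implementation. *)
type token = Open of string | Close of string | Text of string

type node = Element of string * node list | Data of string

(* A stack frame: the open tag ([None] for the implicit root) and the
   children collected so far, most recent first. *)
type frame = { tag : string option; rev_children : node list }

(* Pop the top frame into its parent. With [~self_close:true] the element is
   emitted empty (like <br />) and the nodes collected after its opening tag
   become siblings that follow it in the parent. *)
let pop_into_parent ~self_close stack =
  match stack with
  | { tag = Some name; rev_children } :: parent :: rest ->
    let merged =
      if self_close then rev_children @ (Element (name, []) :: parent.rev_children)
      else Element (name, List.rev rev_children) :: parent.rev_children
    in
    { parent with rev_children = merged } :: rest
  | _ -> invalid_arg "pop_into_parent: no open element"

let build ~strict tokens =
  let rec go stack tokens =
    match tokens with
    | [] -> (
      match stack with
      | [ root ] -> Ok (List.rev root.rev_children)
      | { tag = Some name; _ } :: _ when strict -> Error ("unclosed element: " ^ name)
      | _ ->
        (* non-strict: self-close whatever is still open at end of input *)
        go (pop_into_parent ~self_close:true stack) [])
    | Open name :: rest -> go ({ tag = Some name; rev_children = [] } :: stack) rest
    | Text s :: rest -> (
      match stack with
      | top :: below -> go ({ top with rev_children = Data s :: top.rev_children } :: below) rest
      | [] -> assert false (* the root frame is never popped *))
    | Close name :: rest -> (
      match stack with
      | { tag = Some t; _ } :: _ when t = name ->
        go (pop_into_parent ~self_close:false stack) rest
      | _ when strict -> Error ("unexpected closing tag: " ^ name)
      | _ when List.exists (fun f -> f.tag = Some name) stack ->
        (* non-strict: the innermost unclosed element self-closes; retry *)
        go (pop_into_parent ~self_close:true stack) tokens
      | _ -> Error ("stray closing tag: " ^ name))
  in
  go [ { tag = None; rev_children = [] } ] tokens
```

With strict:false, the tokens of <p>line<br>next</p> yield a <p> whose children are the text "line", an empty <br>, and the text "next"; with strict:true the same tokens are rejected at </p>.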
val parse_document_from_string :
  ?parser:SAX.node Angstrom.t ->
  ?strict:Base.bool ->
  Base.string ->
  (document, Base.string) Base.Result.t

Same as parse_document, but parses from a string.
val stream_matching_elements :
  ?parser:SAX.node Angstrom.t ->
  ?strict:Base.bool ->
  filter_path:Base.string Base.list ->
  on_match:(DOM.element -> Base.unit) ->
  Feed.t ->
  (document, Base.string) Base.Result.t

Progressively assemble an XML DOM, except that every element matching filter_path is passed to on_match instead of being added to the DOM. The resulting "shallow DOM" is returned. All text nodes are properly unescaped. Parsing starts as soon as input arrives; the whole input does not need to be read first.
parser: Override the default parser. Make your own parser with SZXX.Xml.SAX.make_parser or pass SZXX.Xml.html_parser.
strict: Default: true. When false, unclosed elements are treated as self-closing elements, HTML-style. For example, a <br> without a matching </br> is treated as a self-closing <br />.
feed: A producer of raw input data. Create a feed by using the SZXX.Feed module.
filter_path: Indicates which part of the DOM should be streamed out instead of being stored in the DOM. For example, ["html"; "body"; "div"; "div"; "p"] emits every <p> element nested inside exactly 2 levels of <div> elements in an HTML document.
on_match: Called on every element that matches filter_path.
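The following toy model illustrates which elements filter_path selects and what remains in the shallow DOM. It is a hypothetical, self-contained sketch: the node type, stream_matching and example are illustrative names, not SZXX's API, and the model walks an already-built tree, whereas stream_matching_elements works incrementally while parsing. The path is compared against the chain of tag names from the top element down, so it starts with the top element's name.

```ocaml
(* Hypothetical toy tree; not SZXX's DOM.element. *)
type node = Element of string * node list | Data of string

(* Walk the tree, tracking tag names from the top element down (innermost
   first). An element whose full path equals [filter_path] is handed to
   [on_match] and dropped from its parent; everything else is kept, giving
   the "shallow" tree. Returns [None] only if the top element itself matches. *)
let stream_matching ~filter_path ~on_match top =
  let target = List.rev filter_path in
  let rec walk path_rev node =
    match node with
    | Data _ -> Some node
    | Element (name, children) ->
      let path_rev = name :: path_rev in
      if path_rev = target then (on_match node; None)
      else Some (Element (name, List.filter_map (walk path_rev) children))
  in
  walk [] top

(* Two <p> elements sit two <div> levels deep; one sits only one level deep. *)
let doc =
  Element ("html", [
    Element ("body", [
      Element ("div", [
        Element ("div", [ Element ("p", [ Data "a" ]); Element ("p", [ Data "b" ]) ]);
        Element ("p", [ Data "shallow" ]) ]) ]) ])

(* Collect matches in document order and return them with the shallow tree. *)
let example () =
  let matched = ref [] in
  let shallow =
    stream_matching ~filter_path:[ "html"; "body"; "div"; "div"; "p" ]
      ~on_match:(fun el -> matched := el :: !matched)
      doc
  in
  (List.rev !matched, shallow)
```

In the example, the two <p> elements nested two <div> levels deep are passed to on_match and removed, leaving an empty inner <div>, while the <p> that is only one <div> deep stays in the shallow tree.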