package mechaml

Module Mechaml.Page

Page

This module contains all the functions used to analyze a page, select specific elements, and manage forms.

type t

The type of an HTML page.

val from_soup : ?location:Uri.t -> Soup.soup Soup.node -> t

Make a new page from an optional base URI (?location) and a Lambdasoup document.

val from_string : ?location:Uri.t -> string -> t

Make a new page from an optional base URI (?location) and an HTML string.

val base_uri : t -> Uri.t

Return the location of a page (or Uri.empty if not specified)

val resolver : t -> Uri.t -> Uri.t

Return the resolver of a page, which converts relative URIs to absolute ones using the page's base URI.
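As a minimal sketch of the constructors and accessors above, the following builds a page from a string and from a Lambdasoup document, then resolves a relative link. The location and the HTML snippet are made up for illustration, and the sketch assumes the mechaml, lambdasoup, and uri libraries are installed.

```ocaml
open Mechaml

(* Hypothetical location and markup, chosen only for illustration. *)
let location = Uri.of_string "http://example.com/dir/index.html"
let html = "<html><body><a href=\"img/cat.jpg\">cat</a></body></html>"

(* Build a page directly from an HTML string... *)
let page = Page.from_string ~location html

(* ...or from a document already parsed by Lambdasoup. *)
let page_from_soup = Page.from_soup ~location (Soup.parse html)

(* Without ~location, base_uri is documented to return Uri.empty. *)
let anonymous = Page.from_string html

(* Turn a relative URI into an absolute one using the page's base URI. *)
let absolute = Page.resolver page (Uri.of_string "img/cat.jpg")

let () = print_endline (Uri.to_string absolute)
```

Passing ~location matters whenever links found in the page are to be followed later, since the resolver has no base to work from otherwise.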

Convert to Lambdasoup

Lazy sequences

Lambdasoup provides lazy sequences that, when combined with with_stop, make it possible to traverse only the needed part of an HTML document. We provide a wrapper that is compatible with Mechaml types such as forms, images, and inputs.

type +'a seq

Lazy sequences of HTML elements. See the Soup.nodes type.

type 'a stop = 'a Soup.stop = {
  throw : 'b. 'a -> 'b;
}

See the Soup.stop type.

Operations on lazy sequences

val iter : ('a -> unit) -> 'a seq -> unit
val fold : ('a -> 'b -> 'a) -> 'a -> 'b seq -> 'a
val filter : ('a -> bool) -> 'a seq -> 'a seq
val first : 'a seq -> 'a option
val nth : int -> 'a seq -> 'a option
val find_first : ('a -> bool) -> 'a seq -> 'a option
val to_list : 'a seq -> 'a list
val with_stop : ('a stop -> 'a) -> 'a

See Lambdasoup's Soup.with_stop.
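The operations above can be sketched on a small page as follows. The markup is hypothetical, and the example assumes that Page.images yields one element per img tag; it only uses functions listed in this module.

```ocaml
open Mechaml

(* Hypothetical markup with three images. *)
let page =
  Page.from_string
    {|<html><body>
        <img id="a" src="a.png">
        <img id="b" src="b.png">
        <img id="c" src="c.png">
      </body></html>|}

(* fold visits every element: here it simply counts them. *)
let count = Page.fold (fun n _ -> n + 1) 0 (Page.images page)

(* first only needs the beginning of the sequence. *)
let has_image = Option.is_some (Page.first (Page.images page))

(* to_list forces the whole sequence into an ordinary list. *)
let as_list = Page.to_list (Page.images page)

(* with_stop lets a traversal exit early: throw aborts the fold as soon
   as two images have been seen, and its argument becomes the result
   of with_stop. *)
let at_least_two =
  Page.with_stop (fun stop ->
    let _ =
      Page.fold
        (fun n _ -> if n + 1 >= 2 then stop.Page.throw true else n + 1)
        0 (Page.images page)
    in
    false)

let () = Printf.printf "%d images, at least two: %b\n" count at_least_two
```

The early exit is the point of combining with_stop with a lazy sequence: on a large document, the fold does not need to reach the remaining elements once the answer is known.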

Form

module Form : sig ... end

Operations on forms and inputs

Operations on hypertext links

module Image : sig ... end

Operations on images

Nodes selection

All the following functions are built on the same pattern.

  • xxxs (e.g. forms) returns all the elements of a certain type as a lazy sequence. For example, forms mypage will return all the forms in the page.
  • xxx_with takes a CSS selector as a parameter and returns the first element that matches the selector, or None if there isn't any. E.g., link_with "[href$=.jpg]" mypage will try to find a link that points to a JPEG image.
  • xxxs_with proceeds like the previous one, but returns a lazy sequence of all the elements matching the selector.

val form_with : string -> t -> Form.t option
val forms_with : string -> t -> Form.t seq
val forms : t -> Form.t seq
val image_with : string -> t -> Image.t option
val images_with : string -> t -> Image.t seq
val images : t -> Image.t seq
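The naming pattern can be illustrated with a short sketch. The page content and the id values are hypothetical, and the selectors are plain attribute selectors in the style of the link_with example above.

```ocaml
open Mechaml

(* Hypothetical page with two forms and one image. *)
let page =
  Page.from_string
    {|<html><body>
        <form id="login" action="/login"><input name="user"></form>
        <form id="search" action="/search"><input name="q"></form>
        <img id="logo" src="logo.png">
      </body></html>|}

(* xxx_with: first element matching the selector, or None. *)
let login_form : Page.Form.t option = Page.form_with "[id=login]" page
let missing_form = Page.form_with "[id=nope]" page

(* xxxs: every element of that kind, as a lazy sequence. *)
let all_forms = Page.to_list (Page.forms page)

(* xxxs_with: lazy sequence of all elements matching the selector. *)
let logos = Page.to_list (Page.images_with "[id=logo]" page)

let () =
  Printf.printf "forms: %d, logos: %d, login found: %b\n"
    (List.length all_forms) (List.length logos)
    (Option.is_some login_form)
```

Since xxx_with returns an option, the absence of an element is handled by pattern matching rather than by catching an exception.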