package mechaml

Page

This module contains all the functions used to analyze a page, select specific elements, and manage forms.

type t

The type of an HTML page

val from_soup : ?location:Uri.t -> Soup.soup Soup.node -> t

Make a new page from a Lambdasoup document and an optional base URI (location)

val from_string : ?location:Uri.t -> string -> t

Make a new page from an HTML string and an optional base URI (location)

val base_uri : t -> Uri.t

Return the location of a page (or Uri.empty if not specified)

val resolver : t -> Uri.t -> Uri.t

Return the resolver of a page: a function that converts URIs relative to the page into absolute ones, using the page's base URI

val soup : t -> Soup.soup Soup.node

Convert to a Lambdasoup HTML node
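
As a minimal sketch of the constructors and accessors above, the following builds a page from a string and from a parsed Lambdasoup document, then reads back the base URI and resolves a relative link. It assumes the library exposes this module as Mechaml.Page; the location and the markup are hypothetical values chosen for illustration.

```ocaml
(* Assumption: the Page module is reachable as Mechaml.Page. *)
module Page = Mechaml.Page

(* Hypothetical location and markup, used only for illustration. *)
let location = Uri.of_string "http://example.com/dir/index.html"
let html = "<html><body><a href=\"img/cat.jpg\">cat</a></body></html>"

(* Build the same page two ways: from a string, and from a parsed document. *)
let page = Page.from_string ~location html
let page_from_soup = Page.from_soup ~location (Soup.parse html)

(* The resolver makes a relative URI absolute against the page's base URI. *)
let absolute = Page.resolver page (Uri.of_string "img/cat.jpg")

let () =
  print_endline (Uri.to_string (Page.base_uri page));
  print_endline (Uri.to_string (Page.base_uri page_from_soup));
  print_endline (Uri.to_string absolute)
```

When ~location is omitted, base_uri returns Uri.empty, so relative links can only be resolved meaningfully if a location was supplied.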

Lazy sequences

Lambdasoup provides lazy sequences, which traverse only the needed part of an HTML document when used in combination with with_stop. We provide a wrapper that is compatible with Mechaml types such as forms, images, inputs, etc.

type +'a seq

Lazy sequences of HTML elements. See Lambdasoup's Soup.nodes type

type 'a stop = 'a Soup.stop = {
  throw : 'b. 'a -> 'b;
}

See Lambdasoup's Soup.stop type

Operations on lazy sequences

val iter : ('a -> unit) -> 'a seq -> unit
val fold : ('a -> 'b -> 'a) -> 'a -> 'b seq -> 'a
val filter : ('a -> bool) -> 'a seq -> 'a seq
val first : 'a seq -> 'a option
val nth : int -> 'a seq -> 'a option
val find_first : ('a -> bool) -> 'a seq -> 'a option
val to_list : 'a seq -> 'a list
val with_stop : ('a stop -> 'a) -> 'a

See Lambdasoup's Soup.with_stop
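
The sequence operations can be sketched as follows, using only the signatures listed above. The example counts the images of a page with fold, then repeats the traversal with with_stop so that it ends as soon as two images have been seen. The markup is hypothetical, and the module path Mechaml.Page is an assumption.

```ocaml
(* Assumption: the Page module is reachable as Mechaml.Page. *)
module Page = Mechaml.Page

(* Hypothetical page containing three images. *)
let page =
  Page.from_string
    "<html><body>\
     <img src=\"a.png\"><img src=\"b.png\"><img src=\"c.png\">\
     </body></html>"

(* Count the images by folding over the whole lazy sequence. *)
let count = Page.fold (fun n _ -> n + 1) 0 (Page.images page)

(* Stop early: stop.throw aborts the traversal and makes with_stop
   return the thrown value, here as soon as a second image is reached. *)
let at_most_two =
  Page.with_stop (fun stop ->
    Page.fold
      (fun n _ -> if n + 1 >= 2 then stop.Page.throw 2 else n + 1)
      0 (Page.images page))

let () =
  Printf.printf "images: %d, capped count: %d\n" count at_most_two;
  (* first yields the head of the sequence, if any. *)
  match Page.first (Page.images page) with
  | Some _ -> print_endline "the page has at least one image"
  | None -> print_endline "the page has no image"
```

Because throw has the polymorphic type 'b. 'a -> 'b, it can be called from either branch of a conditional without changing the type of the surrounding expression.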

Form

module Form : sig ... end

Operations on forms and inputs

module Link : sig ... end

Operations on hypertext links

module Image : sig ... end

Operations on images

Nodes selection

All the following functions are built using the same pattern.

  • xxxs (e.g. forms) returns all the elements of a certain type as a lazy sequence. For example, forms mypage returns all the forms in the page.
  • xxx_with takes a CSS selector as a parameter and returns the first element that matches the selector, or None if there is none. For example, link_with "[href$=.jpg]" mypage tries to find a link that points to a JPEG image.
  • xxxs_with works like xxx_with, but returns a lazy sequence of all the elements matching the selector.

val form_with : string -> t -> Form.t option
val forms_with : string -> t -> Form.t seq
val forms : t -> Form.t seq
val image_with : string -> t -> Image.t option
val images_with : string -> t -> Image.t seq
val images : t -> Image.t seq
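
The naming pattern can be illustrated with a short sketch that uses only the selection functions listed above. The markup and the attribute selectors are hypothetical, and the module path Mechaml.Page is an assumption.

```ocaml
(* Assumption: the Page module is reachable as Mechaml.Page. *)
module Page = Mechaml.Page

(* Hypothetical page with two forms and one image. *)
let page =
  Page.from_string
    "<html><body>\
     <form id=\"login\" action=\"/login\"></form>\
     <form id=\"search\" action=\"/search\"></form>\
     <img class=\"logo\" src=\"logo.png\">\
     </body></html>"

(* xxx_with: the first element matching the selector, as an option. *)
let has_login =
  match Page.form_with "[id=login]" page with
  | Some _ -> true
  | None -> false

(* xxxs: every element of the given kind, as a lazy sequence. *)
let form_count = List.length (Page.to_list (Page.forms page))

(* xxxs_with: every element of the given kind matching the selector. *)
let logo_count =
  List.length (Page.to_list (Page.images_with "[class=logo]" page))

let () =
  Printf.printf "login form: %b, forms: %d, logo images: %d\n"
    has_login form_count logo_count
```

Converting with to_list forces the whole sequence; when only one element is needed, the xxx_with variant or first avoids traversing the rest of the document.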