Scraping agent
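Mechaml is a web agent that lets you fetch pages, follow links, submit forms, save downloaded content, and manage cookies, headers and redirections.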
It is built on top of Cohttp, Lwt and Lambdasoup.
type http_status_code = Cohttp.Code.status_code
type http_headers = Cohttp.Header.t
module HttpResponse : sig ... end
The HttpResponse module defines a type and operations to extract content and metadata from the server response
type result = t * HttpResponse.t
val init : ?max_redirect:int -> unit -> t
Create a new empty agent. ~max_redirect indicates how many times the agent will automatically and consecutively follow the Location header in case of an HTTP 302 or 303 response code, to avoid a redirect loop. Set to 0 to disable automatic redirection.
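The bounded redirect behaviour described above can be sketched with the standard library alone. This is a toy model, not Mechaml's implementation: the response type, fake_fetch and fetch_following are hypothetical names, and the default of 5 is an arbitrary value chosen for the example.

```ocaml
(* Toy model: a response is either a final body or a redirect. *)
type response =
  | Ok_body of string
  | Redirect of int * string (* status code, Location header value *)

(* Hypothetical fetch function standing in for a real HTTP request. *)
let fake_fetch = function
  | "/a" -> Redirect (302, "/b")
  | "/b" -> Redirect (303, "/c")
  | "/c" -> Ok_body "hello"
  | "/loop" -> Redirect (302, "/loop")
  | _ -> Ok_body "not found"

(* Follow at most [max_redirect] consecutive 302/303 responses.
   Returns the last response received and the number of hops taken. *)
let fetch_following ?(max_redirect = 5) fetch url =
  let rec go url hops =
    match fetch url with
    | Redirect ((302 | 303), location) when hops < max_redirect ->
      go location (hops + 1)
    | response -> (response, hops)
  in
  go url 0
```

With a limit of 0 the first redirect response is returned as is, and a self-referencing URL such as "/loop" stops after the configured number of hops instead of running forever.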
The following functions perform a GET request to the specified URI. get "http://www.site/some/url" agent sends an HTTP GET request and returns the updated state of the agent together with the server response
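Because each request returns a new agent alongside the response, the caller has to rebind the agent before the next call. The sketch below illustrates that threading pattern with toy stand-ins: the agent and response records, the synchronous get and the second URL are all hypothetical, and Lwt is left out to keep the example self-contained.

```ocaml
(* Toy stand-ins: the real agent and response types are abstract. *)
type agent = { visited : string list }
type response = { body : string }

(* Hypothetical synchronous [get]: returns the updated agent together with
   the response, mirroring the shape of [result] without Lwt. *)
let get url agent =
  ({ visited = url :: agent.visited }, { body = "content of " ^ url })

let crawl () =
  let agent = { visited = [] } in
  (* Rebind [agent] after every call so later requests see the new state. *)
  let agent, first = get "http://www.site/some/url" agent in
  let agent, second = get "http://www.site/other/url" agent in
  (List.rev agent.visited, first.body, second.body)
```

Reusing the original agent for the second call would silently drop whatever state the first request recorded.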
val click : Page.Link.t -> t -> result Lwt.t
Same as get, but works directly with links instead of URIs
The following functions send a raw post request to the specified URI
val submit : Page.Form.t -> t -> result Lwt.t
Submit a filled form
Save some downloaded content in a file
val save_image : string -> Page.Image.t -> t -> result Lwt.t
save_image "/path/to/myfile.jpg" image agent loads the image using get, opens myfile.jpg, writes the content to it asynchronously and then returns the result
val save_content : string -> string -> unit Lwt.t
save_content "/path/to/myfile.html" content writes the specified content to a file using Lwt asynchronous I/O
(see Cookiejar)
val cookie_jar : t -> Cookiejar.t
Return the current Cookiejar
val set_cookie_jar : Cookiejar.t -> t -> t
Set the current Cookiejar
val add_cookie : Cookiejar.Cookie.t -> t -> t
Add a single cookie to the current Cookiejar
val remove_cookie : Cookiejar.Cookie.t -> t -> t
Remove a single cookie from the Cookiejar
val client_headers : t -> Cohttp.Header.t
Return the default headers sent when performing HTTP requests
val set_client_headers : Cohttp.Header.t -> t -> t
Use the specified headers as new default headers
Add a single key/value pair to the default headers
Set the maximum number of consecutive redirections (to avoid infinite loops). Use 0 to disable automatic redirection.
This module defines a monad that implicitly manages the agent state while running inside the Lwt monad. It is essentially the state monad (for Agent.t) and the Lwt monad stacked together
module Monad : sig ... end
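To make the idea of stacking a state monad on an asynchronous monad concrete, here is a minimal sketch using only the standard library. Everything in it is an assumption made for illustration: Promise is a toy stand-in for Lwt, the agent record stands in for Agent.t, and the names M, get and program are hypothetical. The actual contents of Monad are not shown on this page.

```ocaml
(* Toy promise standing in for Lwt.t: a value produced when forced. *)
module Promise = struct
  type 'a t = unit -> 'a
  let return x = fun () -> x
  let bind p f = fun () -> f (p ()) ()
  let run p = p ()
end

(* Toy agent state; the real Agent.t is abstract. *)
type agent = { requests : int }

(* State monad over Promise: a function from an agent to a promise
   of the updated agent paired with a result. *)
module M = struct
  type 'a m = agent -> (agent * 'a) Promise.t
  let return x : 'a m = fun agent -> Promise.return (agent, x)
  let bind (m : 'a m) (f : 'a -> 'b m) : 'b m =
    fun agent -> Promise.bind (m agent) (fun (agent, x) -> f x agent)
  let ( >>= ) = bind
  let run agent m = Promise.run (m agent)
end

(* Hypothetical request action: bumps the counter and yields a fake body. *)
let get url : string M.m =
  fun agent ->
    Promise.return ({ requests = agent.requests + 1 }, "body of " ^ url)

(* The agent never appears here: bind passes it along implicitly. *)
let program =
  let open M in
  get "/first" >>= fun a ->
  get "/second" >>= fun b ->
  return (a ^ ", " ^ b)
```

Running the program with M.run and an initial agent yields the final agent together with the combined result, which is the same pair shape as the result type, without any manual rebinding in between.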