package b0

Module B0_url

Sloppy URL processing.

URL standards are in a sorry state. This module takes a sloppy approach to URL processing. It only breaks URLs into their components and classifies them.

Warning. None of the functions here perform percent encoding or decoding. Use Percent when deemed appropriate.

URLs

type scheme = string

The type for schemes, without the ':' separator.

type authority = string

The type for HOST:PORT authorities.

type path = string

The type for paths.

type query = string

The type for queries, without the '?' separator.

type fragment = string

The type for fragments, without the '#' separator.

type t = string

The type for URLs.

val scheme : t -> scheme option

scheme u is the scheme of u, if any.

val authority : t -> authority option

authority u is the authority of u, if any.

val path : t -> path option

path u is the path of u, if any.

val query : t -> query option

query u is the query of u, if any.

val fragment : t -> fragment option

fragment u is the fragment of u, if any.
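The accessors above amount to splitting a URL along the generic shape scheme:[//authority]path[?query][#fragment]. As a rough illustration only, here is a hypothetical, stdlib-only sketch of such a sloppy splitter. The record type parts and the functions cut_at, is_scheme and split are illustrative names, not part of B0_url, whose implementation may differ. Like the module, the sketch performs no percent decoding.

```ocaml
(* Hypothetical sketch, not B0_url's implementation: sloppily split a URL
   into its components. No percent decoding is performed. *)
type parts = {
  scheme : string option;
  authority : string option;
  path : string option;
  query : string option;
  fragment : string option;
}

(* [cut_at c s] splits [s] at the first occurrence of [c], if any. *)
let cut_at c s =
  match String.index_opt s c with
  | None -> (s, None)
  | Some i -> (String.sub s 0 i, Some (String.sub s (i + 1) (String.length s - i - 1)))

let non_empty s = if s = "" then None else Some s

let is_scheme_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '+' | '-' | '.' -> true
  | _ -> false

(* A scheme starts with a letter, followed by scheme characters. *)
let is_scheme s =
  let rec rest i = i >= String.length s || (is_scheme_char s.[i] && rest (i + 1)) in
  s <> "" && (match s.[0] with 'a' .. 'z' | 'A' .. 'Z' -> true | _ -> false) && rest 1

let split u =
  (* Cut the fragment first, then the query, so '?' or ':' in them is ignored. *)
  let rest, fragment = cut_at '#' u in
  let rest, query = cut_at '?' rest in
  let scheme, rest =
    match String.index_opt rest ':' with
    | Some i when is_scheme (String.sub rest 0 i) ->
        (Some (String.sub rest 0 i), String.sub rest (i + 1) (String.length rest - i - 1))
    | _ -> (None, rest)
  in
  (* An authority is present only after a leading "//". *)
  let authority, path =
    if String.length rest >= 2 && String.sub rest 0 2 = "//" then (
      let rest = String.sub rest 2 (String.length rest - 2) in
      match String.index_opt rest '/' with
      | None -> (Some rest, "")
      | Some i -> (Some (String.sub rest 0 i), String.sub rest i (String.length rest - i)))
    else (None, rest)
  in
  { scheme; authority; path = non_empty path; query; fragment }
```

For example, split "https://example.org/a/b?x=1#top" yields the scheme "https", the authority "example.org", the path "/a/b", the query "x=1" and the fragment "top".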

Kinds

type relative_kind = [
  | `Scheme
  | `Absolute_path
  | `Relative_path
  | `Empty
]

The type for kinds of relative references. Represents the relative-part alternation of RFC 3986 (section 4.2).

type kind = [
  | `Absolute
  | `Relative of relative_kind
]

The type for kinds of URLs. Represents the URI-reference alternation of RFC 3986 (section 4.1).

val kind : t -> kind

kind u determines the kind of u. It decides that u is absolute if u starts with a scheme followed by ':'.
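A hypothetical sketch of this classification on plain strings, following the RFC 3986 relative-part cases. The helpers is_scheme_char and has_scheme are illustrative names. Treating a reference with an empty path (such as "" or "?q=1") as `Empty is an assumption of the sketch; B0_url's exact rules may differ.

```ocaml
type relative_kind = [ `Scheme | `Absolute_path | `Relative_path | `Empty ]
type kind = [ `Absolute | `Relative of relative_kind ]

let is_scheme_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '+' | '-' | '.' -> true
  | _ -> false

(* [has_scheme u] is true if [u] starts with a letter, then scheme
   characters, then ':'. *)
let has_scheme u =
  match String.index_opt u ':' with
  | None | Some 0 -> false
  | Some i ->
      let rec ok j = j >= i || (is_scheme_char u.[j] && ok (j + 1)) in
      (match u.[0] with 'a' .. 'z' | 'A' .. 'Z' -> true | _ -> false) && ok 1

(* Hypothetical sketch, not B0_url's implementation. *)
let kind (u : string) : kind =
  if has_scheme u then `Absolute
  else if String.length u >= 2 && u.[0] = '/' && u.[1] = '/' then `Relative `Scheme
  (* Assumption: an empty path, possibly followed by a query or fragment. *)
  else if u = "" || u.[0] = '?' || u.[0] = '#' then `Relative `Empty
  else if u.[0] = '/' then `Relative `Absolute_path
  else `Relative `Relative_path
```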

Operations

val of_url : t -> ?scheme:scheme option -> ?authority:authority option -> ?path:path option -> ?query:query option -> ?fragment:fragment option -> unit -> t

of_url u () is a new URL with unspecified components defaulting to those of u. A component specified with None is deleted.
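An optional argument whose type is itself an option gives each component three states: omitted (keep the component), ~scheme:None (delete it) and ~scheme:(Some s) (replace it). A hypothetical sketch of that idiom on a small record standing in for a split URL; parts, pick and with_parts are illustrative names, not B0_url functions.

```ocaml
(* Hypothetical record standing in for a URL split into components. *)
type parts = { scheme : string option; path : string option; query : string option }

(* Inside the function an omitted argument is [None] (keep [current]);
   [~x:None] arrives as [Some None] (delete); [~x:(Some v)] sets [v]. *)
let pick given current = match given with None -> current | Some v -> v

let with_parts p ?scheme ?path ?query () =
  { scheme = pick scheme p.scheme; path = pick path p.path; query = pick query p.query }

(* Example: replace the scheme, keep the path, delete the query. *)
let example =
  with_parts { scheme = Some "http"; path = Some "/a"; query = Some "x=1" }
    ~scheme:(Some "https") ~query:None ()
```

Here example has the scheme Some "https", keeps the path Some "/a" and has no query.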

val append : t -> t -> t

append root u is u if kind u is `Absolute. Otherwise it uses root to make u absolute according to its relative_kind. The result is guaranteed to be absolute if root is; it may be surprising or nonsensical if root isn't (FIXME can't we characterize that more?).

val to_absolute : scheme:scheme -> root_path:path option -> t -> t

to_absolute ~scheme ~root_path u transforms u depending on the value of kind u:

  • If `Absolute then this is u itself.
  • If `Relative `Scheme then u is given the scheme scheme.
  • If `Relative `Absolute_path then u is given the scheme scheme.
  • If `Relative `Relative_path then u is given the scheme scheme and root_path (if any) is prepended to the path of u.
  • If `Relative `Empty then u is given the scheme scheme and the path is root_path (if any).
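The case analysis above can be sketched on plain strings. This is a hypothetical illustration, not B0_url's implementation: the classification is inlined, and joining root_path to a relative path with a single '/' is an assumption, since the documentation does not say how the two are concatenated.

```ocaml
let is_scheme_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '+' | '-' | '.' -> true
  | _ -> false

(* True if [u] starts with a letter, then scheme characters, then ':'. *)
let has_scheme u =
  match String.index_opt u ':' with
  | None | Some 0 -> false
  | Some i ->
      let rec ok j = j >= i || (is_scheme_char u.[j] && ok (j + 1)) in
      (match u.[0] with 'a' .. 'z' | 'A' .. 'Z' -> true | _ -> false) && ok 1

(* Assumption: join with exactly one '/' between root and relative path. *)
let join root rel =
  if root <> "" && root.[String.length root - 1] = '/' then root ^ rel
  else root ^ "/" ^ rel

(* Hypothetical sketch of to_absolute. *)
let to_absolute ~scheme ~root_path u =
  let with_scheme s = scheme ^ ":" ^ s in
  if has_scheme u then u (* `Absolute: unchanged *)
  else if u = "" || u.[0] = '?' || u.[0] = '#' then
    (* `Relative `Empty: the path becomes root_path, if any *)
    with_scheme ((match root_path with None -> "" | Some r -> r) ^ u)
  else if u.[0] = '/' then
    (* `Relative `Scheme ("//...") or `Relative `Absolute_path ("/...") *)
    with_scheme u
  else
    (* `Relative `Relative_path: prepend root_path, if any *)
    with_scheme (match root_path with None -> u | Some r -> join r u)
```

With ~scheme:"https" and ~root_path:(Some "/docs"), "//a.org/x" becomes "https://a.org/x" and "x/y" becomes "https:/docs/x/y".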

Authorities

module Authority : sig ... end

Sloppy authority processing.
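As an illustration of sloppy HOST:PORT handling, here is a hypothetical sketch that splits an authority at its last ':' when only ASCII digits follow. The names is_digits and split_host_port are illustrative and are not the Authority module's API. Bracketed IPv6 literals such as "[::1]" survive because their trailing part is not all digits; userinfo (user@host) is not handled, which is an assumption of the sketch.

```ocaml
(* Hypothetical sketch, not B0_url.Authority. *)
let is_digits s =
  let rec loop i = i >= String.length s || (s.[i] >= '0' && s.[i] <= '9' && loop (i + 1)) in
  s <> "" && loop 0

(* Split "HOST:PORT" at the last ':' if a decimal port follows it. *)
let split_host_port a =
  match String.rindex_opt a ':' with
  | None -> (a, None)
  | Some i ->
      let port = String.sub a (i + 1) (String.length a - i - 1) in
      if not (is_digits port) then (a, None)
      else (
        match int_of_string_opt port with
        | Some p -> (String.sub a 0 i, Some p)
        | None -> (a, None))
```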

Scraping

val list_of_text_scrape : ?root:t -> string -> t list

list_of_text_scrape ?root s roughly finds absolute and relative URLs in the ASCII-compatible (including UTF-8) textual data s by looking, in order:

  1. For the next href or src substring, then tries to parse the content of an HTML attribute. This may result in relative or absolute paths.
  2. For the next http substring in s, then delimits a URL depending on the preceding characters and checks that the delimited URL starts with http:// or https://.

Relative URLs are appended to root if provided. Otherwise they are kept as is. The result may have duplicates.
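A rough, hypothetical sketch of the second step only; the href/src step and the appending to root are omitted. Which preceding character selects which closing delimiter is an assumption made for illustration, since the documentation does not list B0_url's actual rules. Without a recognized opener the candidate ends at whitespace.

```ocaml
(* Hypothetical sketch, not B0_url's scraper. *)
let has_prefix ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

let is_space = function ' ' | '\t' | '\n' | '\r' -> true | _ -> false

let scrape_http s =
  let n = String.length s in
  (* Assumed closing delimiter, chosen from the character before "http". *)
  let closer i =
    if i = 0 then None
    else
      match s.[i - 1] with
      | '"' -> Some '"' | '\'' -> Some '\'' | '<' -> Some '>' | '(' -> Some ')'
      | _ -> None
  in
  (* End of the candidate: the closing delimiter or the first whitespace. *)
  let rec find_end close j =
    if j >= n then j
    else
      match close with
      | Some c when s.[j] = c -> j
      | _ when is_space s.[j] -> j
      | _ -> find_end close (j + 1)
  in
  let rec loop i acc =
    if i + 4 > n then List.rev acc
    else if String.sub s i 4 = "http" then
      let e = find_end (closer i) i in
      let cand = String.sub s i (e - i) in
      let acc =
        if has_prefix ~prefix:"http://" cand || has_prefix ~prefix:"https://" cand
        then cand :: acc
        else acc
      in
      loop (max e (i + 1)) acc
    else loop (i + 1) acc
  in
  loop 0 []
```

On the text see <https://a.org/x> and "http://b.org/?q=1" or httpfoo, this sketch keeps https://a.org/x and http://b.org/?q=1 and rejects httpfoo. Like the documented function, it may return duplicates.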

Formatting

val pp : Format.formatter -> t -> unit

pp formats a URL. For now this is just Format.pp_print_string.
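Since pp is documented as Format.pp_print_string, it composes with the %a conversion like any other printer. The sketch below uses a local stand-in with that documented behaviour so that it runs without b0; describe is a hypothetical helper.

```ocaml
(* Stand-in with the documented behaviour of pp. *)
let pp : Format.formatter -> string -> unit = Format.pp_print_string

(* Hypothetical helper: embed a URL in a message via %a. *)
let describe u = Format.asprintf "fetching %a" pp u
```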

val pp_kind : Format.formatter -> kind -> unit

pp_kind formats an unspecified representation of kinds.

Percent encoding

module Percent : sig ... end

Percent-encoding codecs according to RFC 3986.
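For orientation, a minimal sketch of RFC 3986 percent-encoding written against the standard library only. The names encode and decode are hypothetical and are not the Percent module's API. Escaping every byte outside the unreserved set (ALPHA, DIGIT, '-', '.', '_', '~') is the most conservative choice and may be stricter than what Percent does for a given URL component.

```ocaml
(* Hypothetical sketch, not B0_url.Percent. *)
let is_unreserved = function
  | 'A' .. 'Z' | 'a' .. 'z' | '0' .. '9' | '-' | '.' | '_' | '~' -> true
  | _ -> false

(* Escape every byte outside the unreserved set as %HH (uppercase hex). *)
let encode s =
  let b = Buffer.create (String.length s * 3) in
  String.iter
    (fun c ->
      if is_unreserved c then Buffer.add_char b c
      else Buffer.add_string b (Printf.sprintf "%%%02X" (Char.code c)))
    s;
  Buffer.contents b

(* Decode %HH sequences; malformed escapes are copied through unchanged. *)
let decode s =
  let n = String.length s in
  let b = Buffer.create n in
  let hex c =
    match c with
    | '0' .. '9' -> Some (Char.code c - 48)
    | 'A' .. 'F' -> Some (Char.code c - 55)
    | 'a' .. 'f' -> Some (Char.code c - 87)
    | _ -> None
  in
  let rec loop i =
    if i >= n then ()
    else
      match if s.[i] = '%' && i + 2 < n then (hex s.[i + 1], hex s.[i + 2]) else (None, None) with
      | Some h, Some l ->
          Buffer.add_char b (Char.chr ((h * 16) + l));
          loop (i + 3)
      | _ ->
          Buffer.add_char b s.[i];
          loop (i + 1)
  in
  loop 0;
  Buffer.contents b
```

For example, encode "a b/c~" is "a%20b%2Fc~", and decode accepts lowercase hex digits as well.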