package b0
Module B0_url
Sloppy URL processing.
URL standards are in a sorry state. This module takes a sloppy approach to URL processing. It only breaks URLs into their components and classifies them.
Warning. None of the functions here perform percent encoding or decoding. Use Percent when deemed appropriate.
URLs
The type for schemes, without the ':' separator.
The type for HOST:PORT authorities.
The type for paths.
The type for queries, without the '?' separator.
The type for fragments, without the '#' separator.
The type for URLs.
Kinds
The type for kinds of relative references. Represents this alternation.
The type for kinds of URLs. Represents this alternation.
kind u determines the kind of u. It decides that u is absolute if u starts with a scheme and ':'.
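The classification can be illustrated with a small standalone sketch. It does not use B0_url and is not the library's implementation: the type definitions are reconstructed from the variant names mentioned on this page, the function names classify and starts_with_scheme are hypothetical, and the mapping of a leading "//" to `Relative `Scheme is an assumption.

```ocaml
(* Types reconstructed from the variant names used on this page (assumption). *)
type relative_kind = [ `Scheme | `Absolute_path | `Relative_path | `Empty ]
type kind = [ `Absolute | `Relative of relative_kind ]

let is_alpha = function 'a'..'z' | 'A'..'Z' -> true | _ -> false

(* Characters allowed after the first letter of a scheme (RFC 3986). *)
let is_scheme_char = function
  | 'a'..'z' | 'A'..'Z' | '0'..'9' | '+' | '-' | '.' -> true
  | _ -> false

(* True if [u] starts with a syntactically valid scheme followed by ':'. *)
let starts_with_scheme u =
  match String.index_opt u ':' with
  | None | Some 0 -> false
  | Some i ->
      let rec loop k = k >= i || (is_scheme_char u.[k] && loop (k + 1)) in
      is_alpha u.[0] && loop 1

(* Hypothetical classifier: absolute if a scheme and ':' lead the string,
   otherwise one of the relative kinds. *)
let classify (u : string) : kind =
  let len = String.length u in
  if len = 0 then `Relative `Empty
  else if starts_with_scheme u then `Absolute
  else if len >= 2 && u.[0] = '/' && u.[1] = '/' then `Relative `Scheme
  else if u.[0] = '/' then `Relative `Absolute_path
  else `Relative `Relative_path
```

Under these assumptions, "a/b:c" is classified as a relative path, because the text before the first ':' is not a valid scheme.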
Operations
val of_url :
t ->
?scheme:scheme option ->
?authority:authority option ->
?path:path option ->
?query:query option ->
?fragment:fragment option ->
unit ->
t
of_url u () is a new URL with unspecified components defaulting to those of u. If specified with None, the given component is deleted.
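The signature uses optional arguments whose type is itself an option, which gives each component three states: omitted (keep), None (delete), and Some v (replace). The following standalone sketch shows that pattern on a hypothetical record; the type url and the function with_components are illustrative only and say nothing about how B0_url represents URLs.

```ocaml
(* Hypothetical stand-in for a parsed URL; not the B0_url representation. *)
type url = {
  scheme : string option;
  path : string option;
  fragment : string option;
}

(* [with_components u ()] copies [u]. For each optional argument:
   - omitted: the component of [u] is kept,
   - [None]: the component is deleted,
   - [Some v]: the component becomes [v]. *)
let with_components u ?scheme ?path ?fragment () =
  (* [given] has type ['a option]: [None] here means "argument omitted". *)
  let pick given old = match given with None -> old | Some v -> v in
  { scheme = pick scheme u.scheme;
    path = pick path u.path;
    fragment = pick fragment u.fragment }
```

With this sketch, with_components u ~fragment:None () drops the fragment, while with_components u () returns a record equal to u.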
append root u is u if kind u is `Absolute. Otherwise uses root to make it absolute according to its relative_kind. The result is guaranteed to be absolute if root is. The result may be surprising or nonsensical if root isn't (FIXME can't we characterize that more ?).
to_absolute ~scheme ~root_path transforms u depending on the value of kind u:
- If `Absolute then this is u itself.
- If `Relative `Scheme then u is given the scheme scheme.
- If `Relative `Absolute_path then u is given the scheme scheme.
- If `Relative `Relative_path then u is given the scheme scheme and the path of u is prepended by root_path (if any).
- If `Relative `Empty then u is given the scheme scheme and the path is root_path (if any).
Authorities
Scraping
list_of_text_scrape ?root s roughly finds absolute and relative URLs in the ASCII compatible (including UTF-8) textual data s by looking in order:
- For the next href or src substring, then tries to parse the content of an HTML attribute. This may result in relative or absolute paths.
- For the next http substring in s, then delimits a URL depending on the previous characters and checks that the delimited URL starts with http:// or https://.
Relative URLs are appended to root if provided. Otherwise they are kept as is. The result may have duplicates.
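To give an idea of the second step only, here is a rough standalone sketch of scanning for http substrings and delimiting them using the preceding character. It is not the library's algorithm: the function scrape_http, the helper names, and the specific delimiter rules (a closing quote, '>' or ')' matching the character before the URL, otherwise whitespace) are all assumptions made for illustration.

```ocaml
(* Guess the closing delimiter from the character preceding the URL. *)
let closer_of_prev = function
  | Some '"' -> Some '"'
  | Some '\'' -> Some '\''
  | Some '<' -> Some '>'
  | Some '(' -> Some ')'
  | _ -> None

let is_space c = c = ' ' || c = '\t' || c = '\n' || c = '\r'

(* True if [p] occurs in [s] at index [i]. *)
let has_prefix_at s i p =
  let n = String.length p in
  i + n <= String.length s && String.sub s i n = p

(* Hypothetical scraper: returns http:// and https:// URLs in order of
   appearance, duplicates included. *)
let scrape_http (s : string) : string list =
  let len = String.length s in
  let rec loop acc i =
    if i >= len then List.rev acc
    else if has_prefix_at s i "http://" || has_prefix_at s i "https://" then begin
      let prev = if i = 0 then None else Some s.[i - 1] in
      let stop = match closer_of_prev prev with
        | Some c -> (fun ch -> ch = c || is_space ch)
        | None -> is_space
      in
      (* Advance to the first delimiter or the end of the text. *)
      let rec find_end j =
        if j >= len || stop s.[j] then j else find_end (j + 1)
      in
      let j = find_end i in
      loop (String.sub s i (j - i) :: acc) j
    end
    else loop acc (i + 1)
  in
  loop [] 0
```

A real scraper would also need the href and src attribute step described above, which this sketch leaves out.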
Formatting
pp formats a URL. For now this is just Format.pp_print_string.
pp_kind formats an unspecified representation of kinds.