Emile module, a parser of e-mail addresses.
The local part of an e-mail address is composed of two kinds of words:
`Atom is a string kept as is.
`String is a string surrounded by double quotes to allow white-space.
The second kind is sanitized: the surrounding double quotes are removed from the string.
type local = word list
Local part of e-mail address.
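The two kinds of words described above can be sketched as follows. This is a minimal illustration, not Emile's own code: it assumes both constructors carry a plain string, and `word_to_string` and `local_to_string` are hypothetical helpers that join words with dots, as in the `word *("." word)` form of a local part.

```ocaml
(* Assumption: both constructors carry the raw string. *)
type word = [ `Atom of string | `String of string ]
type local = word list

(* Hypothetical helper: put back the double quotes that were removed
   from a `String. Inner quotes and backslashes are NOT escaped here. *)
let word_to_string : word -> string = function
  | `Atom s -> s
  | `String s -> "\"" ^ s ^ "\""

(* Hypothetical helper: render a local part, one dot between words. *)
let local_to_string (l : local) : string =
  String.concat "." (List.map word_to_string l)
```

With these definitions, a value such as [`Atom "john"; `Atom "doe"] would stand for the local part john.doe.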
Subset of domains described by RFC5321, which contains 3 kinds of address:
IPv4: a valid IPv4 address
IPv6: a valid IPv6 address
Ext (ldh, value): an extended kind of domain recognized by its ldh identifier, whose value is value
IPv4 and IPv6 addresses are parsed by Ipaddr. An extended kind Ext needs to be resolved by the client.
Domain part of an e-mail address. A domain integrates the kinds from RFC5321 (see addr), a domain described by RFC5322, and a `Literal, which is the last, best-effort value possible as a domain.
Emile does not resolve domains.
A phrase is a sentence used to associate a name with an e-mail address or a group of e-mail addresses. An `Encoded value is not normalized according to the charset it specifies: the encoded string is only decoded as is. For example, `Encoded can indicate that the KOI-8 encoding (Cyrillic charset) is used. However, Emile neither checks that the value is a valid KOI-8 string nor normalizes it to Unicode. Emile just decodes it as is.
val pp_addr : addr Fmt.t
val pp_domain : domain Fmt.t
val pp_word : word Fmt.t
val pp_local : local Fmt.t
val pp_raw : raw Fmt.t
val pp_phrase : phrase Fmt.t
val pp_mailbox : mailbox Fmt.t
val pp_group : group Fmt.t
val pp_address : address Fmt.t
val pp_set : set Fmt.t
case_insensitive a b maps both values with lowercase_ascii and compares them with String.compare. UTF-8 values are not mapped.
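The behaviour described above can be sketched with the standard library alone. This is an illustration of the technique, not Emile's source; it shows why non-ASCII letters are left untouched.

```ocaml
(* Sketch: lowercase the ASCII letters of both strings, then compare bytes.
   String.lowercase_ascii leaves every byte >= 128 unchanged, so UTF-8
   encoded letters are not case-folded. *)
let case_insensitive (a : string) (b : string) : int =
  String.compare (String.lowercase_ascii a) (String.lowercase_ascii b)

(* "\xc3\x89" and "\xc3\xa9" are the UTF-8 encodings of capital and small
   e with acute accent. *)
let ascii_equal = case_insensitive "John" "john" = 0
let utf8_equal = case_insensitive "\xc3\x89" "\xc3\xa9" = 0
```

Here ascii_equal is true while utf8_equal is false, since only the ASCII range is mapped.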
equal_domains a b applies equal_domain to the domains of a and b once they are ordered (see compare_domain).
equal_mailbox ?case_sensitive a b tests if mailbox a and mailbox b are semantically equal. The user can choose whether the local-part needs to be compared case-sensitively or not (with case_sensitive). If only one of a and b has a name, we consider a = b if they have the same local-part and the same domain(s). Otherwise, we also compare the identifier/phrase between them.
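The rule above can be sketched on a simplified record. Everything here is hypothetical: simple_mailbox replaces Emile's structured types with plain strings, case_sensitive is assumed to default to false, domains are assumed to be compared case-insensitively, and names are compared with String.equal.

```ocaml
(* Simplified stand-in for a mailbox, for illustration only. *)
type simple_mailbox = { name : string option; local : string; domain : string }

let lower = String.lowercase_ascii

(* Assumption: case_sensitive only affects the local part. *)
let equal_simple ?(case_sensitive = false) (a : simple_mailbox) (b : simple_mailbox) : bool =
  let equal_local =
    if case_sensitive then String.equal a.local b.local
    else String.equal (lower a.local) (lower b.local)
  in
  let same_address =
    equal_local && String.equal (lower a.domain) (lower b.domain)
  in
  match a.name, b.name with
  | Some x, Some y -> same_address && String.equal x y (* both named *)
  | _ -> same_address (* at most one side is named: the name is ignored *)
```

Under these assumptions, a named and an unnamed mailbox with the same address are equal, while two mailboxes with different names are not.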
compare ?case_sensitive a b compares mailbox a and mailbox b semantically. We prioritize the local-part, then the domain-part and finally the optional name.
If you don't want a headache, you should move on.
module Parser : sig ... end
This is an aggregation of rules used to parse an e-mail address. The goal of this documentation is to show relations between RFCs, updates, and final description of parts needed to parse an e-mail address.
We have 4 kinds of parsers for e-mail addresses:
List.of_string* is the most general parser. It is used to parse the To: field of an e-mail. The value of that field is a list of set, where each element is either a single e-mail address or a named group of e-mail addresses. This parser is used in the tests of Emile.
address_of_string* is the parser for an e-mail address of the form local-part@domain. This is the case most clients have in mind when they parse an e-mail address. However, this parser handles neither a named e-mail address nor an e-mail address with multiple domains.
set_of_string* is the parser for a named group of e-mail addresses (group) or an optionally named e-mail address (mailbox). In contrast to address_of_string, this parser handles an e-mail address with multiple domains.
of_string* is the most general parser for a single e-mail address. It behaves like set_of_string without the named group of e-mail addresses: it handles a named e-mail address and an e-mail address with multiple domains. The client should use this function when the exact format of the input is not known.
For each parser, you have the common of_string function, the of_string_with_crlf function and finally the of_string_raw function. The first one is the easiest to understand: it takes your string and tries to extract an e-mail address (or a set, or a list of set).
The second is a more general parser. In the context of an e-mail, an e-mail address is delimited by a double CRLF (which stops the folding whitespace rule), because an e-mail address can be encoded on multiple lines. So the of_string function is a special case of of_string_with_crlf, where a double CRLF is appended to your string to ensure that the parser stops somewhere.
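The relation between the two functions can be sketched as a small wrapper. The name of_string_via_crlf is hypothetical, and the parser is passed as an argument so the sketch does not depend on Emile's actual signatures.

```ocaml
(* Sketch: build an "of_string"-like function from any
   "of_string_with_crlf"-like parser by appending a double CRLF. *)
let of_string_via_crlf (of_string_with_crlf : string -> 'a) (s : string) : 'a =
  of_string_with_crlf (s ^ "\r\n\r\n")

(* Stand-in parser that returns its input, only to show what the real
   parser would receive. *)
let seen_by_parser = of_string_via_crlf (fun input -> input) "local@domain"
```

In this sketch the stand-in parser receives the original string followed by the two CRLF sequences that tell the parser where to stop.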
The final function, of_string_raw, can be of interest to a client who wants to integrate Emile inside a parser. This function processes only a slice of your string and returns how many bytes it consumed to extract the e-mail address. Internally, it also appends a CRLF to stop the parser, and it does not count that CRLF in the number of bytes it reports as consumed.
For a client who wants to use Emile inside an existing parser, the e-mail address should be delimited or surrounded by known characters. For example, you can have an e-mail address in this form: <local@domain>. In this example, the e-mail address is surrounded by < and >. Your goal is to extract the string between them and pass it to address_of_string, which does not allow < and > in the e-mail address.
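The extraction step can be sketched with the standard library only. The helper between_angles is hypothetical and deliberately naive: it takes the first < and the first > after it, so it would cut too early on a quoted local part that itself contains >.

```ocaml
(* Hypothetical helper: return the text between the first '<' and the
   first '>' that follows it, or None if either delimiter is missing. *)
let between_angles (s : string) : string option =
  match String.index_opt s '<' with
  | None -> None
  | Some i -> (
      match String.index_from_opt s (i + 1) '>' with
      | None -> None
      | Some j -> Some (String.sub s (i + 1) (j - i - 1)))

(* The extracted string is what would then be given to address_of_string. *)
let extracted = between_angles "<local@domain>"
```

The value extracted holds local@domain without its delimiters, which is the form the paragraph above says address_of_string expects.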
In another case, your e-mail address can have this form: John <local@domain>\n. Here the e-mail address is delimited by \n and you should use of_string, which will compute the name (John) and the associated e-mail address.
As these examples show, extracting an e-mail address is clearly not easy: it can take different forms, and the client needs to figure out exactly what is needed. These parsers can also fail for non-obvious reasons and, in that case, the client sadly needs to understand the standards to find where exactly the problem is.
Alternatively, if the client is comfortable with Angstrom, Emile provides indigestible parsers (see Parser).
module List : sig ... end