emile 0.3 · OCaml Package

Emile module, a parser of e-mail addresses.

type raw =

| Quoted_printable of string
| Base64 of [ `Dirty of string | `Clean of string | `Wrong_padding ]

An e-mail address can contain an encoded string as part of a phrase (identifier). The standards describe 2 kinds of encoding:

Quoted Printable: inserts a hexadecimal value introduced by the = character.
Base 64: a string encoded in MIME's Base64.

The parser already decodes the encoded raw value, so the client can use it as is.

type word = [

| `Atom of string
| `String of string

]

The local part of an e-mail address is composed of two kinds of words:

`Atom is a string as is.
`String is a string surrounded by double quotes to allow white space.

The second kind is sanitized: the surrounding double quotes are removed.

type local = word list

Local part of an e-mail address.
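As a minimal sketch of how word and local fit together, the block below re-declares both types locally (so it compiles without Emile installed) and renders a local part back to text. The helpers word_to_string and local_to_string are hypothetical and not part of Emile. The sketch assumes that the words of a local part are separated by dots, and its quoting is naive: nothing inside a quoted string is escaped.

```ocaml
(* Local copies of the two types, so the sketch compiles on its own. *)
type word = [ `Atom of string | `String of string ]
type local = word list

(* Hypothetical helper: put back the double quotes removed by the parser. *)
let word_to_string : word -> string = function
  | `Atom s -> s
  | `String s -> "\"" ^ s ^ "\""

(* Hypothetical helper: join the words with dots (no escaping attempted). *)
let local_to_string (l : local) : string =
  String.concat "." (List.map word_to_string l)

(* A local part made of one atom and one quoted string. *)
let example : local = [ `Atom "john"; `String "the doe" ]
```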

type addr =

| IPv4 of Ipaddr.V4.t
| IPv6 of Ipaddr.V6.t
| Ext of string * string

Subset of the domain described by RFC5321, which contains 3 kinds of address:

IPv4: a valid IPv4 address
IPv6: a valid IPv6 address
Ext (ldh, value): an extended kind of domain, recognized by the ldh identifier, whose value is value

Parsing of IPv4 and IPv6 addresses is done by Ipaddr. The extended kind Ext needs to be resolved by the client.

type domain = [

| `Domain of string list
| `Addr of addr
| `Literal of string

]

Domain part of an e-mail address. A domain integrates the kinds from RFC5321 (see addr), a domain described by RFC5322 and a `Literal, which is the last best-effort value possible as a domain.

Emile does not resolve domains.

type phrase = [ `Dot | `Word of word | `Encoded of string * raw ] list

A phrase is a sentence that associates a name with an e-mail address or a group of e-mail addresses. An `Encoded value is not normalized according to the charset specified; the encoded string is only decoded as is. For example, `Encoded can indicate the KOI-8 encoding (a Cyrillic charset). However, Emile neither checks that the value is a valid KOI-8 string nor normalizes it to Unicode. Emile just decodes it as is.

type mailbox = {

name : phrase option;
local : local;
domain : domain * domain list;

}

A mailbox is an e-mail address. It contains an optional name (see phrase), a local part (see local) and one or more domain(s).

type group = {

group : phrase;
mailboxes : mailbox list;

}

A group is a named set of mailboxes.

type address = local * (domain * domain list)

A basic e-mail address.

type set = [

| `Mailbox of mailbox
| `Group of group

]

Emile's set type, which is either a singleton (only one mailbox) or a set of e-mail addresses (a group).
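To illustrate how mailbox, group and set nest, here is a small self-contained sketch. The types are local copies, and domain is deliberately simplified: the `Addr case is left out so that the sketch does not depend on Ipaddr. The helper mailboxes_of_set and the values john, jane and team are hypothetical, and the sketch assumes that `Domain holds the dot-separated labels of a domain name.

```ocaml
(* Simplified local copies of the types described above. *)
type word = [ `Atom of string | `String of string ]
type local = word list
type raw =
  | Quoted_printable of string
  | Base64 of [ `Dirty of string | `Clean of string | `Wrong_padding ]
type phrase = [ `Dot | `Word of word | `Encoded of string * raw ] list
type domain = [ `Domain of string list | `Literal of string ]
type mailbox = {
  name : phrase option;
  local : local;
  domain : domain * domain list;
}
type group = { group : phrase; mailboxes : mailbox list }
type set = [ `Mailbox of mailbox | `Group of group ]

(* Hypothetical helper: list every mailbox contained in a set. *)
let mailboxes_of_set : set -> mailbox list = function
  | `Mailbox m -> [ m ]
  | `Group g -> g.mailboxes

(* A named mailbox and an unnamed one, both with a single domain. *)
let john =
  { name = Some [ `Word (`Atom "John") ];
    local = [ `Atom "john" ];
    domain = (`Domain [ "example"; "org" ], []) }

let jane =
  { name = None;
    local = [ `Atom "jane" ];
    domain = (`Domain [ "example"; "org" ], []) }

(* A named group containing both mailboxes. *)
let team : set =
  `Group { group = [ `Word (`Atom "Team") ]; mailboxes = [ john; jane ] }
```

A client that only cares about individual recipients can apply such a helper to each set and ignore the group names.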

Pretty-printer

val pp_addr : addr Fmt.t

val pp_domain : domain Fmt.t

val pp_word : word Fmt.t

val pp_local : local Fmt.t

val pp_raw : raw Fmt.t

val pp_phrase : phrase Fmt.t

val pp_mailbox : mailbox Fmt.t

val pp_group : group Fmt.t

val pp_address : address Fmt.t

val pp_set : set Fmt.t

Equal & Compare

type 'a equal = 'a -> 'a -> bool

type 'a compare = 'a -> 'a -> int

val case_sensitive : string -> string -> int

Alias of String.compare.

val case_insensitive : string -> string -> int

case_insensitive a b maps both values with lowercase_ascii and compares them with String.compare. UTF-8 values are not mapped.
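The two comparison functions can be sketched with the standard library alone. This is written from the descriptions above, not copied from Emile's source, so treat it as an approximation of the behaviour rather than the actual implementation.

```ocaml
(* Sketch: plain byte-wise comparison. *)
let case_sensitive : string -> string -> int = String.compare

(* Sketch: lowercase the ASCII letters of both strings, then compare.
   Bytes outside ASCII (for instance UTF-8 sequences) are left untouched. *)
let case_insensitive (a : string) (b : string) : int =
  String.compare (String.lowercase_ascii a) (String.lowercase_ascii b)
```

Because only ASCII letters are mapped, two strings that differ only by the case of an accented letter still compare as different under this sketch.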

val equal_word : compare:string compare -> word equal

equal_word ~compare a b tests if word a and word b are semantically equal. compare specifies the implementation used to compare two strings (i.e. whether to be case-sensitive or not).

val compare_word : ?case_sensitive:bool -> word compare

compare_word ?case_sensitive a b compares word a and word b semantically. According to the standards, a word SHOULD be case-sensitive; the client can control this behaviour with ?case_sensitive (default is true).

val equal_raw : compare:string compare -> raw equal

equal_raw ~compare a b tests if raw a and raw b are semantically equal. Semantically equal means that the contents of the raws are compared; this way, a Base64 raw can be equal to a Quoted_printable raw if and only if their strings are equal.

val compare_raw : compare:string compare -> raw compare

compare_raw ~compare a b compares raw a and raw b semantically.

val equal_phrase : phrase equal

equal_phrase a b tests if phrase a and phrase b are semantically equal. In this case, the comparison between the elements of the phrases is case-insensitive. The order of the elements is significant.

val compare_phrase : phrase compare

compare_phrase a b compares phrase a and phrase b semantically.

val equal_addr : addr equal

equal_addr a b tests if addr a and addr b are semantically equal. An IPv4 address can be equal to an IPv6 address. For the extended kind, the kind and the value are compared strictly (with Pervasives.compare).

val compare_addr : addr compare

compare_addr a b compares addr a and addr b. IPv6 is prioritized, then IPv4 and finally Ext.

val equal_domain : domain equal

equal_domain a b tests if domain a and domain b are semantically equal. Emile does not resolve domains: a `Domain could in fact be semantically equal to another `Domain because both point to the same IPv4/IPv6 address, but this function cannot detect that.

val compare_domain : domain compare

compare_domain a b compares domain a and domain b. `Domain is prioritized, then `Literal and finally `Addr. The comparisons between two `Literal values and between the parts of two `Domain values are case-insensitive.

val equal_domains : (domain * domain list) equal

equal_domains a b applies equal_domain to the ordered domains (see compare_domain) of a and b.

val compare_domains : (domain * domain list) compare

compare_domains a b compares the ordered list of domains of a with the ordered list of domains of b.

val equal_local : ?case_sensitive:bool -> local equal

equal_local ?case_sensitive a b tests if local a and local b are semantically equal. The standards state that the local part SHOULD be case-sensitive; the client can choose this behaviour with case_sensitive.

val compare_local : ?case_sensitive:bool -> local compare

compare_local ?case_sensitive a b compares local a and local b semantically. The user can decide if the comparison is case-sensitive or not (with case_sensitive).

val equal_mailbox : ?case_sensitive:bool -> mailbox equal

equal_mailbox ?case_sensitive a b tests if mailbox a and mailbox b are semantically equal. The user can decide whether the local part needs to be case-sensitive or not (with case_sensitive). If only one of a and b has a name, a and b are considered equal when they have the same local part and the same domain(s). Otherwise, their identifiers/phrases are compared as well.
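The rule about optional names can be sketched in isolation. The helper names_agree below is hypothetical and generic in the phrase type; it only models the name part of the comparison and assumes that the local parts and domains have already been found equal. Treating two unnamed mailboxes as agreeing is an assumption of this sketch.

```ocaml
(* Hypothetical helper: decide whether the optional names of two mailboxes
   allow them to be considered equal. *)
let names_agree ~(equal_phrase : 'p -> 'p -> bool) (a : 'p option)
    (b : 'p option) : bool =
  match a, b with
  | Some a, Some b -> equal_phrase a b   (* both named: names must match *)
  | Some _, None | None, Some _ -> true  (* only one named: name ignored *)
  | None, None -> true                   (* assumption: nothing to compare *)
```

With plain strings standing in for phrases, names_agree ~equal_phrase:String.equal (Some "John") None holds, while comparing Some "John" with Some "Jane" does not.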

val compare_mailbox : ?case_sensitive:bool -> mailbox compare

compare_mailbox ?case_sensitive a b compares mailbox a and mailbox b semantically. The local part is prioritized, then the domain part and finally the optional name.

val compare_group : group compare

compare_group a b compares group a and group b. The group names are compared first, then the ordered lists of mailboxes.

val equal_group : group equal

equal_group a b tests if group a and group b are semantically equal. The group names are compared first, then the ordered lists of mailboxes.

val compare_address : address compare

compare_address a b compares address a and address b semantically.

val equal_address : address equal

equal_address a b tests if address a and address b are semantically equal.

val equal_set : set equal

equal_set a b tests if set a and set b are semantically equal.

val compare_set : set compare

compare_set a b compares set a and set b.

val strictly_equal_set : set equal

A structural equality function on set.

Parsers

If you don't want a headache, you should move on.

module Parser : sig ... end

This is an aggregation of the rules used to parse an e-mail address. The goal of this documentation is to show the relations between RFCs, their updates, and the final description of the parts needed to parse an e-mail address.

Decoders

We have 4 kinds of parsers for e-mail addresses:

List.of_string* is the most general parser, used to parse the To: field of an e-mail. Its value is a list of set, each of which contains either a single e-mail address or a named group of e-mail addresses. This parser is used in the tests of Emile.
address_of_string* is the parser of e-mail addresses like local-part@domain. This is the case the client most commonly has in mind when parsing an e-mail address. However, this parser does not handle a named e-mail address or an e-mail address with multiple domains.
set_of_string* is the parser which handles a named group of e-mail addresses (group) or an optionally named e-mail address (mailbox). In contrast to address_of_string, this parser handles e-mail addresses with multiple domains.
of_string* is the most general unit parser of e-mail addresses: it is like set_of_string without the named group of e-mail addresses. It handles named e-mail addresses and e-mail addresses with multiple domains. The client should use this function when the exact format of the input is not known.

For each parser, there is the common of_string function, the of_string_with_crlf function and finally the of_string_raw function. The first one is the easiest to understand: it takes your string and tries to extract an e-mail address (or a set, or a list of set).

The second is a more general parser. In the context of an e-mail, the delimiter of an e-mail address is a double CRLF (to stop the folding whitespace rule), because an e-mail can be encoded on multiple lines. So the of_string function is a special case of of_string_with_crlf where a double CRLF is appended to your string to ensure that the parser stops somewhere.
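The relation between the two functions can be sketched as a wrapper. The names of_string_via and stub_with_crlf are hypothetical: the first takes any parser of the with_crlf family as an argument, and the second is a stand-in used only so that the sketch runs without Emile. Neither reflects how Emile is written internally.

```ocaml
(* Hypothetical wrapper: [with_crlf] stands for any parser of the
   with_crlf family. A double CRLF is appended before parsing. *)
let of_string_via ~(with_crlf : string -> ('a, 'e) result) (s : string) :
    ('a, 'e) result =
  with_crlf (s ^ "\r\n\r\n")

(* Stand-in parser, for illustration only: it succeeds when the input ends
   with a double CRLF and returns everything that precedes it. *)
let stub_with_crlf (s : string) =
  let n = String.length s in
  if n >= 4 && String.sub s (n - 4) 4 = "\r\n\r\n"
  then Ok (String.sub s 0 (n - 4))
  else Error `Incomplete
```

With the stand-in, of_string_via ~with_crlf:stub_with_crlf "local@domain" succeeds, whereas calling stub_with_crlf directly on the same string reports `Incomplete because the terminator is missing.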

The final function, of_string_raw, may interest a client who wants to integrate Emile inside a parser. This function processes only a slice of your string and returns how many bytes it consumed to extract the e-mail address. Internally, it also appends a CRLF to stop the parser, and does not count that CRLF in the number of bytes it reports as consumed.

For a client who wants to use Emile inside an existing parser, the e-mail address should be delimited or surrounded by characters. For example, you can have an e-mail address in this form: <local@domain>. In this example, the e-mail address is surrounded by < and >. Your goal is to extract the string inside them and use address_of_string, which does not allow < and > in an e-mail address.
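The extraction step can be sketched with the standard library alone. The helper between_angles is hypothetical and not part of Emile; it simply returns the text between the first < and the next > after it, and makes no attempt to handle quoting or comments.

```ocaml
(* Hypothetical helper: return the text between the first '<' and the
   next '>' that follows it, if both are present. *)
let between_angles (s : string) : string option =
  match String.index_opt s '<' with
  | None -> None
  | Some i -> (
      match String.index_from_opt s (i + 1) '>' with
      | None -> None
      | Some j -> Some (String.sub s (i + 1) (j - i - 1)))
```

The extracted string is what would then be handed to address_of_string.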

In another case, your e-mail address can have this form: John <local@domain>\n. Here the e-mail address is delimited by \n and you should use of_string, which will compute the name (John) and the associated e-mail address.

As these examples show, extracting an e-mail address is clearly not easy, because it can take different forms and the client needs to figure out what exactly is needed. Moreover, these parsers can fail for various non-obvious reasons and, in that case, the client sadly needs to understand the standards to find out where exactly the problem is.

Alternatively, if the client is comfortable with Angstrom, Emile provides indigestible parsers (see Parser).

type error = [

| `Invalid of string * string list
| `Incomplete

]

val pp_error : error Fmt.t

pp_error ppf err prints an error.

module List : sig ... end

val address_of_string_with_crlf : string -> (address, error) result

val address_of_string : string -> (address, error) result

val address_of_string_raw : string -> int -> int -> (address * int, error) result

val set_of_string_with_crlf : string -> (set, error) result

val set_of_string : string -> (set, error) result

val set_of_string_raw : string -> int -> int -> (set * int, error) result

val of_string_with_crlf : string -> (mailbox, error) result

val of_string : string -> (mailbox, error) result

val of_string_raw : string -> int -> int -> (mailbox * int, error) result
