mrmime

Mr. MIME

mrmime is a library to parse and generate mails according to several RFCs:

  • RFC822: Standard For The Format of ARPA Internet Text Messages

  • RFC2822: Internet Message Format

  • RFC5321: Simple Mail Transfer Protocol

  • RFC5322: Internet Message Format

  • RFC2045: MIME Part One: Format of Internet Message Bodies

  • RFC2046: MIME Part Two: Media Types

  • RFC2047: MIME Part Three: Message-Header Extensions for Non-ASCII Text

  • RFC2049: MIME Part Five: Conformance Criteria and Examples

  • RFC6532: Internationalized Email Headers

mrmime is built with angstrom to parse mails on a best-effort basis. On a large
corpus of mails (2 billion), mrmime was able to parse all of them; however, the
results can diverge from what you expect.

On the other hand, mrmime is able to generate valid mails from an OCaml
description. Generation follows some rules:

  • the produced stream emits the mail line by line

  • we make a best effort to limit lines to 78 characters

  • we follow RFC6532 and emit UTF-8 mails
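The 78-character rule can be illustrated with a minimal sketch of header folding as described in RFC 5322 (section 2.2.3). A long value is split at spaces, and every continuation line starts with whitespace. This is not mrmime's implementation. fold_header is a hypothetical helper written with the OCaml standard library only. A word longer than the limit is kept whole, which is why the limit can only be best-effort.

```ocaml
(* Hypothetical helper, not part of mrmime: build a "Name: value" header
   line and fold it so that each line stays within [limit] characters
   (CRLF excluded) whenever possible. Runs of spaces in [value] are
   collapsed to one space, which is a simplification. *)
let fold_header ?(limit = 78) (name : string) (value : string) : string =
  let words =
    String.split_on_char ' ' value |> List.filter (fun w -> w <> "")
  in
  let buf = Buffer.create 128 in
  Buffer.add_string buf name;
  Buffer.add_char buf ':';
  (* Length of the line being written and number of words already on it. *)
  let line_len = ref (String.length name + 1) in
  let words_on_line = ref 0 in
  List.iter
    (fun w ->
      let needed = 1 + String.length w in (* a space, then the word *)
      if !words_on_line > 0 && !line_len + needed > limit then begin
        (* Fold: CRLF goes before the space, so the continuation line
           starts with whitespace, as RFC 5322 requires. *)
        Buffer.add_string buf "\r\n";
        line_len := 0;
        words_on_line := 0
      end;
      Buffer.add_char buf ' ';
      Buffer.add_string buf w;
      line_len := !line_len + needed;
      incr words_on_line)
    words;
  Buffer.add_string buf "\r\n";
  Buffer.contents buf
```

For instance, fold_header "Subject" "Simple Email" returns "Subject: Simple Email\r\n". A long subject is spread over several CRLF-terminated lines.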

How to parse a mail?

There are different ways to parse a mail, depending on what you want. In some
cases, you are only interested in the header part; in others, you probably want
the bodies. We decided to separate these tasks into 2 different APIs, each
fitting its own constraints.

For example, if you only want to extract the header (say, to implement an SMTP
server where only the header matters), you probably want to take care of memory
consumption.

A stream API is provided for this case. With it, we were able to implement a
DKIM checker which needs only one pass to verify a mail.
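The following sketch shows why a stream API keeps memory low. It only locates the end of the header (the first empty line, CRLF CRLF) while the input arrives in chunks of arbitrary size. It does not parse fields, and it is not the Hd API; create and feed are hypothetical names. Between two chunks it only keeps a counter, so memory does not grow with the size of the mail.

```ocaml
(* A minimal sketch, not mrmime's Hd API: locate the end of the header
   (the first empty line, i.e. the byte sequence CRLF CRLF) while the mail
   arrives chunk by chunk. Between chunks we only remember how many bytes
   of the delimiter have been matched. *)
let delimiter = "\r\n\r\n"

type scanner = { mutable matched : int }

let create () = { matched = 0 }

(* [feed t chunk] returns [Some off] when the header ends inside [chunk],
   [off] being the offset of the first body byte in [chunk], and [None]
   when more input is needed. *)
let feed t chunk =
  if t.matched = String.length delimiter then
    invalid_arg "feed: end of header already reached";
  let rec go i =
    if i >= String.length chunk then None
    else begin
      let c = chunk.[i] in
      if c = delimiter.[t.matched] then t.matched <- t.matched + 1
      else if c = '\r' then t.matched <- 1 (* a CR may start a new CRLF *)
      else t.matched <- 0;
      if t.matched = String.length delimiter then Some (i + 1)
      else go (i + 1)
    end
  in
  go 0
```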

On the other hand, if you want to extract the bodies of your mail, the provided
parser is not a stream parser, since it has to extract bodies from a multipart
mail. How to use it is explained later in this document.

Parse only the header part

For many purposes, we are mostly interested in parsing only the header part of a
mail. In this case, the Hd sub-module should be what you want.

A complex example of Hd is available in the ocaml-dkim project, which extracts
DKIM signatures from the header. Here is a smaller example which looks for
DKIM-Signature fields:

let dkim_signature = Mrmime.Field_name.v "DKIM-Signature"

(* Read a mail (CRLF line endings) from stdin and print its DKIM-Signature fields. *)
let extract_dkim () =
  let open Mrmime in
  let tmp = Bytes.create 0x1000 in
  (* Internal buffer used by the decoder. *)
  let buffer = Bigstringaf.create 0x1000 in
  let decoder = Hd.decoder buffer in
  let rec decode () = match Hd.decode decoder with
    | `Field field ->
      ( match Location.prj field with
      | Field.Field (field_name, Unstructured, v)
          when Field_name.equal field_name dkim_signature ->
        Fmt.pr "%a: %a\n%!" Field_name.pp dkim_signature Unstructured.pp v;
        decode () (* keep going: a mail can carry several signatures *)
      | _ -> decode () )
    | `Malformed err -> failwith err
    | `End _rest -> () (* end of the header *)
    | `Await ->
      (* The decoder needs more input: read the next chunk from stdin.
         Copy the bytes so that reusing [tmp] cannot alter them later. *)
      let len = input stdin tmp 0 (Bytes.length tmp) in
      ( match Hd.src decoder (Bytes.sub_string tmp 0 len) 0 len with
        | Ok () -> decode ()
        | Error (`Msg err) -> failwith err ) in
  decode ()

This little snippet parses a mail read from stdin which uses the CRLF
end-of-line convention (so you should convert your mail to this newline
convention first). When it reaches a DKIM-Signature field, it prints its parsed
value (in our case, an unstructured value). The catch-all branch (_) handles all
other fields; a DKIM-Signature field can also end up there if its value could
not be parsed as an unstructured value.
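If your mail uses bare LF line endings (common for files on UNIX), a small conversion is enough before feeding it to the snippet above. The following to_crlf function is a hypothetical helper, not part of mrmime. It turns each LF not already preceded by CR into CRLF, so applying it twice is harmless. It works on a whole string. A chunk-by-chunk conversion would also need to remember whether the previous chunk ended with CR.

```ocaml
(* Hypothetical helper: convert bare LF line endings to CRLF. Existing
   CRLF pairs are kept as-is, so the conversion is idempotent. Lone CR
   characters are left untouched. *)
let to_crlf (s : string) : string =
  let buf = Buffer.create (String.length s + 64) in
  String.iteri
    (fun i c ->
      if c = '\n' && (i = 0 || s.[i - 1] <> '\r') then Buffer.add_string buf "\r\n"
      else Buffer.add_char buf c)
    s;
  Buffer.contents buf
```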

Parse an entire mail

Of course, the initial goal of mrmime is to parse an entire mail. In this
case, you should use the Mail sub-module, which provides an angstrom parser.

Bodies can be heavy. If you want to store them yourself, we provide an API which
expects consumers to consume the bodies (and store them, for example, into UNIX
files).

A complex example is available in ptt, which extracts bodies and saves them into
UNIX files. For this, we use:

val stream : emitters:(Header.t -> (string option -> unit) * 'id) -> (Header.t * 'id t) Angstrom.t

which calls emitters for each part of your mail. The parser properly decodes
each part (according to its Content-Transfer-Encoding) and passes the decoded
input to your consumer.
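The shape of emitters can be illustrated without mrmime. This sketch makes two assumptions. First, the consumer of type string option -> unit receives Some chunk for each piece of decoded data and None once the part is complete. Second, a plain string stands in for Header.t so that the code compiles on its own. The 'id is an integer part counter. Each part is collected in its own Buffer.t, where ptt would write a UNIX file instead.

```ocaml
(* Stand-in for Mrmime.Header.t, so that the sketch compiles on its own. *)
type header = string

(* Every part gets an integer identifier and its own buffer. *)
let parts : (int, Buffer.t) Hashtbl.t = Hashtbl.create 16
let closed : (int, unit) Hashtbl.t = Hashtbl.create 16
let next_id = ref 0

(* Same shape as the [emitters] argument of [stream]:
   header -> (string option -> unit) * 'id, here with 'id = int. *)
let emitters (_ : header) : (string option -> unit) * int =
  let id = !next_id in
  incr next_id;
  let buf = Buffer.create 0x100 in
  Hashtbl.replace parts id buf;
  let push = function
    | Some chunk -> Buffer.add_string buf chunk (* decoded data *)
    | None -> Hashtbl.replace closed id () (* assumed: end of the part *)
  in
  (push, id)
```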

How to emit a mail?

mrmime is able to generate a mail from an OCaml description of it. You have
several ways to craft values such as an address or the Content-Type field of a
specific part.

Many sub-modules of mrmime provide a way to construct such values, like the
subject of your mail or its recipients. For example, the Mailbox sub-module
provides an easy way to construct an address:

let romain_calascibetta =
  let open Mrmime.Mailbox in
  (* romain.calascibetta@x25519.net *)
  Local.[ w "romain"; w "calascibetta" ] @ Domain.(domain, [ a "x25519"; a "net" ])

The documentation explains how to construct many of these values. Of course,
Header is the module used to construct a header:

let header =
  let open Mrmime in
  Field.[ (* Subject: Simple Email *)
          Field (Field_name.subject, Unstructured,
                 Unstructured.Craft.(compile [ v "Simple"; sp 1; v "Email" ]))
          (* To: the mailbox defined above *)
        ; Field (Field_name.v "To", Addresses, [ `Mailbox romain_calascibetta ])
          (* Date: the current time (Ptime_clock comes from ptime.clock.os) *)
        ; Field (Field_name.date, Date, (Date.of_ptime ~zone:GMT (Ptime_clock.now ()))) ]
  |> Header.of_list

Then, Header provides a to_stream function which emits your header line by line
(with the CRLF newline convention), mainly so that it can be plugged into an
SMTP pipe.
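The following sketch shows how such a line-by-line stream can be branched into an SMTP pipe. It relies on an assumed shape: a stream is a function returning Some chunk until it returns None. Check the exact type of Header.to_stream in the mrmime documentation before reusing this. of_list, append and drain are hypothetical helpers. append emits a body after the header, and drain hands every chunk to a writer such as output_string oc.

```ocaml
(* Assumed shape of a pull stream: each call yields the next chunk, or
   [None] once the stream is exhausted. *)
type stream = unit -> string option

(* Build a stream from a list, handy for tests. *)
let of_list (l : string list) : stream =
  let rest = ref l in
  fun () ->
    match !rest with
    | [] -> None
    | x :: xs ->
      rest := xs;
      Some x

(* Emit every chunk of [a], then every chunk of [b]. *)
let append (a : stream) (b : stream) : stream =
  let first_done = ref false in
  let rec next () =
    if !first_done then b ()
    else
      match a () with
      | Some _ as chunk -> chunk
      | None ->
        first_done := true;
        next ()
  in
  next

(* Pull chunks until exhaustion and hand each one to [write]. *)
let rec drain (write : string -> unit) (s : stream) : unit =
  match s () with
  | Some chunk ->
    write chunk;
    drain write s
  | None -> ()
```

With an output channel, drain (output_string oc) s writes the mail while it is being produced.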

Finally, for a multipart mail, the Mt sub-module is the most useful one: it
makes parts from streams (for example, a stream from a file or from standard
input) associated with Content fields (like Content-Transfer-Encoding). mrmime
takes care of encoding your stream (in base64 or quoted-printable).
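Mt builds parts from streams. Producing such a stream from a file or from standard input could look like the sketch below. It assumes a stream is a function returning Some chunk until the end of input; check Mt's documentation for the exact type it expects. stream_of_channel is a hypothetical helper. It reads the channel lazily, 4096 bytes at a time, and leaves closing the channel to the caller.

```ocaml
(* Hypothetical helper: read [ic] lazily, [chunk_size] bytes at a time.
   Returns [None] forever once the end of input is reached. The caller
   remains responsible for closing [ic]. *)
let stream_of_channel ?(chunk_size = 0x1000) (ic : in_channel) : unit -> string option =
  let tmp = Bytes.create chunk_size in
  let finished = ref false in
  fun () ->
    if !finished then None
    else
      let len = input ic tmp 0 chunk_size in
      if len = 0 then begin
        finished := true;
        None
      end
      else Some (Bytes.sub_string tmp 0 len)
```

For standard input, stream_of_channel stdin is enough.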

A complex example of how to use the Mt module is available in the facteur
project, which is able to send multipart mails.

Encoding

A real effort was made to treat all inputs and outputs of mrmime as UTF-8
strings. This is achieved through some underlying packages:

  • rosetta as a universal unifier to Unicode

  • uuuu as a mapper from ISO-8859 to Unicode

  • coin as a mapper from KOI8-{U,R} to Unicode

  • yuscii as a mapper from UTF-7 to Unicode
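ISO-8859-1 (Latin-1), the simplest member of the ISO-8859 family, shows what such a mapper does. Each byte 0x00 to 0xFF denotes the code point with the same value, so converting to UTF-8 is a byte-by-byte loop. This sketch only uses the OCaml standard library (Buffer.add_utf_8_uchar, available since OCaml 4.06). It is not uuuu's API, and the other ISO-8859 variants also need a table for bytes 0xA0 to 0xFF.

```ocaml
(* Decode an ISO-8859-1 string into UTF-8: byte b maps to code point U+00b. *)
let latin1_to_utf8 (s : string) : string =
  let buf = Buffer.create (String.length s * 2) in
  String.iter (fun c -> Buffer.add_utf_8_uchar buf (Uchar.of_int (Char.code c))) s;
  Buffer.contents buf
```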

The SMTP protocol constrains bodies to use only 7 bits per byte (a historical
limitation). For this reason, encodings such as quoted-printable or base64 are
used to encode bodies while respecting this limitation. mrmime uses:

  • pecu as a quoted-printable stream encoder/decoder

  • base64 (base64.rfc2045 sub-package) as a stream encoder/decoder
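The 7-bit constraint can be made concrete with a simplified quoted-printable encoder following RFC 2045 (section 6.7). It is a sketch, not pecu's API. Every octet outside the printable range 33 to 126, plus = itself, is written as =XX. The RFC allows escaping any octet, and doing so also sidesteps the trailing-whitespace rule. Soft line breaks (= followed by CRLF) keep encoded lines at most 76 characters long. Unlike a text-oriented encoder, it does not preserve the input's own line breaks as literal CRLF.

```ocaml
(* Encode one octet as "=XX" with uppercase hexadecimal digits. *)
let hex_escape c = Printf.sprintf "=%02X" (Char.code c)

(* Simplified quoted-printable encoder: printable ASCII except '=' is kept,
   everything else (spaces and line breaks included) is escaped, and soft
   line breaks keep each encoded line within 76 characters. *)
let qp_encode (s : string) : string =
  let out = Buffer.create (String.length s * 3) in
  let line_len = ref 0 in
  String.iter
    (fun c ->
      let token =
        if c <> '=' && Char.code c >= 33 && Char.code c <= 126 then String.make 1 c
        else hex_escape c
      in
      (* Leave room for the '=' that ends a soft line break. *)
      if !line_len + String.length token > 75 then begin
        Buffer.add_string out "=\r\n";
        line_len := 0
      end;
      Buffer.add_string out token;
      line_len := !line_len + String.length token)
    s;
  Buffer.contents out
```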

Status of the project

mrmime is really experimental. Since it tries to handle many concerns (encodings,
multipart), its API may change often. We reached a first version because we are
able to send a well-formed multipart mail with it; however, it is still possible
to hit weird cases where mrmime emits an invalid mail.

The same advice applies to the parser: in many cases, implementations do not
really respect the mail format, and the parser may fail on some mails for weird
reasons.

Of course, feedback is welcome to improve it. You can use it, but you should not
expect industrial quality (at least, not yet). So play with it, and enjoy your
hacking!

mrmime has received funding from the Next Generation Internet Initiative (NGI)
within the framework of the DAPSI Project.

Published: 26 Oct 2021

Sources: mrmime-v0.5.0.tbz
  sha256=0ac119fbcf49e66d2e13dec3cc23109be03cbd7b9f7f868ab1afb3eb3bf2c4e4
  sha512=3f047fea13792415317ca5e3ba26a5ca8761662de57937a3b40ae590a0d5a82da645118472fc25ff8568b6615587264ae0c410adf39ae0498492e9a0dfa6695e

Reverse dependencies:

  • dkim >= "0.3.0"

  • letters = "0.1.1" | >= "0.2.1"

  • received >= "0.5.1"