rosetta

Universal mapper to Unicode
README

Rosetta is a merge-point between uuuu, coin and
yuscii. It is able to decode UTF-7, ISO-8859 and KOI8 and return Unicode
code points. The end user can then normalize them to UTF-8, with uutf for
example.

The final goal is to provide a universal decoder for any encoding. This project
is part of mrmime, an email parser, which needs it to decode
encoded-words (according to RFC 2047).

If you want to handle a new encoding (like, hmmhmm, APL-ISO-IR-68...), you can
open a new issue. The process will then be to write a new small library and
integrate it into rosetta.

How to use it?

rosetta follows the same design as its underlying libraries. More precisely,
it follows the same API as uutf for decoding. Here is a small example
that transforms a latin1 stream into UTF-8:

let trans ic oc =
  let decoder = Rosetta.decoder (Rosetta.encoding_of_string "latin1") (`Channel ic) in
  let encoder = Uutf.encoder `UTF_8 (`Channel oc) in
  let rec go () = match Rosetta.decode decoder with
    | `Await -> assert false (* XXX(dinosaure): impossible when you use `String or `Channel as source. *)
    | `Uchar _ as uchar -> ignore @@ Uutf.encode encoder uchar ; go ()
    | `End -> ignore @@ Uutf.encode encoder `End
    | `Malformed err -> failwith err in
  go ()
  
let () = trans stdin stdout

About encoding_of_string

rosetta follows the aliases available in the IANA character sets database:
https://www.iana.org/assignments/character-sets.xhtml

Any other alias will raise an exception. This function is case-insensitive.
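
To illustrate the lookup described above, here is a minimal, self-contained sketch of a case-insensitive alias table. It is not rosetta's implementation: the `charset` type, the `charset_of_alias` function and the choice of `Invalid_argument` are assumptions made for this example, and only a handful of IANA aliases are listed.

```ocaml
(* Hypothetical charset type, for illustration only. *)
type charset = Latin1 | Koi8_r | Utf_7

(* A few aliases from the IANA character sets database, stored in lowercase. *)
let aliases =
  [ ("iso-8859-1", Latin1); ("iso_8859-1", Latin1); ("iso-ir-100", Latin1)
  ; ("latin1", Latin1); ("l1", Latin1); ("ibm819", Latin1); ("cp819", Latin1)
  ; ("csisolatin1", Latin1)
  ; ("koi8-r", Koi8_r); ("cskoi8r", Koi8_r)
  ; ("utf-7", Utf_7); ("csutf7", Utf_7) ]

(* Case-insensitive lookup. An unknown alias raises [Invalid_argument]; this
   is an assumption, the text above only says that an exception is raised. *)
let charset_of_alias name =
  match List.assoc_opt (String.lowercase_ascii name) aliases with
  | Some charset -> charset
  | None -> invalid_arg ("unknown charset alias: " ^ name)
```

With this sketch, `charset_of_alias "LATIN1"` and `charset_of_alias "latin1"` return the same value, while an unlisted name raises.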

About translation tables

rosetta relies on underlying libraries such as uuuu or coin. They
integrate the translation tables provided by the Unicode Consortium. These tables
are not expected to change, so they are stored statically as int arrays.
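
The idea of a table stored as an int array can be sketched as follows, using ISO-8859-15 as the example. This is not the actual layout used by uuuu or coin: the names `iso_8859_15`, `decode_byte` and `to_utf_8` are invented for the illustration, and a real library would more likely write out all 256 entries as a literal instead of patching an identity table. The sketch assumes OCaml 4.06 or later for `Buffer.add_utf_8_uchar`.

```ocaml
(* Translation table for ISO-8859-15, indexed by byte value. It is identical
   to ISO-8859-1 (byte value = code point) except for eight positions. *)
let iso_8859_15 : int array =
  let table = Array.init 256 (fun byte -> byte) in
  List.iter
    (fun (byte, code_point) -> table.(byte) <- code_point)
    [ (0xA4, 0x20AC); (0xA6, 0x0160); (0xA8, 0x0161); (0xB4, 0x017D)
    ; (0xB8, 0x017E); (0xBC, 0x0152); (0xBD, 0x0153); (0xBE, 0x0178) ];
  table

(* Decoding one byte is a single array lookup. *)
let decode_byte (c : char) : Uchar.t =
  Uchar.of_int iso_8859_15.(Char.code c)

(* Decode every byte and re-encode the resulting code points as UTF-8. *)
let to_utf_8 (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter (fun c -> Buffer.add_utf_8_uchar buf (decode_byte c)) s;
  Buffer.contents buf
```

Because every byte maps to exactly one code point, decoding a single-byte encoding this way never fails and needs no state between bytes.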

About encoding

rosetta only supports decoding to Unicode code points. Support for encoding is
not planned, since people should only use Unicode now. Dealing with many
encodings is a pain, and we should only produce Unicode output rather than
an old encoding like latin1.

Install
Published
12 Dec 2019
Sources
rosetta-v0.2.0.tbz
sha256=d8a2b6b235b7c15025d3d72a87d05bf691fcf7f3d90a892cce9c5529f760498f
sha512=9a323cd5b05e9ae7ba1f572936a42948fbc42090e1be6557840652d9deddee4cb979691047f1b6814afc07e81ec74eb9b1fcab098ba6d525ae88530c790b967a
Dependencies
yuscii >= "0.2.1"
uuuu >= "0.1.1"
coin >= "0.1.1"
cmdliner < "1.1.0"
dune >= "1.4"
ocaml >= "4.03.0"
Reverse Dependencies