package rosetta


Universal mapper to Unicode


License: MIT

Authors


Romain Calascibetta <romain.calascibetta@gmail.com>

Maintainers


Romain Calascibetta <romain.calascibetta@gmail.com>

Sources

rosetta-0.4.0.tbz

sha256=dd6d662bf71bf3f305d60922d2b6f84e6570a816bf32b3e22360227753ff951f

sha512=e9e22e949a483ec1d67b00e68ac712fc125e71c92882a90fcbc97d10be8219d27260d825914ac6cc2832c17d5d8e662a2b308ff2f668f59170892d39a7e3d0af

Description

Universal mapper to Unicode (ISO-8859, KOI8, UTF-7)

Published: 28 Nov 2025

README

Rosetta - universal decoder of an encoded flow to Unicode

Rosetta is a merge point of uuuu (ISO-8859), coin (KOI8) and yuscii (UTF-7). It can decode UTF-7, ISO-8859 and KOI8 and returns Unicode code points; the end user can then encode them to UTF-8 with uutf, for example.
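To make "returns Unicode code points" concrete, here is a minimal sketch that decodes an in-memory string and collects the code points rosetta yields. `code_points` is a hypothetical helper, not part of rosetta; it assumes the `` `String `` source and the `` `Uchar `` / `` `End `` / `` `Malformed `` / `` `Await `` results used in the example further down.

```ocaml
(* Hypothetical helper: list the Unicode code points (as ints) that rosetta
   decodes from [input], interpreted in the charset named [charset]. *)
let code_points ~charset input =
  let decoder =
    Rosetta.decoder (Rosetta.encoding_of_string charset) (`String input) in
  let rec go acc =
    match Rosetta.decode decoder with
    | `Uchar u -> go (Uchar.to_int u :: acc)
    | `End -> List.rev acc
    | `Malformed err -> failwith err
    | `Await -> assert false (* unreachable with a `String source *)
  in
  go []
```

For example, `code_points ~charset:"latin1" "A\xe9"` gives the code points of "A" and "é"; turning them into UTF-8 bytes is then uutf's job.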

The final goal is to provide a universal decoder for any encoding. This project is part of mrmime, an email parser, where it is used to decode encoded-words (according to RFC 2047).
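As an illustration of that use case, the sketch below decodes a single RFC 2047 encoded-word that uses the "Q" encoding (for example `=?ISO-8859-1?Q?Caf=E9?=`) into UTF-8. It is a hypothetical, simplified sketch and not mrmime's implementation: base64 ("B") words, folding and error recovery are left out, and `q_decode`, `transcode` and `decode_encoded_word` are illustrative names.

```ocaml
(* Undo the RFC 2047 "Q" encoding: '_' stands for a space and "=XX" is a
   hex-encoded byte. Invalid hex digits make int_of_string raise Failure. *)
let q_decode s =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec go i =
    if i >= len then Buffer.contents buf
    else match s.[i] with
      | '_' -> Buffer.add_char buf ' '; go (i + 1)
      | '=' when i + 2 < len ->
        let byte = int_of_string ("0x" ^ String.sub s (i + 1) 2) in
        Buffer.add_char buf (Char.chr byte); go (i + 3)
      | c -> Buffer.add_char buf c; go (i + 1)
  in
  go 0

(* Decode raw bytes in [charset] with rosetta and re-encode them as UTF-8. *)
let transcode ~charset bytes =
  let decoder =
    Rosetta.decoder (Rosetta.encoding_of_string charset) (`String bytes) in
  let buf = Buffer.create (String.length bytes) in
  let rec go () =
    match Rosetta.decode decoder with
    | `Uchar u -> Uutf.Buffer.add_utf_8 buf u; go ()
    | `End -> Buffer.contents buf
    | `Malformed err -> failwith err
    | `Await -> assert false (* unreachable with a `String source *)
  in
  go ()

(* "=?charset?Q?text?=" splits on '?' into five parts. Q-encoded text cannot
   contain a literal '?', so this split is unambiguous. *)
let decode_encoded_word word =
  match String.split_on_char '?' word with
  | [ "="; charset; ("Q" | "q"); text; "=" ] ->
    transcode ~charset (q_decode text)
  | _ -> invalid_arg "decode_encoded_word: expected a Q-encoded word"
```

Remember that `encoding_of_string` is case-sensitive, so a charset written as `iso-8859-1` in a real email may need normalizing first.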

If you want support for a new encoding (like APL-ISO-IR-68...), you can open an issue; the process is then to write a new small library and integrate it into rosetta.
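The shape such a "little library" might take can be sketched as below. This is a hypothetical skeleton, not the API of uuuu, coin or yuscii: it only mirrors the decoder/`decode` style used throughout this README, handles only a `` `String `` source, and uses a placeholder US-ASCII table where a real mapping (such as APL-ISO-IR-68) would go.

```ocaml
(* Same result type as the decoders shown in this README. *)
type decode = [ `Await | `End | `Malformed of string | `Uchar of Uchar.t ]

type decoder = { src : string; mutable pos : int }

(* Placeholder single-byte table: [table.(b)] is the code point of byte [b],
   or [None] when the byte is unmapped. Here: US-ASCII, bytes >= 0x80 invalid. *)
let table : Uchar.t option array =
  Array.init 256 (fun b -> if b < 0x80 then Some (Uchar.of_int b) else None)

(* Only in-memory input in this sketch, so `Await is never produced. *)
let decoder (`String s) = { src = s; pos = 0 }

(* Decode one byte; an unmapped byte is reported as `Malformed and skipped. *)
let decode d : decode =
  if d.pos >= String.length d.src then `End
  else begin
    let b = Char.code d.src.[d.pos] in
    d.pos <- d.pos + 1;
    match table.(b) with
    | Some u -> `Uchar u
    | None -> `Malformed (String.make 1 (Char.chr b))
  end
```

Keeping the same decoding vocabulary is what would let such a library slot in next to the others behind rosetta's single `decode`.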

How to use it?

rosetta follows the same design as the underlying libraries. More precisely, its decoding API follows the same shape as uutf's. Here is a small example that transforms a latin1 flow into UTF-8:

```ocaml
(* Transcode a latin1 (ISO-8859-1) flow read from [ic] into UTF-8 written to [oc]. *)
let trans ic oc =
  let decoder = Rosetta.decoder (Rosetta.encoding_of_string "latin1") (`Channel ic) in
  let encoder = Uutf.encoder `UTF_8 (`Channel oc) in
  let rec go () = match Rosetta.decode decoder with
    | `Await -> assert false (* XXX(dinosaure): impossible when you use `String or `Channel as source. *)
    | `Uchar _ as uchar -> ignore @@ Uutf.encode encoder uchar ; go ()
    | `End -> ignore @@ Uutf.encode encoder `End (* flush the encoder *)
    | `Malformed err -> failwith err in
  go ()

let () = trans stdin stdout
```
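The example above stops at the first malformed input. A lenient variant, sketched below under the assumption that rosetta's decoder moves past the malformed bytes it reports (as uutf's decoders do), writes U+FFFD (`Uutf.u_rep`) instead and keeps going. `trans_lenient` is a hypothetical name; it takes any rosetta source and uutf destination so it can also run on strings and buffers.

```ocaml
(* Like [trans], but substitutes U+FFFD for malformed input instead of failing.
   Do not pass `Manual as source or destination: this loop never refills the
   decoder and ignores `Partial from the encoder. *)
let trans_lenient ~charset src dst =
  let decoder = Rosetta.decoder (Rosetta.encoding_of_string charset) src in
  let encoder = Uutf.encoder `UTF_8 dst in
  let rec go () =
    match Rosetta.decode decoder with
    | `Await -> assert false (* only reachable with a `Manual source *)
    | `Uchar _ as uchar -> ignore (Uutf.encode encoder uchar); go ()
    | `Malformed _ -> ignore (Uutf.encode encoder (`Uchar Uutf.u_rep)); go ()
    | `End -> ignore (Uutf.encode encoder `End)
  in
  go ()
```

For instance, `trans_lenient ~charset:"latin1" (`String "na\xefve") (`Buffer buf)` fills `buf` with the UTF-8 encoding of "naïve".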

About `encoding_of_string`

rosetta accepts the aliases listed in the IANA Character Sets database: https://www.iana.org/assignments/character-sets.xhtml

Any other alias raises an exception. This function is case-sensitive.
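Because `encoding_of_string` raises on unknown aliases and is case-sensitive, callers handling untrusted charset names (for example from email headers) may want a non-raising wrapper. The sketch below is a hypothetical helper, not part of rosetta: it catches any exception (the exact one is not documented here) and, as an assumed heuristic, retries with the name upper- and lower-cased, since MIME charset names are case-insensitive.

```ocaml
(* Hypothetical wrapper: [Some encoding] if [name], or its ASCII upper- or
   lower-cased form, is an alias rosetta knows; [None] otherwise. *)
let encoding_of_string_opt name =
  let try_one n =
    match Rosetta.encoding_of_string n with
    | e -> Some e
    | exception _ -> None (* unknown alias *)
  in
  List.fold_left
    (fun acc n -> match acc with Some _ -> acc | None -> try_one n)
    None
    [ name; String.uppercase_ascii name; String.lowercase_ascii name ]
```

Whether case-folding is appropriate depends on the caller; the exact-case lookup is always tried first.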

Dependencies (5)

yuscii >= "0.3.0"
uuuu >= "0.2.0"
coin >= "0.1.2"
dune >= "1.4"
ocaml >= "4.03.0"

Dev Dependencies

None

Used by (5)

Conflicts