package uuuu

Mapper of ISO-8859-* to Unicode

Sources

uuuu-0.4.0.tbz
sha256=e7e18d93db37e99a1cac08eebe50c88904eae9617dab433c8f787f4da92b1928
sha512=dc84dcd3156b5c1fa414c73ad78ef3c9f8b626bfbc3159f262e3d7d17325f098f28dd55d93f7643cb5b8329fa54f2ca876f442fefe399b92afe0a5430134d66f

Description

A simple mapper from ISO-8859-* to Unicode. Useful for translating ISO-8859-* text to Unicode.

Published: 28 Nov 2025

README

Uuuu

Uhuhuhuhuhuh! uuuu (Universal Unifier to Unicode Un OCaml) is a little library that normalizes ISO-8859 input to Unicode code points. This library uses tables provided by the Unicode Consortium:

Unicode table

This project takes these tables and converts them to OCaml code. It then provides a non-blocking, best-effort decoder that translates ISO-8859 bytes into Unicode code points.
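To make the table-driven idea concrete, here is a minimal sketch that uses only the standard library. It is not the code uuuu generates: the names `iso8859_15` and `to_utf_8` are hypothetical, and the table is written by hand for ISO-8859-15, which matches ISO-8859-1 (the identity mapping) except at eight positions.

```ocaml
(* Sketch of a table-driven ISO-8859-15 decoder, standard library only.
   This is NOT uuuu's generated code; all names here are hypothetical. *)

(* Start from the identity mapping (which is exactly ISO-8859-1), then
   override the eight positions where ISO-8859-15 differs. *)
let iso8859_15 : int array =
  let table = Array.init 256 (fun byte -> byte) in
  List.iter
    (fun (byte, code_point) -> table.(byte) <- code_point)
    [ (0xA4, 0x20AC) (* EURO SIGN *)
    ; (0xA6, 0x0160); (0xA8, 0x0161); (0xB4, 0x017D); (0xB8, 0x017E)
    ; (0xBC, 0x0152); (0xBD, 0x0153); (0xBE, 0x0178) ];
  table

(* Map every input byte to its code point and re-encode it as UTF-8. *)
let to_utf_8 (table : int array) (input : string) : string =
  let buf = Buffer.create (String.length input) in
  String.iter
    (fun chr ->
      Buffer.add_utf_8_uchar buf (Uchar.of_int table.(Char.code chr)))
    input;
  Buffer.contents buf
```

A lookup is a single array access per byte, which is why a generated table suits a decoder that has to keep memory use predictable.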

How to use it?

uuuu has a streaming interface, so it should be easy to use and to build on. uuuu has a simple goal: to offer a general way to decode ISO-8859 input and normalize it to Unicode code points. We need to be able to control memory consumption and to guarantee non-blocking computation. Finally, an error should not stop the decoding process.
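The last point, that an error should not stop decoding, can be sketched with the standard library alone. The code below is an illustration and does not use uuuu's API: `decode_byte`, `decode_all` and `toy_table` are hypothetical names, and byte 0xA5 is declared undefined in the toy table purely for demonstration. Each undefined byte is reported, replaced by U+FFFD, and decoding carries on.

```ocaml
(* A toy decoder step: [None] entries stand for bytes the charset leaves
   undefined. The table and all names are hypothetical. *)
type decoded = [ `Uchar of Uchar.t | `Malformed of string ]

let decode_byte (table : int option array) (chr : char) : decoded =
  match table.(Char.code chr) with
  | Some code_point -> `Uchar (Uchar.of_int code_point)
  | None -> `Malformed (Printf.sprintf "undefined byte 0x%02X" (Char.code chr))

(* Best-effort policy: record the error, substitute U+FFFD and carry on. *)
let decode_all (table : int option array) (input : string) =
  let errors = ref [] in
  let buf = Buffer.create (String.length input) in
  String.iter
    (fun chr ->
      match decode_byte table chr with
      | `Uchar u -> Buffer.add_utf_8_uchar buf u
      | `Malformed err ->
        errors := err :: !errors;
        Buffer.add_utf_8_uchar buf Uchar.rep)
    input;
  (Buffer.contents buf, List.rev !errors)

(* Toy table: identity everywhere except 0xA5, which is treated as
   undefined here for illustration only. *)
let toy_table : int option array =
  Array.init 256 (fun b -> if b = 0xA5 then None else Some b)
```

Returning the errors alongside the output leaves the policy to the caller: they can be logged, counted, or turned into a failure after the whole input has been processed.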

Here is a small example that uses uutf to translate Latin-1 input to UTF-8:

let trans ic oc =
  (* Decode Latin-1 bytes read from [ic] into Unicode code points. *)
  let decoder = Uuuu.decoder (Uuuu.encoding_of_string "latin1") (`Channel ic) in
  (* Encode those code points as UTF-8 and write them to [oc]. *)
  let encoder = Uutf.encoder `UTF_8 (`Channel oc) in
  let rec go () = match Uuuu.decode decoder with
    | `Await -> assert false (* XXX(dinosaure): impossible when you use `String or `Channel as source. *)
    | `Uchar _ as uchar -> ignore @@ Uutf.encode encoder uchar ; go ()
    | `End -> ignore @@ Uutf.encode encoder `End (* flush the encoder *)
    | `Malformed err -> failwith err in
  go ()

let () = trans stdin stdout

About encoding_of_string

uuuu follows the aliases available in the IANA character sets database: https://www.iana.org/assignments/character-sets.xhtml

Other aliases raise an exception. This function is case-sensitive.
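The behaviour described above (exact alias match, exception otherwise) can be modelled with a small standalone sketch. Everything in it is an assumption made for illustration: the `encoding` type, the function names, the short alias list, and the choice of `Invalid_argument` are hypothetical and may differ from what uuuu actually does.

```ocaml
(* Hypothetical stand-in for an encoding type; uuuu's own type may differ. *)
type encoding = ISO_8859_1 | ISO_8859_15

(* Exact, case-sensitive match against a few IANA names and aliases.
   Raising [Invalid_argument] is an assumption of this sketch. *)
let encoding_of_alias = function
  | "ISO_8859-1:1987" | "iso-ir-100" | "ISO_8859-1" | "ISO-8859-1"
  | "latin1" | "l1" | "IBM819" | "CP819" | "csISOLatin1" -> ISO_8859_1
  | "ISO-8859-15" | "Latin-9" -> ISO_8859_15
  | alias -> invalid_arg ("unknown charset alias: " ^ alias)

(* Callers that prefer not to handle an exception can wrap the lookup. *)
let encoding_of_alias_opt alias =
  try Some (encoding_of_alias alias) with Invalid_argument _ -> None
```

With a case-sensitive lookup like this one, "latin1" resolves but "LATIN1" does not, so names taken from user input or protocol headers may need care before they are passed in.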

A larger decoder

uuuu is part of a bigger project, rosetta, which is a decoder for several other encodings. If you want to handle more encodings than ISO-8859, you should look at this higher-level library.

Dependencies (2)

  1. dune
  2. ocaml >= "4.06.0"

Dev Dependencies

None

Used by (1)

  1. rosetta

Conflicts

None