Uuuu
Uhuhuhuhuhuh! uuuu (Universal Unifier to Unicode Un OCaml) is a small library that normalizes ISO-8859 input to Unicode code points. It uses the mapping tables provided by the Unicode Consortium.
This project takes these tables and converts them to OCaml code. It then provides a non-blocking, best-effort decoder that translates ISO-8859 bytes into Unicode code points.
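To make the table-driven idea concrete, here is a minimal sketch that uses only the OCaml standard library. It is not the code uuuu generates: the names iso_8859_15 and to_utf_8 are hypothetical, and the table is built by hand from the eight positions where ISO-8859-15 differs from ISO-8859-1 rather than from the Unicode Consortium files.

```ocaml
(* Hypothetical 256-entry table: index = ISO-8859-15 byte, value = Unicode
   code point. ISO-8859-15 matches ISO-8859-1 (identity mapping) except for
   the eight positions overridden below. *)
let iso_8859_15 : int array =
  let table = Array.init 256 (fun byte -> byte) in
  List.iter
    (fun (byte, code_point) -> table.(byte) <- code_point)
    [ (0xA4, 0x20AC); (0xA6, 0x0160); (0xA8, 0x0161); (0xB4, 0x017D)
    ; (0xB8, 0x017E); (0xBC, 0x0152); (0xBD, 0x0153); (0xBE, 0x0178) ];
  table

(* Look every input byte up in the table and append the resulting code
   point to a buffer as UTF-8. *)
let to_utf_8 (table : int array) (input : string) : string =
  let buf = Buffer.create (String.length input) in
  String.iter
    (fun chr ->
      Buffer.add_utf_8_uchar buf (Uchar.of_int table.(Char.code chr)))
    input;
  Buffer.contents buf
```

Unlike this sketch, which needs the whole input in memory, the decoder described here is streaming and non-blocking.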
How to use it?
uuuu has a streaming interface, so it should be easy to use and to adapt. It has a simple goal: to offer a general way to decode ISO-8859 input and normalize it to Unicode code points. The decoder must keep memory consumption under control and never block. Finally, an error should not stop the decoding process.
Here is a small example that uses uutf to translate Latin-1 input to UTF-8:
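The last goal, that an error should not stop decoding, might look like the following in practice. This is a minimal sketch under assumptions: that the `String source carries the whole input as an OCaml string, and that Uuuu.decode can be called again after it returns `Malformed. The name to_utf_8_lossy is hypothetical and not part of uuuu.

```ocaml
(* Decode [input] from the given ISO-8859 alias and re-encode it as UTF-8,
   writing U+FFFD (the replacement character) for each malformed input
   instead of aborting. Requires the uuuu library. *)
let to_utf_8_lossy ?(encoding = "latin1") (input : string) : string =
  let decoder =
    Uuuu.decoder (Uuuu.encoding_of_string encoding) (`String input) in
  let buf = Buffer.create (String.length input) in
  let rec go () =
    match Uuuu.decode decoder with
    | `Await -> assert false (* not expected with a `String source *)
    | `Uchar u -> Buffer.add_utf_8_uchar buf u; go ()
    | `Malformed _ -> Buffer.add_utf_8_uchar buf Uchar.rep; go ()
    | `End -> Buffer.contents buf
  in
  go ()
```

Whether to substitute, skip, or report malformed input is a policy choice left to the caller; substitution is only one option.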
let trans ic oc =
  let decoder = Uuuu.decoder (Uuuu.encoding_of_string "latin1") (`Channel ic) in
  let encoder = Uutf.encoder `UTF_8 (`Channel oc) in
  let rec go () = match Uuuu.decode decoder with
    | `Await -> assert false (* XXX(dinosaure): impossible when you use `String or `Channel as source. *)
    | `Uchar _ as uchar -> ignore @@ Uutf.encode encoder uchar ; go ()
    | `End -> ignore @@ Uutf.encode encoder `End (* flush the encoder *)
    | `Malformed err -> failwith err in
  go ()

let () = trans stdin stdout

About encoding_of_string
uuuu accepts the aliases listed in the IANA character sets database: https://www.iana.org/assignments/character-sets.xhtml
Any other alias raises an exception. This function is case-sensitive.
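When the alias comes from untrusted input (an HTTP header or a MIME part, for instance), it can help to turn the exception into an option. The sketch below is hypothetical: encoding_opt is not part of uuuu, and because the text above does not say which exception is raised, it catches any exception.

```ocaml
(* Return [Some encoding] for a recognized alias and [None] otherwise.
   Assumption: the exception type is unspecified, hence the catch-all. *)
let encoding_opt (name : string) =
  match Uuuu.encoding_of_string name with
  | encoding -> Some encoding
  | exception _ -> None

(* Usage: fail early with a readable message on an unknown alias. *)
let decoder_of_alias name src =
  match encoding_opt name with
  | Some encoding -> Ok (Uuuu.decoder encoding src)
  | None -> Error (Printf.sprintf "unknown charset alias: %s" name)
```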
A larger decoder
uuuu is part of a bigger project, rosetta, which is a decoder for other encodings as well. If you want to handle more encodings than ISO-8859, you should look at this higher-level library.