Uucd
Unicode character database decoder.
Uucd decodes the data of the Unicode character database from its XML representation. It provides high-level (but not necessarily efficient) access to the data so that efficient representations can be extracted.
Uucd decodes the representation described in the Annex #42 of Unicode 12.0.0. Subsequent versions may be decoded as long as no new cases are introduced in parsed enumerated properties.
Consult the basics.
Note. All strings returned by the module are UTF-8 encoded.
Release v12.0.0 — Unicode version 12.0.0 — homepage
The type for Unicode code points, ranges from 0x0000 to 0x10_FFFF.
is_cp n is true iff n is a Unicode code point.
is_scalar_value n is true iff n is a Unicode scalar value.
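As a rough illustration of the two predicates, the sketch below re-defines them locally from the code point range given above and from the usual definition of scalar values (code points outside the surrogate range U+D800 to U+DFFF). These are stand-in definitions written for this example, not Uucd's source.

```ocaml
(* Local stand-ins for illustration; not taken from Uucd's implementation. *)
let is_cp n = 0x0000 <= n && n <= 0x10_FFFF

(* Scalar values are the code points that are not surrogates
   (U+D800..U+DFFF). *)
let is_scalar_value n =
  (0x0000 <= n && n <= 0xD7FF) || (0xE000 <= n && n <= 0x10_FFFF)

let () =
  (* A surrogate is a code point but not a scalar value. *)
  assert (is_cp 0xD800 && not (is_scalar_value 0xD800));
  assert (is_scalar_value 0x1F42B);
  assert (not (is_cp 0x11_0000))
```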
Properties are referenced by their name and property values by their abbreviated name. To understand their semantics refer to the standard.
val unknown_prop : (string * string) -> string prop
unknown_prop (ns, n) is a property read from an XML attribute whose expanded name is (ns, n). This can be used to access a property unknown to the module.
In alphabetical order.
val age : [ `Version of int * int | `Unassigned ] prop
val alphabetic : bool prop
val ascii_hex_digit : bool prop
val bidi_class :
[ `AL
| `AN
| `B
| `BN
| `CS
| `EN
| `ES
| `ET
| `FSI
| `L
| `LRE
| `LRI
| `LRO
| `NSM
| `ON
| `PDF
| `PDI
| `R
| `RLE
| `RLI
| `RLO
| `S
| `WS ]
prop
val bidi_control : bool prop
val bidi_mirrored : bool prop
val bidi_paired_bracket_type : [ `O | `C | `N ] prop
val block :
[ `ASCII
| `Adlam
| `Aegean_Numbers
| `Ahom
| `Alchemical
| `Alphabetic_PF
| `Anatolian_Hieroglyphs
| `Ancient_Greek_Music
| `Ancient_Greek_Numbers
| `Ancient_Symbols
| `Arabic
| `Arabic_Ext_A
| `Arabic_Math
| `Arabic_PF_A
| `Arabic_PF_B
| `Arabic_Sup
| `Armenian
| `Arrows
| `Avestan
| `Balinese
| `Bamum
| `Bamum_Sup
| `Bassa_Vah
| `Batak
| `Bengali
| `Bhaiksuki
| `Block_Elements
| `Bopomofo
| `Bopomofo_Ext
| `Box_Drawing
| `Brahmi
| `Braille
| `Buginese
| `Buhid
| `Byzantine_Music
| `CJK
| `CJK_Compat
| `CJK_Compat_Forms
| `CJK_Compat_Ideographs
| `CJK_Compat_Ideographs_Sup
| `CJK_Ext_A
| `CJK_Ext_B
| `CJK_Ext_C
| `CJK_Ext_D
| `CJK_Ext_E
| `CJK_Ext_F
| `CJK_Radicals_Sup
| `CJK_Strokes
| `CJK_Symbols
| `Carian
| `Caucasian_Albanian
| `Chakma
| `Cham
| `Cherokee
| `Cherokee_Sup
| `Chess_Symbols
| `Compat_Jamo
| `Control_Pictures
| `Coptic
| `Coptic_Epact_Numbers
| `Counting_Rod
| `Cuneiform
| `Cuneiform_Numbers
| `Currency_Symbols
| `Cypriot_Syllabary
| `Cyrillic
| `Cyrillic_Ext_A
| `Cyrillic_Ext_B
| `Cyrillic_Ext_C
| `Cyrillic_Sup
| `Deseret
| `Devanagari
| `Devanagari_Ext
| `Diacriticals
| `Diacriticals_Ext
| `Diacriticals_For_Symbols
| `Diacriticals_Sup
| `Dingbats
| `Dogra
| `Domino
| `Duployan
| `Early_Dynastic_Cuneiform
| `Egyptian_Hieroglyph_Format_Controls
| `Egyptian_Hieroglyphs
| `Elbasan
| `Elymaic
| `Emoticons
| `Enclosed_Alphanum
| `Enclosed_Alphanum_Sup
| `Enclosed_CJK
| `Enclosed_Ideographic_Sup
| `Ethiopic
| `Ethiopic_Ext
| `Ethiopic_Ext_A
| `Ethiopic_Sup
| `Geometric_Shapes
| `Geometric_Shapes_Ext
| `Georgian
| `Georgian_Ext
| `Georgian_Sup
| `Glagolitic
| `Glagolitic_Sup
| `Gothic
| `Grantha
| `Greek
| `Greek_Ext
| `Gujarati
| `Gunjala_Gondi
| `Gurmukhi
| `Half_And_Full_Forms
| `Half_Marks
| `Hangul
| `Hanifi_Rohingya
| `Hanunoo
| `Hatran
| `Hebrew
| `High_PU_Surrogates
| `High_Surrogates
| `Hiragana
| `IDC
| `IPA_Ext
| `Ideographic_Symbols
| `Imperial_Aramaic
| `Indic_Number_Forms
| `Indic_Siyaq_Numbers
| `Inscriptional_Pahlavi
| `Inscriptional_Parthian
| `Jamo
| `Jamo_Ext_A
| `Jamo_Ext_B
| `Javanese
| `Kaithi
| `Kana_Ext_A
| `Kana_Sup
| `Kanbun
| `Kangxi
| `Kannada
| `Katakana
| `Katakana_Ext
| `Kayah_Li
| `Kharoshthi
| `Khmer
| `Khmer_Symbols
| `Khojki
| `Khudawadi
| `Lao
| `Latin_1_Sup
| `Latin_Ext_A
| `Latin_Ext_Additional
| `Latin_Ext_B
| `Latin_Ext_C
| `Latin_Ext_D
| `Latin_Ext_E
| `Lepcha
| `Letterlike_Symbols
| `Limbu
| `Linear_A
| `Linear_B_Ideograms
| `Linear_B_Syllabary
| `Lisu
| `Low_Surrogates
| `Lycian
| `Lydian
| `Mahajani
| `Mahjong
| `Makasar
| `Malayalam
| `Mandaic
| `Manichaean
| `Marchen
| `Masaram_Gondi
| `Math_Alphanum
| `Math_Operators
| `Mayan_Numerals
| `Medefaidrin
| `Meetei_Mayek
| `Meetei_Mayek_Ext
| `Mende_Kikakui
| `Meroitic_Cursive
| `Meroitic_Hieroglyphs
| `Miao
| `Misc_Arrows
| `Misc_Math_Symbols_A
| `Misc_Math_Symbols_B
| `Misc_Pictographs
| `Misc_Symbols
| `Misc_Technical
| `Modi
| `Modifier_Letters
| `Modifier_Tone_Letters
| `Mongolian
| `Mongolian_Sup
| `Mro
| `Multani
| `Music
| `Myanmar
| `Myanmar_Ext_A
| `Myanmar_Ext_B
| `NB
| `NKo
| `Nabataean
| `Nandinagari
| `New_Tai_Lue
| `Newa
| `Number_Forms
| `Nushu
| `Nyiakeng_Puachue_Hmong
| `OCR
| `Ogham
| `Ol_Chiki
| `Old_Hungarian
| `Old_Italic
| `Old_North_Arabian
| `Old_Permic
| `Old_Persian
| `Old_Sogdian
| `Old_South_Arabian
| `Old_Turkic
| `Oriya
| `Ornamental_Dingbats
| `Osage
| `Osmanya
| `Ottoman_Siyaq_Numbers
| `PUA
| `Pahawh_Hmong
| `Palmyrene
| `Pau_Cin_Hau
| `Phags_Pa
| `Phaistos
| `Phoenician
| `Phonetic_Ext
| `Phonetic_Ext_Sup
| `Playing_Cards
| `Psalter_Pahlavi
| `Punctuation
| `Rejang
| `Rumi
| `Runic
| `Samaritan
| `Saurashtra
| `Sharada
| `Shavian
| `Shorthand_Format_Controls
| `Siddham
| `Sinhala
| `Sinhala_Archaic_Numbers
| `Small_Forms
| `Small_Kana_Ext
| `Sogdian
| `Sora_Sompeng
| `Soyombo
| `Specials
| `Sundanese
| `Sundanese_Sup
| `Sup_Arrows_A
| `Sup_Arrows_B
| `Sup_Arrows_C
| `Sup_Math_Operators
| `Sup_PUA_A
| `Sup_PUA_B
| `Sup_Punctuation
| `Sup_Symbols_And_Pictographs
| `Super_And_Sub
| `Sutton_SignWriting
| `Syloti_Nagri
| `Symbols_And_Pictographs_Ext_A
| `Syriac
| `Syriac_Sup
| `Tagalog
| `Tagbanwa
| `Tags
| `Tai_Le
| `Tai_Tham
| `Tai_Viet
| `Tai_Xuan_Jing
| `Takri
| `Tamil
| `Tamil_Sup
| `Tangut
| `Tangut_Components
| `Telugu
| `Thaana
| `Thai
| `Tibetan
| `Tifinagh
| `Tirhuta
| `Transport_And_Map
| `UCAS
| `UCAS_Ext
| `Ugaritic
| `VS
| `VS_Sup
| `Vai
| `Vedic_Ext
| `Vertical_Forms
| `Wancho
| `Warang_Citi
| `Yi_Radicals
| `Yi_Syllables
| `Yijing
| `Zanabazar_Square ]
prop
val canonical_combining_class : int prop
val cased : bool prop
val case_ignorable : bool prop
val changes_when_casefolded : bool prop
val changes_when_casemapped : bool prop
val changes_when_lowercased : bool prop
val changes_when_nfkc_casefolded : bool prop
val changes_when_titlecased : bool prop
val changes_when_uppercased : bool prop
val composition_exclusion : bool prop
val dash : bool prop
val decomposition_type :
[ `Can
| `Com
| `Enc
| `Fin
| `Font
| `Fra
| `Init
| `Iso
| `Med
| `Nar
| `Nb
| `Sml
| `Sqr
| `Sub
| `Sup
| `Vert
| `Wide
| `None ]
prop
val default_ignorable_code_point : bool prop
val deprecated : bool prop
val diacritic : bool prop
val east_asian_width : [ `A | `F | `H | `N | `Na | `W ] prop
val expands_on_nfc : bool prop
val expands_on_nfd : bool prop
val expands_on_nfkc : bool prop
val expands_on_nfkd : bool prop
val extender : bool prop
val full_composition_exclusion : bool prop
val general_category :
[ `Lu
| `Ll
| `Lt
| `Lm
| `Lo
| `Mn
| `Mc
| `Me
| `Nd
| `Nl
| `No
| `Pc
| `Pd
| `Ps
| `Pe
| `Pi
| `Pf
| `Po
| `Sm
| `Sc
| `Sk
| `So
| `Zs
| `Zl
| `Zp
| `Cc
| `Cf
| `Cs
| `Co
| `Cn ]
prop
val grapheme_base : bool prop
val grapheme_cluster_break :
[ `CN
| `CR
| `EB
| `EBG
| `EM
| `EX
| `GAZ
| `L
| `LF
| `LV
| `LVT
| `PP
| `RI
| `SM
| `T
| `V
| `XX
| `ZWJ ]
prop
val grapheme_extend : bool prop
val grapheme_link : bool prop
val hangul_syllable_type : [ `L | `LV | `LVT | `T | `V | `NA ] prop
val hex_digit : bool prop
val hyphen : bool prop
val id_continue : bool prop
val id_start : bool prop
val ideographic : bool prop
val ids_binary_operator : bool prop
val ids_trinary_operator : bool prop
val indic_syllabic_category :
[ `Avagraha
| `Bindu
| `Brahmi_Joining_Number
| `Cantillation_Mark
| `Consonant
| `Consonant_Dead
| `Consonant_Final
| `Consonant_Head_Letter
| `Consonant_Initial_Postfixed
| `Consonant_Killer
| `Consonant_Medial
| `Consonant_Placeholder
| `Consonant_Preceding_Repha
| `Consonant_Prefixed
| `Consonant_Repha
| `Consonant_Subjoined
| `Consonant_Succeeding_Repha
| `Consonant_With_Stacker
| `Gemination_Mark
| `Invisible_Stacker
| `Joiner
| `Modifying_Letter
| `Non_Joiner
| `Nukta
| `Number
| `Number_Joiner
| `Other
| `Pure_Killer
| `Register_Shifter
| `Syllable_Modifier
| `Tone_Letter
| `Tone_Mark
| `Virama
| `Visarga
| `Vowel
| `Vowel_Dependent
| `Vowel_Independent ]
prop
val indic_matra_category :
[ `Right
| `Left
| `Visual_Order_Left
| `Left_And_Right
| `Top
| `Bottom
| `Top_And_Bottom
| `Top_And_Right
| `Top_And_Left
| `Top_And_Left_And_Right
| `Bottom_And_Right
| `Top_And_Bottom_And_Right
| `Overstruck
| `Invisible
| `NA ]
prop
val indic_positional_category :
[ `Bottom
| `Bottom_And_Right
| `Left
| `Left_And_Right
| `NA
| `Overstruck
| `Right
| `Top
| `Top_And_Bottom
| `Top_And_Bottom_And_Right
| `Top_And_Left
| `Top_And_Left_And_Right
| `Top_And_Right
| `Visual_Order_Left ]
prop
val iso_comment : string prop
val jamo_short_name : string prop
val join_control : bool prop
val joining_group :
[ `African_Feh
| `African_Noon
| `African_Qaf
| `Ain
| `Alaph
| `Alef
| `Alef_Maqsurah
| `Beh
| `Beth
| `Burushaski_Yeh_Barree
| `Dal
| `Dalath_Rish
| `E
| `Farsi_Yeh
| `Fe
| `Feh
| `Final_Semkath
| `Gaf
| `Gamal
| `Hah
| `Hanifi_Rohingya_Kinna_Ya
| `Hanifi_Rohingya_Pa
| `Hamza_On_Heh_Goal
| `He
| `Heh
| `Heh_Goal
| `Heth
| `Kaf
| `Kaph
| `Khaph
| `Knotted_Heh
| `Lam
| `Lamadh
| `Malayalam_Bha
| `Malayalam_Ja
| `Malayalam_Lla
| `Malayalam_Llla
| `Malayalam_Nga
| `Malayalam_Nna
| `Malayalam_Nnna
| `Malayalam_Nya
| `Malayalam_Ra
| `Malayalam_Ssa
| `Malayalam_Tta
| `Manichaean_Aleph
| `Manichaean_Ayin
| `Manichaean_Beth
| `Manichaean_Daleth
| `Manichaean_Dhamedh
| `Manichaean_Five
| `Manichaean_Gimel
| `Manichaean_Heth
| `Manichaean_Hundred
| `Manichaean_Kaph
| `Manichaean_Lamedh
| `Manichaean_Mem
| `Manichaean_Nun
| `Manichaean_One
| `Manichaean_Pe
| `Manichaean_Qoph
| `Manichaean_Resh
| `Manichaean_Sadhe
| `Manichaean_Samekh
| `Manichaean_Taw
| `Manichaean_Ten
| `Manichaean_Teth
| `Manichaean_Thamedh
| `Manichaean_Twenty
| `Manichaean_Waw
| `Manichaean_Yodh
| `Manichaean_Zayin
| `Meem
| `Mim
| `No_Joining_Group
| `Noon
| `Nun
| `Nya
| `Pe
| `Qaf
| `Qaph
| `Reh
| `Reversed_Pe
| `Rohingya_Yeh
| `Sad
| `Sadhe
| `Seen
| `Semkath
| `Shin
| `Straight_Waw
| `Swash_Kaf
| `Syriac_Waw
| `Tah
| `Taw
| `Teh_Marbuta
| `Teh_Marbuta_Goal
| `Teth
| `Waw
| `Yeh
| `Yeh_Barree
| `Yeh_With_Tail
| `Yudh
| `Yudh_He
| `Zain
| `Zhain ]
prop
val joining_type : [ `U | `C | `T | `D | `L | `R ] prop
val line_break :
[ `AI
| `AL
| `B2
| `BA
| `BB
| `BK
| `CB
| `CJ
| `CL
| `CM
| `CP
| `CR
| `EX
| `GL
| `H2
| `H3
| `HL
| `HY
| `ID
| `IN
| `IS
| `JL
| `JT
| `JV
| `LF
| `NL
| `NS
| `NU
| `OP
| `PO
| `PR
| `QU
| `RI
| `SA
| `SG
| `SP
| `SY
| `WJ
| `XX
| `ZW
| `EB
| `EM
| `ZWJ ]
prop
val logical_order_exception : bool prop
val lowercase : bool prop
val math : bool prop
val name : [ `Pattern of string | `Name of string ] prop
In the `Pattern case, occurrences of the character '#' (U+0023) in the string must be replaced by the value of the code point as four to six uppercase hexadecimal digits (the minimal number needed). E.g. the pattern "CJK UNIFIED IDEOGRAPH-#" associated with code point U+3400 gives the name "CJK UNIFIED IDEOGRAPH-3400".
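The `Pattern substitution described above can be sketched as follows. The helper names expand_name_pattern and resolve_name are hypothetical and not part of Uucd; the sketch only assumes that a value of the name property has already been obtained for the code point.

```ocaml
(* Hypothetical helper, not part of Uucd: replaces every '#' in [pat] by
   the code point [cp] written in uppercase hexadecimal. "%04X" pads to at
   least four digits and uses more only when the value needs them. *)
let expand_name_pattern pat cp =
  let hex = Printf.sprintf "%04X" cp in
  String.concat hex (String.split_on_char '#' pat)

(* Hypothetical helper: turns a value of the [name] property into a
   concrete character name for code point [cp]. *)
let resolve_name cp = function
  | `Name n -> n
  | `Pattern p -> expand_name_pattern p cp
```

With these definitions, resolve_name 0x3400 (`Pattern "CJK UNIFIED IDEOGRAPH-#") evaluates to "CJK UNIFIED IDEOGRAPH-3400", matching the example in the text.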
val name_alias :
(string * [ `Abbreviation | `Alternate | `Control | `Correction | `Figment ])
list
prop
val nfc_quick_check : [ `True | `False | `Maybe ] prop
val nfd_quick_check : [ `True | `False | `Maybe ] prop
val nfkc_quick_check : [ `True | `False | `Maybe ] prop
val nfkd_quick_check : [ `True | `False | `Maybe ] prop
val noncharacter_code_point : bool prop
val numeric_type : [ `None | `De | `Di | `Nu ] prop
val numeric_value : [ `NaN | `Frac of int * int | `Num of int64 ] prop
val other_alphabetic : bool prop
val other_default_ignorable_code_point : bool prop
val other_grapheme_extend : bool prop
val other_id_continue : bool prop
val other_id_start : bool prop
val other_lowercase : bool prop
val other_math : bool prop
val other_uppercase : bool prop
val pattern_syntax : bool prop
val pattern_white_space : bool prop
val prepended_concatenation_mark : bool prop
val quotation_mark : bool prop
val radical : bool prop
val regional_indicator : bool prop
type script = [
| `Adlm | `Aghb | `Ahom | `Arab | `Armi | `Armn | `Avst | `Bali | `Bamu | `Bass
| `Batk | `Beng | `Bhks | `Bopo | `Brah | `Brai | `Bugi | `Buhd | `Cakm | `Cans
| `Cari | `Cham | `Cher | `Copt | `Cprt | `Cyrl | `Deva | `Dogr | `Dsrt | `Dupl
| `Egyp | `Elba | `Elym | `Ethi | `Geor | `Glag | `Gong | `Gonm | `Goth | `Gran
| `Grek | `Gujr | `Guru | `Hang | `Hani | `Hano | `Hatr | `Hebr | `Hira | `Hluw
| `Hmng | `Hmnp | `Hrkt | `Hung | `Ital | `Java | `Kali | `Kana | `Khar | `Khmr
| `Khoj | `Knda | `Kthi | `Lana | `Laoo | `Latn | `Lepc | `Limb | `Lina | `Linb
| `Lisu | `Lyci | `Lydi | `Mahj | `Maka | `Mand | `Mani | `Marc | `Medf | `Mend
| `Merc | `Mero | `Mlym | `Modi | `Mong | `Mroo | `Mtei | `Mult | `Mymr | `Nand
| `Narb | `Nbat | `Newa | `Nkoo | `Nshu | `Ogam | `Olck | `Orkh | `Orya | `Osge
| `Osma | `Palm | `Pauc | `Perm | `Phag | `Phli | `Phlp | `Phnx | `Plrd | `Prti
| `Qaai | `Rjng | `Rohg | `Runr | `Samr | `Sarb | `Saur | `Sgnw | `Shaw | `Shrd
| `Sidd | `Sind | `Sinh | `Sogd | `Sogo | `Sora | `Soyo | `Sund | `Sylo | `Syrc
| `Tagb | `Takr | `Tale | `Talu | `Taml | `Tang | `Tavt | `Telu | `Tfng | `Tglg
| `Thaa | `Thai | `Tibt | `Tirh | `Ugar | `Vaii | `Wara | `Wcho | `Xpeo | `Xsux
| `Yiii | `Zanb | `Zinh | `Zyyy | `Zzzz ]
val sentence_break :
[ `AT
| `CL
| `CR
| `EX
| `FO
| `LE
| `LF
| `LO
| `NU
| `SC
| `SE
| `SP
| `ST
| `UP
| `XX ]
prop
val soft_dotted : bool prop
val sterm : bool prop
val terminal_punctuation : bool prop
val uax_42_element : [ `Reserved | `Noncharacter | `Surrogate | `Char ] prop
Not normative, artefact of Uucd. Corresponds to the XML element name that describes the code point.
val unicode_1_name : string prop
val unified_ideograph : bool prop
val uppercase : bool prop
val variation_selector : bool prop
val vertical_orientation : [ `U | `R | `Tu | `Tr ] prop
val white_space : bool prop
val word_break :
[ `CR
| `DQ
| `EB
| `EBG
| `EM
| `EX
| `Extend
| `FO
| `GAZ
| `HL
| `KA
| `LE
| `LF
| `MB
| `ML
| `MN
| `NL
| `NU
| `RI
| `SQ
| `WSegSpace
| `XX
| `ZWJ ]
prop
val xid_continue : bool prop
val xid_start : bool prop
In alphabetical order. For now Unihan properties are always represented as strings.
val kAccountingNumeric : string prop
val kAlternateHanYu : string prop
val kAlternateJEF : string prop
val kAlternateKangXi : string prop
val kAlternateMorohashi : string prop
val kBigFive : string prop
val kCCCII : string prop
val kCNS1986 : string prop
val kCNS1992 : string prop
val kCangjie : string prop
val kCantonese : string prop
val kCheungBauer : string prop
val kCheungBauerIndex : string prop
val kCihaiT : string prop
val kCompatibilityVariant : string prop
val kCowles : string prop
val kDaeJaweon : string prop
val kDefinition : string prop
val kEACC : string prop
val kFenn : string prop
val kFennIndex : string prop
val kFourCornerCode : string prop
val kFrequency : string prop
val kGB0 : string prop
val kGB1 : string prop
val kGB3 : string prop
val kGB5 : string prop
val kGB7 : string prop
val kGB8 : string prop
val kGSR : string prop
val kGradeLevel : string prop
val kHDZRadBreak : string prop
val kHKGlyph : string prop
val kHKSCS : string prop
val kHanYu : string prop
val kHangul : string prop
val kHanyuPinlu : string prop
val kHanyuPinyin : string prop
val kIBMJapan : string prop
val kIICore : string prop
val kIRGDaeJaweon : string prop
val kIRGDaiKanwaZiten : string prop
val kIRGHanyuDaZidian : string prop
val kIRGKangXi : string prop
val kIRG_GSource : string prop
val kIRG_HSource : string prop
val kIRG_JSource : string prop
val kIRG_KPSource : string prop
val kIRG_KSource : string prop
val kIRG_MSource : string prop
val kIRG_TSource : string prop
val kIRG_USource : string prop
val kIRG_VSource : string prop
val kJHJ : string prop
val kJIS0213 : string prop
val kJa : string prop
val kJapaneseKun : string prop
val kJapaneseOn : string prop
val kJinmeiyoKanji : string prop
val kJis0 : string prop
val kJis1 : string prop
val kJoyoKanji : string prop
val kKPS0 : string prop
val kKPS1 : string prop
val kKSC0 : string prop
val kKSC1 : string prop
val kKangXi : string prop
val kKarlgren : string prop
val kKorean : string prop
val kKoreanEducationHanja : string prop
val kKoreanName : string prop
val kLau : string prop
val kMainlandTelegraph : string prop
val kMandarin : string prop
val kMatthews : string prop
val kMeyerWempe : string prop
val kMorohashi : string prop
val kNelson : string prop
val kOtherNumeric : string prop
val kPhonetic : string prop
val kPrimaryNumeric : string prop
val kPseudoGB1 : string prop
val kRSAdobe_Japan1_6 : string prop
val kRSJapanese : string prop
val kRSKanWa : string prop
val kRSKangXi : string prop
val kRSKorean : string prop
val kRSMerged : string prop
val kRSTUnicode : string prop
val kRSUnicode : string prop
val kReading : string prop
val kSBGY : string prop
val kSemanticVariant : string prop
val kSimplifiedVariant : string prop
val kSpecializedSemanticVariant : string prop
val kSrc_NushuDuben : string prop
val kTGH : string prop
val kTGT_MergedSrc : string prop
val kTaiwanTelegraph : string prop
val kTang : string prop
val kTotalStrokes : string prop
val kTraditionalVariant : string prop
val kVietnamese : string prop
val kWubi : string prop
val kXHC1983 : string prop
val kXerox : string prop
val kZVariant : string prop
type named_sequence = string * cp list
The type for named sequences. Sequence name, code point sequence.
The type for normalization corrections. Code point, old normalization, new normalization, version.
type standardized_variant =
  cp list * string * [ `Isolate | `Initial | `Medial | `Final ] list
The type for standardized variants. Code point sequence, description, when.
The type for CJK radicals. Radical number, CJK radical character, CJK unified ideograph.
type emoji_source = cp list * int option * int option * int option
The type for emoji sources. Unicode, docomo, kddi, softbank.
type t = {
  description : string;
  repertoire : props Cpmap.t;
  blocks : block list;
  named_sequences : named_sequence list;
  provisional_named_sequences : named_sequence list;
  normalization_corrections : normalization_correction list;
  standardized_variants : standardized_variant list;
  cjk_radicals : cjk_radical list;
  emoji_sources : emoji_source list;
}
The type for Unicode character databases.
Note. Absence of an optional top-level field in the database is denoted by the neutral element of its type (empty string, empty list, Cpmap.empty). This means that the module doesn't distinguish between the absence of a field and the presence of the field with empty data, which causes no problems in this context.
cp_prop ucd cp p is the property p of the code point cp in ucd's repertoire, if cp is in the repertoire and the property exists for cp.
The type for input sources.
decode d decodes a database from d or returns an error.
val decoded_range : decoder -> (int * int) * (int * int)
decoded_range d is the range of characters spanning the `Error decoded by d. A pair of line and column numbers respectively one and zero based.
The database and subsets of it for Unicode 12.0.0 are available here. Databases with groups should be preferred: they maximize value sharing and improve parsing performance.
A database is decoded as follows:
let ucd_or_die inf = try
  let ic = if inf = "-" then stdin else open_in inf in
  let d = Uucd.decoder (`Channel ic) in
  match Uucd.decode d with
  | `Ok db -> db
  | `Error e ->
    let (l0, c0), (l1, c1) = Uucd.decoded_range d in
    Printf.eprintf "%s:%d.%d-%d.%d: %s\n%!" inf l0 c0 l1 c1 e;
    exit 1
with Sys_error e -> Printf.eprintf "%s\n%!" e; exit 1

let ucd = ucd_or_die "/tmp/ucd.all.grouped.xml"

The convenience function cp_prop can be used to query the property of a given code point. For example the general category of U+1F42B is given by:
let u_1F42B_gc = Uucd.cp_prop ucd 0x1F42B Uucd.general_category
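Since cp_prop only yields a value when the code point is in the repertoire and the property exists for it, the result presumably has to be inspected before use. The sketch below assumes the result is an option over the general_category variants; describe_gc is a hypothetical helper written for this example and handles only a few tags explicitly.

```ocaml
(* Hypothetical helper, not part of Uucd. Assumes the value passed in is an
   option over polymorphic variants that include `Lu, `Ll and `So. *)
let describe_gc = function
  | None -> "no general category available"
  | Some `Lu -> "uppercase letter"
  | Some `Ll -> "lowercase letter"
  | Some `So -> "other symbol"
  | Some _ -> "some other general category"
```

Under that assumption, describe_gc u_1F42B_gc would turn the lookup result into a printable string without raising on a missing code point or property.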