package orsetto

Unicode character set properties.

Overview

This module provides an interface to the Unicode character set database.

Types
type 'a map = 'a Ucs_ucdgen_aux.map

An alias for the abstract type representing a map from every Unicode code point to the value of its corresponding property.

type 'a index

The property index type. The full Unicode character database is large, and the portion required by the Orsetto Ucs library itself is small, so values of this type provide an abstraction of the relevant portion of the database available to the application.

type utyp = ..

The extensible universal property type.

type utyp +=
  | Typ_bool of bool map * bool index
  | Typ_int of int map * int index
  | Typ_string of string map * string index
  | Typ_uchars of Uchar.t list option map * Uchar.t list option index

The core population of the extensible universal property type.
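As an illustration of how an extensible variant such as utyp behaves, here is a minimal sketch. The definitions of 'a map and 'a index below are hypothetical stand-ins (the library keeps both abstract), and Typ_float is an invented application-side extension, not part of Orsetto.

```ocaml
(* Hypothetical stand-ins for the library's abstract types. *)
type 'a map = Uchar.t -> 'a
type 'a index = (string * 'a) list

(* An extensible variant starts with no constructors. *)
type utyp = ..

(* Two of the core constructors, mirroring the signature above. *)
type utyp +=
  | Typ_bool of bool map * bool index
  | Typ_int of int map * int index

(* An application-defined extension (invented for this sketch). *)
type utyp += Typ_float of float map * float index

(* A match on an extensible type can never be exhaustive,
   so a catch-all case is required. *)
let describe = function
  | Typ_bool _ -> "bool"
  | Typ_int _ -> "int"
  | Typ_float _ -> "float"
  | _ -> "unknown"
```

Because any module may add constructors later, code that consumes a utyp value should always decide what to do with constructors it does not recognize.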

Functions and Constants
val create_index : (string * 'a) list -> 'a index

Use create_index s to compose an index from the list s of (name, value) pairs.

val query_map : 'a map -> Uchar.t -> 'a

Use query_map m c to resolve the value of property m for character c.
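The representation of 'a map is abstract, so the following is only a sketch of one way a lookup from code point to property value could work. The range_map type, the query function, and the is_ascii_digit value are all hypothetical and say nothing about how Orsetto stores its data.

```ocaml
(* Hypothetical representation: sorted, non-overlapping inclusive
   ranges of code points, plus a default for everything else. *)
type 'a range_map = { default : 'a; ranges : (int * int * 'a) array }

(* Binary search for the range containing the code point of [c]. *)
let query (m : 'a range_map) (c : Uchar.t) : 'a =
  let cp = Uchar.to_int c in
  let rec go lo hi =
    if lo > hi then m.default
    else
      let mid = lo + (hi - lo) / 2 in
      let (first, last, v) = m.ranges.(mid) in
      if cp < first then go lo (mid - 1)
      else if cp > last then go (mid + 1) hi
      else v
  in
  go 0 (Array.length m.ranges - 1)

(* A toy boolean property: true for U+0030..U+0039 only. *)
let is_ascii_digit = { default = false; ranges = [| (0x30, 0x39, true) |] }
```

With this toy map, query is_ascii_digit (Uchar.of_char '7') evaluates to true and any code point outside the single range falls back to the default.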

val search_index : 'a index -> string -> 'a option

Use search_index idx nym to query the index idx for the entry named by nym. Index keys are loosely matched.
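The page does not say which loose matching rule is applied. As a rough sketch, assuming something in the spirit of the loose matching described in Unicode Standard Annex #44 (ignore case, spaces, hyphens and underscores), key comparison could look like this. The names loose_key and loose_equal are hypothetical, and the sketch deliberately omits the finer points of the UAX #44 rule.

```ocaml
(* Assumed normalization: lowercase ASCII letters and drop
   spaces, hyphens and underscores. *)
let loose_key (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun ch ->
      match ch with
      | ' ' | '-' | '_' -> ()
      | _ -> Buffer.add_char b (Char.lowercase_ascii ch))
    s;
  Buffer.contents b

(* Two names match loosely when their normalized keys are equal. *)
let loose_equal (a : string) (b : string) : bool =
  String.equal (loose_key a) (loose_key b)
```

Under this assumption, "General_Category" and "general category" would name the same entry.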

val search_property : utyp index -> string -> utyp option

Use search_property idx nym to query the property database index idx for the property named nym. Property names are loosely matched.

val require_property : utyp index -> string -> utyp

Use require_property idx nym to query the property database index idx for the property named nym. Property names are loosely matched. Raises Not_found if no property named nym is indexed.
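The difference between the option-returning search functions and the raising require function can be sketched with a stand-in index. Everything below is hypothetical: the index is modeled as an association list with exact (not loose) keys, and search and require only imitate the shape of search_index and require_property.

```ocaml
(* Hypothetical stand-in index: an association list with exact keys. *)
type 'a index = (string * 'a) list

(* Option-returning lookup, in the style of search_index. *)
let search (idx : 'a index) (nym : string) : 'a option =
  List.assoc_opt nym idx

(* Raising lookup, in the style of require_property. *)
let require (idx : 'a index) (nym : string) : 'a =
  match search idx nym with
  | Some v -> v
  | None -> raise Not_found

let idx : int index = [ ("alpha", 1); ("beta", 2) ]

(* Callers of the option form handle absence explicitly. *)
let describe (nym : string) : string =
  match search idx nym with
  | Some v -> string_of_int v
  | None -> "absent"
```

The option form suits names that come from user input, while the raising form suits names the program already knows must be present.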

type blk = [
  | `ASCII
  | `Adlam
  | `Aegean_Numbers
  | `Ahom
  | `Alchemical
  | `Alphabetic_PF
  | `Anatolian_Hieroglyphs
  | `Ancient_Greek_Music
  | `Ancient_Greek_Numbers
  | `Ancient_Symbols
  | `Arabic
  | `Arabic_Ext_A
  | `Arabic_Ext_B
  | `Arabic_Math
  | `Arabic_PF_A
  | `Arabic_PF_B
  | `Arabic_Sup
  | `Armenian
  | `Arrows
  | `Avestan
  | `Balinese
  | `Bamum
  | `Bamum_Sup
  | `Bassa_Vah
  | `Batak
  | `Bengali
  | `Bhaiksuki
  | `Block_Elements
  | `Bopomofo
  | `Bopomofo_Ext
  | `Box_Drawing
  | `Brahmi
  | `Braille
  | `Buginese
  | `Buhid
  | `Byzantine_Music
  | `CJK
  | `CJK_Compat
  | `CJK_Compat_Forms
  | `CJK_Compat_Ideographs
  | `CJK_Compat_Ideographs_Sup
  | `CJK_Ext_A
  | `CJK_Ext_B
  | `CJK_Ext_C
  | `CJK_Ext_D
  | `CJK_Ext_E
  | `CJK_Ext_F
  | `CJK_Ext_G
  | `CJK_Radicals_Sup
  | `CJK_Strokes
  | `CJK_Symbols
  | `Carian
  | `Caucasian_Albanian
  | `Chakma
  | `Cham
  | `Cherokee
  | `Cherokee_Sup
  | `Chess_Symbols
  | `Chorasmian
  | `Compat_Jamo
  | `Control_Pictures
  | `Coptic
  | `Coptic_Epact_Numbers
  | `Counting_Rod
  | `Cuneiform
  | `Cuneiform_Numbers
  | `Currency_Symbols
  | `Cypriot_Syllabary
  | `Cypro_Minoan
  | `Cyrillic
  | `Cyrillic_Ext_A
  | `Cyrillic_Ext_B
  | `Cyrillic_Ext_C
  | `Cyrillic_Sup
  | `Deseret
  | `Devanagari
  | `Devanagari_Ext
  | `Diacriticals
  | `Diacriticals_Ext
  | `Diacriticals_For_Symbols
  | `Diacriticals_Sup
  | `Dingbats
  | `Dives_Akuru
  | `Dogra
  | `Domino
  | `Duployan
  | `Early_Dynastic_Cuneiform
  | `Egyptian_Hieroglyphs
  | `Egyptian_Hieroglyph_Format_Controls
  | `Elbasan
  | `Elymaic
  | `Emoticons
  | `Enclosed_Alphanum
  | `Enclosed_Alphanum_Sup
  | `Enclosed_CJK
  | `Enclosed_Ideographic_Sup
  | `Ethiopic
  | `Ethiopic_Ext
  | `Ethiopic_Ext_A
  | `Ethiopic_Ext_B
  | `Ethiopic_Sup
  | `Geometric_Shapes
  | `Geometric_Shapes_Ext
  | `Georgian
  | `Georgian_Ext
  | `Georgian_Sup
  | `Glagolitic
  | `Glagolitic_Sup
  | `Gothic
  | `Grantha
  | `Greek
  | `Greek_Ext
  | `Gujarati
  | `Gunjala_Gondi
  | `Gurmukhi
  | `Half_And_Full_Forms
  | `Half_Marks
  | `Hangul
  | `Hanifi_Rohingya
  | `Hanunoo
  | `Hatran
  | `Hebrew
  | `High_PU_Surrogates
  | `High_Surrogates
  | `Hiragana
  | `IDC
  | `IPA_Ext
  | `Ideographic_Symbols
  | `Imperial_Aramaic
  | `Indic_Number_Forms
  | `Indic_Siyaq_Numbers
  | `Inscriptional_Pahlavi
  | `Inscriptional_Parthian
  | `Jamo
  | `Jamo_Ext_A
  | `Jamo_Ext_B
  | `Javanese
  | `Kaithi
  | `Kana_Ext_A
  | `Kana_Ext_B
  | `Kana_Sup
  | `Kanbun
  | `Kangxi
  | `Kannada
  | `Katakana
  | `Katakana_Ext
  | `Kayah_Li
  | `Kharoshthi
  | `Khitan_Small_Script
  | `Khmer
  | `Khmer_Symbols
  | `Khojki
  | `Khudawadi
  | `Lao
  | `Latin_1_Sup
  | `Latin_Ext_A
  | `Latin_Ext_Additional
  | `Latin_Ext_B
  | `Latin_Ext_C
  | `Latin_Ext_D
  | `Latin_Ext_E
  | `Latin_Ext_F
  | `Latin_Ext_G
  | `Lepcha
  | `Letterlike_Symbols
  | `Limbu
  | `Linear_A
  | `Linear_B_Ideograms
  | `Linear_B_Syllabary
  | `Lisu
  | `Lisu_Sup
  | `Low_Surrogates
  | `Lycian
  | `Lydian
  | `Mahajani
  | `Mahjong
  | `Makasar
  | `Malayalam
  | `Mandaic
  | `Manichaean
  | `Marchen
  | `Masaram_Gondi
  | `Math_Alphanum
  | `Math_Operators
  | `Mayan_Numerals
  | `Medefaidrin
  | `Meetei_Mayek
  | `Meetei_Mayek_Ext
  | `Mende_Kikakui
  | `Meroitic_Cursive
  | `Meroitic_Hieroglyphs
  | `Miao
  | `Misc_Arrows
  | `Misc_Math_Symbols_A
  | `Misc_Math_Symbols_B
  | `Misc_Pictographs
  | `Misc_Symbols
  | `Misc_Technical
  | `Modi
  | `Modifier_Letters
  | `Modifier_Tone_Letters
  | `Mongolian
  | `Mongolian_Sup
  | `Mro
  | `Multani
  | `Music
  | `Myanmar
  | `Myanmar_Ext_A
  | `Myanmar_Ext_B
  | `NB
  | `NKo
  | `Nabataean
  | `Nandinagari
  | `New_Tai_Lue
  | `Newa
  | `No_Block_Assigned
  | `Number_Forms
  | `Nushu
  | `Nyiakeng_Puachue_Hmong
  | `OCR
  | `Ogham
  | `Ol_Chiki
  | `Old_Hungarian
  | `Old_Italic
  | `Old_North_Arabian
  | `Old_Permic
  | `Old_Persian
  | `Old_Sogdian
  | `Old_South_Arabian
  | `Old_Turkic
  | `Old_Uyghur
  | `Oriya
  | `Ornamental_Dingbats
  | `Osage
  | `Osmanya
  | `Ottoman_Siyaq_Numbers
  | `PUA
  | `Pahawh_Hmong
  | `Palmyrene
  | `Pau_Cin_Hau
  | `Phags_Pa
  | `Phaistos
  | `Phoenician
  | `Phonetic_Ext
  | `Phonetic_Ext_Sup
  | `Playing_Cards
  | `Psalter_Pahlavi
  | `Punctuation
  | `Rejang
  | `Rumi
  | `Runic
  | `Samaritan
  | `Saurashtra
  | `Sharada
  | `Shavian
  | `Shorthand_Format_Controls
  | `Siddham
  | `Sinhala
  | `Sinhala_Archaic_Numbers
  | `Small_Forms
  | `Small_Kana_Ext
  | `Sogdian
  | `Sora_Sompeng
  | `Soyombo
  | `Specials
  | `Sundanese
  | `Sundanese_Sup
  | `Sup_Arrows_A
  | `Sup_Arrows_B
  | `Sup_Arrows_C
  | `Sup_Math_Operators
  | `Sup_PUA_A
  | `Sup_PUA_B
  | `Sup_Punctuation
  | `Sup_Symbols_And_Pictographs
  | `Super_And_Sub
  | `Sutton_SignWriting
  | `Syloti_Nagri
  | `Symbols_And_Pictographs_Ext_A
  | `Symbols_For_Legacy_Computing
  | `Syriac
  | `Syriac_Sup
  | `Tagalog
  | `Tagbanwa
  | `Tags
  | `Tai_Le
  | `Tai_Tham
  | `Tai_Viet
  | `Tai_Xuan_Jing
  | `Takri
  | `Tamil
  | `Tamil_Sup
  | `Tangsa
  | `Tangut
  | `Tangut_Components
  | `Tangut_Sup
  | `Telugu
  | `Thaana
  | `Thai
  | `Tibetan
  | `Tifinagh
  | `Tirhuta
  | `Toto
  | `Transport_And_Map
  | `UCAS
  | `UCAS_Ext
  | `UCAS_Ext_A
  | `Ugaritic
  | `VS
  | `VS_Sup
  | `Vai
  | `Vedic_Ext
  | `Vertical_Forms
  | `Vithkuqi
  | `Wancho
  | `Warang_Citi
  | `Yezidi
  | `Yi_Radicals
  | `Yi_Syllables
  | `Yijing
  | `Zanabazar_Square
  | `Znamenny_Music
]

Unicode code block

val equal_blk : blk -> blk -> bool

Equality

val show_blk : blk -> string

String representation

type utyp +=
  | Typ_block of blk map * blk index

Extend the universal type
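Because blk is a closed polymorphic variant, functions over it can be checked for exhaustiveness by the compiler. The following sketch uses a five-tag excerpt of the type so that it stays short; the excerpt and the is_surrogate_block function are illustrative assumptions, not part of the library.

```ocaml
(* A five-tag excerpt of the block type, for illustration only. *)
type blk = [
  | `ASCII
  | `High_PU_Surrogates
  | `High_Surrogates
  | `Low_Surrogates
  | `NB
]

(* Hypothetical helper: does the block hold surrogate code points?
   Listing every tag lets the compiler flag any tag left unhandled. *)
let is_surrogate_block : blk -> bool = function
  | `High_PU_Surrogates | `High_Surrogates | `Low_Surrogates -> true
  | `ASCII | `NB -> false
```

Against the full type, a wildcard case would likely be more practical than listing all tags, at the cost of losing the compiler's warning when a new block is added.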

type gc = [
  | `C
  | `Cc
  | `Cf
  | `Cs
  | `Co
  | `Cn
  | `L
  | `LC
  | `Lu
  | `Ll
  | `Lt
  | `Lm
  | `Lo
  | `M
  | `Mn
  | `Mc
  | `Me
  | `N
  | `Nd
  | `Nl
  | `No
  | `P
  | `Pc
  | `Pd
  | `Ps
  | `Pe
  | `Pi
  | `Pf
  | `Po
  | `S
  | `Sm
  | `Sc
  | `Sk
  | `So
  | `Z
  | `Zs
  | `Zl
  | `Zp
]

The general category property value type.

val equal_gc : gc -> gc -> bool

Equality

val show_gc : gc -> string

String representation
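The gc type mixes the one-letter major categories, the grouping LC, and the two-letter subcategories. As a small sketch of working with it, the hypothetical function major below maps every tag to its major category, following the grouping used by the Unicode general category values. It is written against a local copy of the type and is not part of the library.

```ocaml
(* Local copy of the general category type shown above. *)
type gc = [
  | `C | `Cc | `Cf | `Cs | `Co | `Cn
  | `L | `LC | `Lu | `Ll | `Lt | `Lm | `Lo
  | `M | `Mn | `Mc | `Me
  | `N | `Nd | `Nl | `No
  | `P | `Pc | `Pd | `Ps | `Pe | `Pi | `Pf | `Po
  | `S | `Sm | `Sc | `Sk | `So
  | `Z | `Zs | `Zl | `Zp
]

(* Hypothetical helper: collapse any tag to its major category. *)
let major : gc -> [ `C | `L | `M | `N | `P | `S | `Z ] = function
  | `C | `Cc | `Cf | `Cs | `Co | `Cn -> `C
  | `L | `LC | `Lu | `Ll | `Lt | `Lm | `Lo -> `L
  | `M | `Mn | `Mc | `Me -> `M
  | `N | `Nd | `Nl | `No -> `N
  | `P | `Pc | `Pd | `Ps | `Pe | `Pi | `Pf | `Po -> `P
  | `S | `Sm | `Sc | `Sk | `So -> `S
  | `Z | `Zs | `Zl | `Zp -> `Z
```

For instance, major `Lu and major `LC both evaluate to `L.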

type utyp +=
  | Typ_general_category of gc map * gc index

type qc =
  | QC_yes
  | QC_no
  | QC_maybe

The normalization quick check property type.

val equal_qc : qc -> qc -> bool

Equality

val show_qc : qc -> string

String representation

type utyp +=
  | Typ_quick_check of qc map * qc index

Extension of the universal type
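To show how three-valued quick check results are typically combined, here is a simplified sketch over a local copy of the qc type. The combine and quick_check functions are hypothetical. The full quick check procedure in Unicode Standard Annex #15 also inspects canonical combining classes, which this sketch leaves out.

```ocaml
(* Local copy of the quick check type shown above. *)
type qc = QC_yes | QC_no | QC_maybe

(* Combine two results: QC_no dominates, then QC_maybe, then QC_yes. *)
let combine (acc : qc) (v : qc) : qc =
  match acc, v with
  | QC_no, _ | _, QC_no -> QC_no
  | QC_maybe, _ | _, QC_maybe -> QC_maybe
  | QC_yes, QC_yes -> QC_yes

(* Fold the per-character results of a whole string. *)
let quick_check (vs : qc list) : qc =
  List.fold_left combine QC_yes vs
```

A result of QC_maybe means the quick check alone cannot decide, and the caller would have to fall back to full normalization to be sure.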

type script = [
  | `Adlm
  | `Aghb
  | `Ahom
  | `Arab
  | `Armi
  | `Armn
  | `Avst
  | `Bali
  | `Bamu
  | `Bass
  | `Batk
  | `Beng
  | `Bhks
  | `Bopo
  | `Brah
  | `Brai
  | `Bugi
  | `Buhd
  | `Cakm
  | `Cans
  | `Cari
  | `Cham
  | `Cher
  | `Chrs
  | `Copt
  | `Cpmn
  | `Cprt
  | `Cyrl
  | `Deva
  | `Diak
  | `Dogr
  | `Dsrt
  | `Dupl
  | `Egyp
  | `Elba
  | `Elym
  | `Ethi
  | `Geor
  | `Glag
  | `Gong
  | `Gonm
  | `Goth
  | `Gran
  | `Grek
  | `Gujr
  | `Guru
  | `Hang
  | `Hani
  | `Hano
  | `Hatr
  | `Hebr
  | `Hira
  | `Hluw
  | `Hmng
  | `Hmnp
  | `Hrkt
  | `Hung
  | `Ital
  | `Java
  | `Kali
  | `Kana
  | `Khar
  | `Khmr
  | `Khoj
  | `Kits
  | `Knda
  | `Kthi
  | `Lana
  | `Laoo
  | `Latn
  | `Lepc
  | `Limb
  | `Lina
  | `Linb
  | `Lisu
  | `Lyci
  | `Lydi
  | `Mahj
  | `Maka
  | `Mand
  | `Mani
  | `Marc
  | `Medf
  | `Mend
  | `Merc
  | `Mero
  | `Mlym
  | `Modi
  | `Mong
  | `Mroo
  | `Mtei
  | `Mult
  | `Mymr
  | `Nand
  | `Narb
  | `Nbat
  | `Newa
  | `Nkoo
  | `Nshu
  | `Ogam
  | `Olck
  | `Orkh
  | `Orya
  | `Osge
  | `Osma
  | `Ougr
  | `Palm
  | `Pauc
  | `Perm
  | `Phag
  | `Phli
  | `Phlp
  | `Phnx
  | `Plrd
  | `Prti
  | `Qaai
  | `Rjng
  | `Rohg
  | `Runr
  | `Samr
  | `Sarb
  | `Saur
  | `Sgnw
  | `Shaw
  | `Shrd
  | `Sidd
  | `Sind
  | `Sinh
  | `Sogd
  | `Sogo
  | `Sora
  | `Soyo
  | `Sund
  | `Sylo
  | `Syrc
  | `Tagb
  | `Takr
  | `Tale
  | `Talu
  | `Taml
  | `Tang
  | `Tavt
  | `Telu
  | `Tfng
  | `Tglg
  | `Thaa
  | `Thai
  | `Tibt
  | `Tirh
  | `Tnsa
  | `Toto
  | `Ugar
  | `Vaii
  | `Vith
  | `Wara
  | `Wcho
  | `Xpeo
  | `Xsux
  | `Yezi
  | `Yiii
  | `Zanb
  | `Zinh
  | `Zyyy
  | `Zzzz
]

Unicode script identifier

val equal_script : script -> script -> bool

Equality

val show_script : script -> string

String representation

type utyp +=
  | Typ_script of script map * script index

Extend the universal type.

module Quick : sig ... end

This module contains internal fast-path functions for property queries.