package orsetto


Unicode character set properties.

Overview

This module provides an interface to the Unicode character set database.

Types
type 'a map = 'a Ucs_ucdgen_aux.map

An alias for the abstract type representing a map from every Unicode code point to that code point's value for a given property.

type 'a index

The property index type. The full Unicode character database is large, and the portion required by the Orsetto Ucs library itself is small, so values of this type provide an abstraction of the relevant portion of the database available to the application.

type utyp = ..

The extensible universal property type.

type utyp +=
  | Typ_bool of bool map * bool index
  | Typ_int of int map * int index
  | Typ_string of string map * string index
  | Typ_uchars of Uchar.t list option map * Uchar.t list option index

The core population of the extensible universal property type.
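Because utyp is an extensible variant, code that inspects a property value matches the constructors it knows about and falls through on the rest. The sketch below reproduces the pattern with stand-in definitions; toy_map, toy_index, and describe are hypothetical and are not part of this module.

```ocaml
(* Hypothetical stand-ins for the abstract 'a map and 'a index types. *)
type 'a toy_map = Uchar.t -> 'a
type 'a toy_index = (string * 'a) list

(* An extensible variant, declared the same way as utyp. *)
type utyp = ..

(* A core population mirroring some of the constructors above. *)
type utyp +=
  | Typ_bool of bool toy_map * bool toy_index
  | Typ_int of int toy_map * int toy_index

(* A later extension, as another module might add. *)
type utyp += Typ_string of string toy_map * string toy_index

(* Constructors may be added elsewhere, so the match needs a catch-all;
   without it the compiler warns that the match is not exhaustive. *)
let describe = function
  | Typ_bool _ -> "boolean property"
  | Typ_int _ -> "integer property"
  | Typ_string _ -> "string property"
  | _ -> "other property"

let () =
  let is_ascii : bool toy_map = fun c -> Uchar.to_int c < 0x80 in
  print_endline (describe (Typ_bool (is_ascii, [])))
```

The wildcard case is what keeps such code working when a later extension, such as the block or script constructors below, adds new property kinds.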

Functions and Constants
val create_index : (string * 'a) list -> 'a index

Use create_index s to compose an index from the list s of (name, value) pairs.

val query_map : 'a map -> Uchar.t -> 'a

Use query_map m c to resolve the value of property m for character c.
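The map type is abstract, so its representation is not documented here. One plausible representation, assumed purely for illustration, is a sorted array of disjoint code-point ranges with a default value, queried by binary search; range_map, make, and query are hypothetical names.

```ocaml
(* Hypothetical representation: sorted, non-overlapping ranges
   (first, last, value); code points outside every range get [default]. *)
type 'a range_map = {
  ranges : (int * int * 'a) array;
  default : 'a;
}

let make ~default ranges = { ranges = Array.of_list ranges; default }

(* Binary search for the range containing the code point of [c]. *)
let query m c =
  let cp = Uchar.to_int c in
  let rec go lo hi =
    if lo > hi then m.default
    else
      let mid = (lo + hi) / 2 in
      let first, last, v = m.ranges.(mid) in
      if cp < first then go lo (mid - 1)
      else if cp > last then go (mid + 1) hi
      else v
  in
  go 0 (Array.length m.ranges - 1)

(* A toy boolean property: ASCII digits and Latin letters. *)
let alnum_ascii =
  make ~default:false
    [ (0x30, 0x39, true); (0x41, 0x5A, true); (0x61, 0x7A, true) ]

let () =
  Printf.printf "%b %b\n"
    (query alnum_ascii (Uchar.of_char '7'))
    (query alnum_ascii (Uchar.of_char '!'))
```

A range table keeps the whole code space covered with a handful of entries, since most properties are constant over long runs of code points.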

val search_index : 'a index -> string -> 'a option

Use search_index idx nym to query the index idx for the entry named by nym. Index keys are loosely matched.
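The documentation says only that keys are loosely matched. A common convention is rule UAX44-LM3 of the Unicode Character Database annex, which ignores case, whitespace, underscores, and hyphens (plus an initial "is" prefix, omitted here). The sketch below assumes that convention; normalize and the local create_index and search_index are stand-ins, not the library's implementation.

```ocaml
(* Assumed loose-matching rule: lowercase ASCII letters and drop
   spaces, tabs, newlines, underscores, and hyphens. *)
let normalize s =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun ch ->
      match ch with
      | ' ' | '\t' | '\n' | '_' | '-' -> ()
      | ch -> Buffer.add_char b (Char.lowercase_ascii ch))
    s;
  Buffer.contents b

(* Stand-in index: a hash table keyed on normalized names. *)
type 'a index = (string, 'a) Hashtbl.t

let create_index (pairs : (string * 'a) list) : 'a index =
  let tbl = Hashtbl.create (List.length pairs) in
  List.iter (fun (k, v) -> Hashtbl.replace tbl (normalize k) v) pairs;
  tbl

(* Normalize the query the same way before looking it up. *)
let search_index (idx : 'a index) nym = Hashtbl.find_opt idx (normalize nym)

let () =
  let idx =
    create_index [ ("General_Category", "gc"); ("White_Space", "WSpace") ]
  in
  assert (search_index idx "general category" = Some "gc");
  assert (search_index idx "white-space" = Some "WSpace");
  assert (search_index idx "Script" = None)
```

Normalizing once at index-creation time means each lookup costs one normalization plus one hash probe.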

val search_property : utyp index -> string -> utyp option

Use search_property idx nym to query the property database index idx for the property named nym. Property names are loosely matched.

val require_property : utyp index -> string -> utyp

Use require_property idx nym to query the property database index idx for the property named nym. Property names are loosely matched. Raises Not_found if no property named nym is indexed.
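As described, require_property differs from search_property only in how a miss is reported: Not_found is raised instead of None being returned. A minimal sketch of that relationship, assuming a hypothetical association-list index with exact (not loose) name matching and a trimmed-down utyp:

```ocaml
type utyp = ..
type utyp += Typ_bool of (Uchar.t -> bool)

(* Hypothetical index: an association list from names to properties. *)
let db : (string * utyp) list =
  [ ( "ASCII_Hex_Digit",
      Typ_bool
        (fun c ->
          match Uchar.to_int c with
          | 0x30 .. 0x39 | 0x41 .. 0x46 | 0x61 .. 0x66 -> true
          | _ -> false) ) ]

let search_property idx nym = List.assoc_opt nym idx

(* Same lookup, but a missing entry is an exception rather than None. *)
let require_property idx nym =
  match search_property idx nym with
  | Some p -> p
  | None -> raise Not_found

let () =
  (match require_property db "ASCII_Hex_Digit" with
   | Typ_bool f -> Printf.printf "%b\n" (f (Uchar.of_char 'B'))
   | _ -> print_endline "unexpected property type");
  match require_property db "No_Such_Property" with
  | _ -> print_endline "found"
  | exception Not_found -> print_endline "Not_found raised"
```

The option-returning form suits names supplied at run time; the raising form suits names fixed in the program, where a miss indicates a bug.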

type blk = [
  | `ASCII
  | `Adlam
  | `Aegean_Numbers
  | `Ahom
  | `Alchemical
  | `Alphabetic_PF
  | `Anatolian_Hieroglyphs
  | `Ancient_Greek_Music
  | `Ancient_Greek_Numbers
  | `Ancient_Symbols
  | `Arabic
  | `Arabic_Ext_A
  | `Arabic_Math
  | `Arabic_PF_A
  | `Arabic_PF_B
  | `Arabic_Sup
  | `Armenian
  | `Arrows
  | `Avestan
  | `Balinese
  | `Bamum
  | `Bamum_Sup
  | `Bassa_Vah
  | `Batak
  | `Bengali
  | `Bhaiksuki
  | `Block_Elements
  | `Bopomofo
  | `Bopomofo_Ext
  | `Box_Drawing
  | `Brahmi
  | `Braille
  | `Buginese
  | `Buhid
  | `Byzantine_Music
  | `CJK
  | `CJK_Compat
  | `CJK_Compat_Forms
  | `CJK_Compat_Ideographs
  | `CJK_Compat_Ideographs_Sup
  | `CJK_Ext_A
  | `CJK_Ext_B
  | `CJK_Ext_C
  | `CJK_Ext_D
  | `CJK_Ext_E
  | `CJK_Ext_F
  | `CJK_Radicals_Sup
  | `CJK_Strokes
  | `CJK_Symbols
  | `Carian
  | `Caucasian_Albanian
  | `Chakma
  | `Cham
  | `Cherokee
  | `Cherokee_Sup
  | `Chess_Symbols
  | `Compat_Jamo
  | `Control_Pictures
  | `Coptic
  | `Coptic_Epact_Numbers
  | `Counting_Rod
  | `Cuneiform
  | `Cuneiform_Numbers
  | `Currency_Symbols
  | `Cypriot_Syllabary
  | `Cyrillic
  | `Cyrillic_Ext_A
  | `Cyrillic_Ext_B
  | `Cyrillic_Ext_C
  | `Cyrillic_Sup
  | `Deseret
  | `Devanagari
  | `Devanagari_Ext
  | `Diacriticals
  | `Diacriticals_Ext
  | `Diacriticals_For_Symbols
  | `Diacriticals_Sup
  | `Dingbats
  | `Dogra
  | `Domino
  | `Duployan
  | `Early_Dynastic_Cuneiform
  | `Egyptian_Hieroglyphs
  | `Egyptian_Hieroglyph_Format_Controls
  | `Elbasan
  | `Elymaic
  | `Emoticons
  | `Enclosed_Alphanum
  | `Enclosed_Alphanum_Sup
  | `Enclosed_CJK
  | `Enclosed_Ideographic_Sup
  | `Ethiopic
  | `Ethiopic_Ext
  | `Ethiopic_Ext_A
  | `Ethiopic_Sup
  | `Geometric_Shapes
  | `Geometric_Shapes_Ext
  | `Georgian
  | `Georgian_Ext
  | `Georgian_Sup
  | `Glagolitic
  | `Glagolitic_Sup
  | `Gothic
  | `Grantha
  | `Greek
  | `Greek_Ext
  | `Gujarati
  | `Gunjala_Gondi
  | `Gurmukhi
  | `Half_And_Full_Forms
  | `Half_Marks
  | `Hangul
  | `Hanifi_Rohingya
  | `Hanunoo
  | `Hatran
  | `Hebrew
  | `High_PU_Surrogates
  | `High_Surrogates
  | `Hiragana
  | `IDC
  | `IPA_Ext
  | `Ideographic_Symbols
  | `Imperial_Aramaic
  | `Indic_Number_Forms
  | `Indic_Siyaq_Numbers
  | `Inscriptional_Pahlavi
  | `Inscriptional_Parthian
  | `Jamo
  | `Jamo_Ext_A
  | `Jamo_Ext_B
  | `Javanese
  | `Kaithi
  | `Kana_Ext_A
  | `Kana_Sup
  | `Kanbun
  | `Kangxi
  | `Kannada
  | `Katakana
  | `Katakana_Ext
  | `Kayah_Li
  | `Kharoshthi
  | `Khmer
  | `Khmer_Symbols
  | `Khojki
  | `Khudawadi
  | `Lao
  | `Latin_1_Sup
  | `Latin_Ext_A
  | `Latin_Ext_Additional
  | `Latin_Ext_B
  | `Latin_Ext_C
  | `Latin_Ext_D
  | `Latin_Ext_E
  | `Lepcha
  | `Letterlike_Symbols
  | `Limbu
  | `Linear_A
  | `Linear_B_Ideograms
  | `Linear_B_Syllabary
  | `Lisu
  | `Low_Surrogates
  | `Lycian
  | `Lydian
  | `Mahajani
  | `Mahjong
  | `Makasar
  | `Malayalam
  | `Mandaic
  | `Manichaean
  | `Marchen
  | `Masaram_Gondi
  | `Math_Alphanum
  | `Math_Operators
  | `Mayan_Numerals
  | `Medefaidrin
  | `Meetei_Mayek
  | `Meetei_Mayek_Ext
  | `Mende_Kikakui
  | `Meroitic_Cursive
  | `Meroitic_Hieroglyphs
  | `Miao
  | `Misc_Arrows
  | `Misc_Math_Symbols_A
  | `Misc_Math_Symbols_B
  | `Misc_Pictographs
  | `Misc_Symbols
  | `Misc_Technical
  | `Modi
  | `Modifier_Letters
  | `Modifier_Tone_Letters
  | `Mongolian
  | `Mongolian_Sup
  | `Mro
  | `Multani
  | `Music
  | `Myanmar
  | `Myanmar_Ext_A
  | `Myanmar_Ext_B
  | `NB
  | `NKo
  | `Nabataean
  | `Nandinagari
  | `New_Tai_Lue
  | `Newa
  | `No_Block_Assigned
  | `Number_Forms
  | `Nushu
  | `Nyiakeng_Puachue_Hmong
  | `OCR
  | `Ogham
  | `Ol_Chiki
  | `Old_Hungarian
  | `Old_Italic
  | `Old_North_Arabian
  | `Old_Permic
  | `Old_Persian
  | `Old_Sogdian
  | `Old_South_Arabian
  | `Old_Turkic
  | `Oriya
  | `Ornamental_Dingbats
  | `Osage
  | `Osmanya
  | `Ottoman_Siyaq_Numbers
  | `PUA
  | `Pahawh_Hmong
  | `Palmyrene
  | `Pau_Cin_Hau
  | `Phags_Pa
  | `Phaistos
  | `Phoenician
  | `Phonetic_Ext
  | `Phonetic_Ext_Sup
  | `Playing_Cards
  | `Psalter_Pahlavi
  | `Punctuation
  | `Rejang
  | `Rumi
  | `Runic
  | `Samaritan
  | `Saurashtra
  | `Sharada
  | `Shavian
  | `Shorthand_Format_Controls
  | `Siddham
  | `Sinhala
  | `Sinhala_Archaic_Numbers
  | `Small_Forms
  | `Small_Kana_Ext
  | `Sogdian
  | `Sora_Sompeng
  | `Soyombo
  | `Specials
  | `Sundanese
  | `Sundanese_Sup
  | `Sup_Arrows_A
  | `Sup_Arrows_B
  | `Sup_Arrows_C
  | `Sup_Math_Operators
  | `Sup_PUA_A
  | `Sup_PUA_B
  | `Sup_Punctuation
  | `Sup_Symbols_And_Pictographs
  | `Super_And_Sub
  | `Sutton_SignWriting
  | `Syloti_Nagri
  | `Symbols_And_Pictographs_Ext_A
  | `Syriac
  | `Syriac_Sup
  | `Tagalog
  | `Tagbanwa
  | `Tags
  | `Tai_Le
  | `Tai_Tham
  | `Tai_Viet
  | `Tai_Xuan_Jing
  | `Takri
  | `Tamil
  | `Tamil_Sup
  | `Tangut
  | `Tangut_Components
  | `Telugu
  | `Thaana
  | `Thai
  | `Tibetan
  | `Tifinagh
  | `Tirhuta
  | `Transport_And_Map
  | `UCAS
  | `UCAS_Ext
  | `Ugaritic
  | `VS
  | `VS_Sup
  | `Vai
  | `Vedic_Ext
  | `Vertical_Forms
  | `Wancho
  | `Warang_Citi
  | `Yi_Radicals
  | `Yi_Syllables
  | `Yijing
  | `Zanabazar_Square
]

The Unicode block property value type.

val equal_blk : blk -> blk -> bool

Equality

val show_blk : blk -> string

String representation
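Because blk is a closed polymorphic variant, an application can name a subset of its constructors and match on that subset with a #-pattern. The sketch copies a few constructors from the list above into a smaller local type; cjk_blk, blk_small, is_cjk, and widen are hypothetical.

```ocaml
(* A named subset of the block constructors. *)
type cjk_blk = [ `CJK | `CJK_Ext_A | `CJK_Ext_B | `CJK_Compat_Ideographs ]

(* A trimmed stand-in for blk; the real type has many more constructors. *)
type blk_small = [ cjk_blk | `ASCII | `Hiragana | `Katakana ]

(* #cjk_blk matches any constructor belonging to the cjk_blk subset. *)
let is_cjk : blk_small -> bool = function
  | #cjk_blk -> true
  | `ASCII | `Hiragana | `Katakana -> false

(* A value of the subset coerces to the larger type with no conversion. *)
let widen (b : cjk_blk) : blk_small = (b :> blk_small)

let () =
  Printf.printf "%b %b\n" (is_cjk `CJK_Ext_A) (is_cjk `Hiragana);
  Printf.printf "%b\n" (is_cjk (widen `CJK))
```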

type utyp +=
  | Typ_block of blk map * blk index

Extend the universal type

type gc = [
  | `C
  | `Cc
  | `Cf
  | `Cs
  | `Co
  | `Cn
  | `L
  | `LC
  | `Lu
  | `Ll
  | `Lt
  | `Lm
  | `Lo
  | `M
  | `Mn
  | `Mc
  | `Me
  | `N
  | `Nd
  | `Nl
  | `No
  | `P
  | `Pc
  | `Pd
  | `Ps
  | `Pe
  | `Pi
  | `Pf
  | `Po
  | `S
  | `Sm
  | `Sc
  | `Sk
  | `So
  | `Z
  | `Zs
  | `Zl
  | `Zp
]

The general category property value type.

val equal_gc : gc -> gc -> bool

Equality

val show_gc : gc -> string

String representation
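The gc type mixes the one-letter major classes (`C, `L, `M, ...) and the cased-letter grouping `LC with the two-letter categories that individual code points carry. The sketch below copies gc locally so it stands alone and maps each value to its major class, following the groupings defined in the Unicode Character Database (UAX #44); major and is_cased_letter are hypothetical helpers, not part of this module.

```ocaml
type gc = [
  | `C | `Cc | `Cf | `Cs | `Co | `Cn
  | `L | `LC | `Lu | `Ll | `Lt | `Lm | `Lo
  | `M | `Mn | `Mc | `Me
  | `N | `Nd | `Nl | `No
  | `P | `Pc | `Pd | `Ps | `Pe | `Pi | `Pf | `Po
  | `S | `Sm | `Sc | `Sk | `So
  | `Z | `Zs | `Zl | `Zp
]

(* Major class of a general category value; grouping values map
   to their own one-letter class. *)
let major : gc -> gc = function
  | `C | `Cc | `Cf | `Cs | `Co | `Cn -> `C
  | `L | `LC | `Lu | `Ll | `Lt | `Lm | `Lo -> `L
  | `M | `Mn | `Mc | `Me -> `M
  | `N | `Nd | `Nl | `No -> `N
  | `P | `Pc | `Pd | `Ps | `Pe | `Pi | `Pf | `Po -> `P
  | `S | `Sm | `Sc | `Sk | `So -> `S
  | `Z | `Zs | `Zl | `Zp -> `Z

(* LC (cased letter) covers exactly Lu, Ll, and Lt. *)
let is_cased_letter : gc -> bool = function
  | `LC | `Lu | `Ll | `Lt -> true
  | _ -> false

let () =
  assert (major `Lu = `L);
  assert (major `Pd = `P);
  assert (is_cased_letter `Lt && not (is_cased_letter `Lm))
```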

type utyp +=
  | Typ_general_category of gc map * gc index

type qc =
  | QC_yes
  | QC_no
  | QC_maybe

The normalization quick check property type.

val equal_qc : qc -> qc -> bool

Equality

val show_qc : qc -> string

String representation
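Quick-check values drive the fast normalization test described in UAX #15 (Unicode Normalization Forms). The test scans the code points and answers QC_no on a canonical-ordering violation or on any QC_no value; it downgrades the answer to QC_maybe when it sees a QC_maybe value. The sketch redeclares qc locally and takes the lookups as parameters. ccc (canonical combining class) and lookup are assumptions standing in for whatever property maps the application obtains from the database, and the toy tables cover only two combining marks.

```ocaml
type qc = QC_yes | QC_no | QC_maybe

(* Quick check per UAX #15: [ccc] gives the canonical combining class,
   [lookup] the quick-check value for the target normalization form. *)
let quick_check ~(ccc : Uchar.t -> int) ~(lookup : Uchar.t -> qc)
    (cps : Uchar.t list) : qc =
  let rec go last result = function
    | [] -> result
    | c :: rest ->
        let cc = ccc c in
        (* a nonzero class lower than its predecessor breaks canonical order *)
        if cc <> 0 && last > cc then QC_no
        else
          match lookup c with
          | QC_no -> QC_no
          | QC_maybe -> go cc QC_maybe rest
          | QC_yes -> go cc result rest
  in
  go 0 QC_yes cps

(* Toy tables covering only a few characters, assumed for illustration. *)
let ccc c =
  match Uchar.to_int c with
  | 0x0301 -> 230 (* COMBINING ACUTE ACCENT *)
  | 0x0323 -> 220 (* COMBINING DOT BELOW *)
  | _ -> 0

let lookup c =
  match Uchar.to_int c with
  | 0x0301 | 0x0323 -> QC_maybe
  | _ -> QC_yes

let () =
  let u = List.map Uchar.of_int in
  assert (quick_check ~ccc ~lookup (u [ 0x61 ]) = QC_yes);
  (* "a", dot below, acute: classes 220 then 230 are in canonical order *)
  assert (quick_check ~ccc ~lookup (u [ 0x61; 0x0323; 0x0301 ]) = QC_maybe);
  (* acute (230) before dot below (220) violates canonical order *)
  assert (quick_check ~ccc ~lookup (u [ 0x61; 0x0301; 0x0323 ]) = QC_no)
```

A QC_maybe result is not a failure: it means the caller must fall back to full normalization to get a definite answer.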

type utyp +=
  | Typ_quick_check of qc map * qc index

Extension of the universal type

type script = [
  | `Adlm
  | `Aghb
  | `Ahom
  | `Arab
  | `Armi
  | `Armn
  | `Avst
  | `Bali
  | `Bamu
  | `Bass
  | `Batk
  | `Beng
  | `Bhks
  | `Bopo
  | `Brah
  | `Brai
  | `Bugi
  | `Buhd
  | `Cakm
  | `Cans
  | `Cari
  | `Cham
  | `Cher
  | `Copt
  | `Cprt
  | `Cyrl
  | `Deva
  | `Dogr
  | `Dsrt
  | `Dupl
  | `Egyp
  | `Elba
  | `Elym
  | `Ethi
  | `Geor
  | `Glag
  | `Gong
  | `Gonm
  | `Goth
  | `Gran
  | `Grek
  | `Gujr
  | `Guru
  | `Hang
  | `Hani
  | `Hano
  | `Hatr
  | `Hebr
  | `Hira
  | `Hluw
  | `Hmng
  | `Hmnp
  | `Hrkt
  | `Hung
  | `Ital
  | `Java
  | `Kali
  | `Kana
  | `Khar
  | `Khmr
  | `Khoj
  | `Knda
  | `Kthi
  | `Lana
  | `Laoo
  | `Latn
  | `Lepc
  | `Limb
  | `Lina
  | `Linb
  | `Lisu
  | `Lyci
  | `Lydi
  | `Mahj
  | `Maka
  | `Mand
  | `Mani
  | `Marc
  | `Medf
  | `Mend
  | `Merc
  | `Mero
  | `Mlym
  | `Modi
  | `Mong
  | `Mroo
  | `Mtei
  | `Mult
  | `Mymr
  | `Nand
  | `Narb
  | `Nbat
  | `Newa
  | `Nkoo
  | `Nshu
  | `Ogam
  | `Olck
  | `Orkh
  | `Orya
  | `Osge
  | `Osma
  | `Palm
  | `Pauc
  | `Perm
  | `Phag
  | `Phli
  | `Phlp
  | `Phnx
  | `Plrd
  | `Prti
  | `Qaai
  | `Rjng
  | `Rohg
  | `Runr
  | `Samr
  | `Sarb
  | `Saur
  | `Sgnw
  | `Shaw
  | `Shrd
  | `Sidd
  | `Sind
  | `Sinh
  | `Sogd
  | `Sogo
  | `Sora
  | `Soyo
  | `Sund
  | `Sylo
  | `Syrc
  | `Tagb
  | `Takr
  | `Tale
  | `Talu
  | `Taml
  | `Tang
  | `Tavt
  | `Telu
  | `Tfng
  | `Tglg
  | `Thaa
  | `Thai
  | `Tibt
  | `Tirh
  | `Ugar
  | `Vaii
  | `Wara
  | `Wcho
  | `Xpeo
  | `Xsux
  | `Yiii
  | `Zanb
  | `Zinh
  | `Zyyy
  | `Zzzz
]

Unicode script identifier

val equal_script : script -> script -> bool

Equality

val show_script : script -> string

String representation
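A common use of the script property is checking whether a string mixes writing systems. The sketch below is a simplified form of the single-script test in UTS #39 (Unicode Security Mechanisms): it treats `Zyyy (Common) and `Zinh (Inherited) as compatible with any script and ignores the Script_Extensions property, which a faithful implementation would consult. The trimmed script type and the approximate script_of lookup are assumptions for illustration.

```ocaml
(* Trimmed stand-in for the script type above. *)
type script = [ `Latn | `Cyrl | `Grek | `Zyyy | `Zinh | `Zzzz ]

(* Toy lookup over a few approximate ranges, assumed for illustration. *)
let script_of c : script =
  match Uchar.to_int c with
  | 0x41 .. 0x5A | 0x61 .. 0x7A -> `Latn
  | 0x0300 .. 0x036F -> `Zinh
  | 0x0370 .. 0x03FF -> `Grek
  | 0x0400 .. 0x04FF -> `Cyrl
  | 0x00 .. 0x7F -> `Zyyy
  | _ -> `Zzzz

(* True when every code point is Common, Inherited, or one shared script. *)
let single_script (cps : Uchar.t list) =
  let rec go seen = function
    | [] -> true
    | c :: rest -> (
        match script_of c with
        | `Zyyy | `Zinh -> go seen rest
        | s -> (
            match seen with
            | None -> go (Some s) rest
            | Some s' -> s = s' && go seen rest))
  in
  go None cps

let () =
  let u = List.map Uchar.of_int in
  (* "pay1": Latin letters plus a Common digit *)
  assert (single_script (u [ 0x70; 0x61; 0x79; 0x31 ]));
  (* Latin p, Cyrillic small a (U+0430), Latin y *)
  assert (not (single_script (u [ 0x70; 0x0430; 0x79 ])))
```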

type utyp +=
  | Typ_script of script map * script index

Extend the universal type.

module Quick : sig ... end

This module contains internal fast-path functions for property queries.