package orsetto

Unicode character set properties.

Overview

This module provides an interface to the Unicode character set database.

Types
type 'a map = 'a Ucs_ucdgen_aux.map

An alias for the abstract type that maps every Unicode code point to the value of a property for that code point.

type 'a index

The property index type. The full Unicode character database is large, and the portion required by the Orsetto Ucs library itself is small, so values of this type provide an abstraction of the relevant portion of the database available to the application.

type utyp = ..

The extensible universal property type.

type utyp +=
  | Typ_bool of bool map * bool index
  | Typ_int of int map * int index
  | Typ_string of string map * string index
  | Typ_uchars of Uchar.t list option map * Uchar.t list option index

The core population of the extensible universal property type.
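As a minimal sketch of how an extensible variant such as utyp behaves, the following defines stand-in map and index types (assumptions, since the real ones are abstract), declares the type, extends it twice, and matches on it. The describe helper and the Typ_float constructor are hypothetical and are not part of this module.

```ocaml
(* Stand-ins for the abstract map and index types. The library hides the
   real representations, so these definitions are assumptions. *)
type 'a map = Uchar.t -> 'a
type 'a index = (string * 'a) list

(* An extensible variant is declared with no constructors. *)
type utyp = ..

(* Constructors are added by later extensions. *)
type utyp +=
  | Typ_bool of bool map * bool index
  | Typ_int of int map * int index

(* Hypothetical helper that names the shape of a property. The wildcard
   case is needed because any module may add constructors later. *)
let describe (p : utyp) : string =
  match p with
  | Typ_bool _ -> "boolean"
  | Typ_int _ -> "integer"
  | _ -> "unknown"

(* A hypothetical application-defined extension. *)
type utyp += Typ_float of float map * float index

let () =
  print_endline (describe (Typ_bool ((fun _ -> true), [])));
  print_endline (describe (Typ_float ((fun _ -> 0.0), [])))
```

Because describe was written before Typ_float existed, the second call falls through to the wildcard case. Code that matches on utyp values should expect the same for any property type it does not know about.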

Functions and Constants
val create_index : (string * 'a) list -> 'a index

Use create_index s to compose an index from the list s of name and value pairs.

val query_map : 'a map -> Uchar.t -> 'a

Use query_map m c to resolve the value of property m for character c.

val search_index : 'a index -> string -> 'a option

Use search_index idx nym to query the index idx for the entry named by nym. Index keys are loosely matched.
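This page does not define what "loosely matched" means. One plausible reading is the loose matching rule of Unicode Standard Annex #44, which ignores case, whitespace, underscores and hyphens. The sketch below illustrates that reading only: loose_key is a hypothetical helper, and create_index and search_index here are stand-ins built on an association list, not the library's implementation.

```ocaml
(* Assumed rule: drop spaces, underscores and hyphens, then lowercase.
   This approximates UAX #44 loose matching and may differ from what the
   library actually does. *)
let loose_key (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match c with
      | ' ' | '_' | '-' -> ()
      | c -> Buffer.add_char b (Char.lowercase_ascii c))
    s;
  Buffer.contents b

(* Stand-in index: an association list keyed by normalized names. *)
let create_index (pairs : (string * 'a) list) : (string * 'a) list =
  List.map (fun (k, v) -> (loose_key k, v)) pairs

(* Stand-in lookup: normalize the query the same way as the keys. *)
let search_index idx nym = List.assoc_opt (loose_key nym) idx

let () =
  let idx = create_index [ ("White_Space", 1); ("Alphabetic", 2) ] in
  assert (search_index idx "white space" = Some 1);
  assert (search_index idx "ALPHABETIC" = Some 2);
  assert (search_index idx "missing" = None)
```

Normalizing both the stored keys and the query keeps the lookup itself an exact comparison, which is why the stand-in can use a plain association list.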

val search_property : utyp index -> string -> utyp option

Use search_property idx nym to query the property database index idx for the property named nym. Property names are loosely matched.

val require_property : utyp index -> string -> utyp

Use require_property idx nym to query the property database index idx for the property named nym. Property names are loosely matched. Raises Not_found if no property named nym is indexed.
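The two lookup functions differ only in how they report a missing property: one returns an option, the other raises Not_found. The sketch below shows both calling styles against a stand-in index, assumed here to be an association list with exact name matching. The definitions are illustrative and are not the library's own.

```ocaml
(* Stand-in for a property index: a plain association list. *)
let search_property idx nym = List.assoc_opt nym idx

(* Hypothetical definition of the raising variant in terms of the
   optional one. *)
let require_property idx nym =
  match search_property idx nym with
  | Some p -> p
  | None -> raise Not_found

let () =
  let idx = [ ("Alphabetic", "bool"); ("Numeric_Value", "string") ] in
  (* Option style: the missing case is an ordinary branch. *)
  (match search_property idx "Script" with
   | Some _ -> print_endline "found"
   | None -> print_endline "Script is not indexed");
  (* Exception style: the missing case is handled as Not_found. *)
  match require_property idx "Script" with
  | _ -> print_endline "found"
  | exception Not_found -> print_endline "Script is not indexed"
```

The raising form suits properties the application cannot run without, while the option form suits properties that may legitimately be absent from a reduced index.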

type blk = [
  | `ASCII
  | `Adlam
  | `Aegean_Numbers
  | `Ahom
  | `Alchemical
  | `Alphabetic_PF
  | `Anatolian_Hieroglyphs
  | `Ancient_Greek_Music
  | `Ancient_Greek_Numbers
  | `Ancient_Symbols
  | `Arabic
  | `Arabic_Ext_A
  | `Arabic_Math
  | `Arabic_PF_A
  | `Arabic_PF_B
  | `Arabic_Sup
  | `Armenian
  | `Arrows
  | `Avestan
  | `Balinese
  | `Bamum
  | `Bamum_Sup
  | `Bassa_Vah
  | `Batak
  | `Bengali
  | `Bhaiksuki
  | `Block_Elements
  | `Bopomofo
  | `Bopomofo_Ext
  | `Box_Drawing
  | `Brahmi
  | `Braille
  | `Buginese
  | `Buhid
  | `Byzantine_Music
  | `CJK
  | `CJK_Compat
  | `CJK_Compat_Forms
  | `CJK_Compat_Ideographs
  | `CJK_Compat_Ideographs_Sup
  | `CJK_Ext_A
  | `CJK_Ext_B
  | `CJK_Ext_C
  | `CJK_Ext_D
  | `CJK_Ext_E
  | `CJK_Ext_F
  | `CJK_Ext_G
  | `CJK_Radicals_Sup
  | `CJK_Strokes
  | `CJK_Symbols
  | `Carian
  | `Caucasian_Albanian
  | `Chakma
  | `Cham
  | `Cherokee
  | `Cherokee_Sup
  | `Chess_Symbols
  | `Chorasmian
  | `Compat_Jamo
  | `Control_Pictures
  | `Coptic
  | `Coptic_Epact_Numbers
  | `Counting_Rod
  | `Cuneiform
  | `Cuneiform_Numbers
  | `Currency_Symbols
  | `Cypriot_Syllabary
  | `Cyrillic
  | `Cyrillic_Ext_A
  | `Cyrillic_Ext_B
  | `Cyrillic_Ext_C
  | `Cyrillic_Sup
  | `Deseret
  | `Devanagari
  | `Devanagari_Ext
  | `Diacriticals
  | `Diacriticals_Ext
  | `Diacriticals_For_Symbols
  | `Diacriticals_Sup
  | `Dingbats
  | `Dives_Akuru
  | `Dogra
  | `Domino
  | `Duployan
  | `Early_Dynastic_Cuneiform
  | `Egyptian_Hieroglyphs
  | `Egyptian_Hieroglyph_Format_Controls
  | `Elbasan
  | `Elymaic
  | `Emoticons
  | `Enclosed_Alphanum
  | `Enclosed_Alphanum_Sup
  | `Enclosed_CJK
  | `Enclosed_Ideographic_Sup
  | `Ethiopic
  | `Ethiopic_Ext
  | `Ethiopic_Ext_A
  | `Ethiopic_Sup
  | `Geometric_Shapes
  | `Geometric_Shapes_Ext
  | `Georgian
  | `Georgian_Ext
  | `Georgian_Sup
  | `Glagolitic
  | `Glagolitic_Sup
  | `Gothic
  | `Grantha
  | `Greek
  | `Greek_Ext
  | `Gujarati
  | `Gunjala_Gondi
  | `Gurmukhi
  | `Half_And_Full_Forms
  | `Half_Marks
  | `Hangul
  | `Hanifi_Rohingya
  | `Hanunoo
  | `Hatran
  | `Hebrew
  | `High_PU_Surrogates
  | `High_Surrogates
  | `Hiragana
  | `IDC
  | `IPA_Ext
  | `Ideographic_Symbols
  | `Imperial_Aramaic
  | `Indic_Number_Forms
  | `Indic_Siyaq_Numbers
  | `Inscriptional_Pahlavi
  | `Inscriptional_Parthian
  | `Jamo
  | `Jamo_Ext_A
  | `Jamo_Ext_B
  | `Javanese
  | `Kaithi
  | `Kana_Ext_A
  | `Kana_Sup
  | `Kanbun
  | `Kangxi
  | `Kannada
  | `Katakana
  | `Katakana_Ext
  | `Kayah_Li
  | `Kharoshthi
  | `Khitan_Small_Script
  | `Khmer
  | `Khmer_Symbols
  | `Khojki
  | `Khudawadi
  | `Lao
  | `Latin_1_Sup
  | `Latin_Ext_A
  | `Latin_Ext_Additional
  | `Latin_Ext_B
  | `Latin_Ext_C
  | `Latin_Ext_D
  | `Latin_Ext_E
  | `Lepcha
  | `Letterlike_Symbols
  | `Limbu
  | `Linear_A
  | `Linear_B_Ideograms
  | `Linear_B_Syllabary
  | `Lisu
  | `Lisu_Sup
  | `Low_Surrogates
  | `Lycian
  | `Lydian
  | `Mahajani
  | `Mahjong
  | `Makasar
  | `Malayalam
  | `Mandaic
  | `Manichaean
  | `Marchen
  | `Masaram_Gondi
  | `Math_Alphanum
  | `Math_Operators
  | `Mayan_Numerals
  | `Medefaidrin
  | `Meetei_Mayek
  | `Meetei_Mayek_Ext
  | `Mende_Kikakui
  | `Meroitic_Cursive
  | `Meroitic_Hieroglyphs
  | `Miao
  | `Misc_Arrows
  | `Misc_Math_Symbols_A
  | `Misc_Math_Symbols_B
  | `Misc_Pictographs
  | `Misc_Symbols
  | `Misc_Technical
  | `Modi
  | `Modifier_Letters
  | `Modifier_Tone_Letters
  | `Mongolian
  | `Mongolian_Sup
  | `Mro
  | `Multani
  | `Music
  | `Myanmar
  | `Myanmar_Ext_A
  | `Myanmar_Ext_B
  | `NB
  | `NKo
  | `Nabataean
  | `Nandinagari
  | `New_Tai_Lue
  | `Newa
  | `No_Block_Assigned
  | `Number_Forms
  | `Nushu
  | `Nyiakeng_Puachue_Hmong
  | `OCR
  | `Ogham
  | `Ol_Chiki
  | `Old_Hungarian
  | `Old_Italic
  | `Old_North_Arabian
  | `Old_Permic
  | `Old_Persian
  | `Old_Sogdian
  | `Old_South_Arabian
  | `Old_Turkic
  | `Oriya
  | `Ornamental_Dingbats
  | `Osage
  | `Osmanya
  | `Ottoman_Siyaq_Numbers
  | `PUA
  | `Pahawh_Hmong
  | `Palmyrene
  | `Pau_Cin_Hau
  | `Phags_Pa
  | `Phaistos
  | `Phoenician
  | `Phonetic_Ext
  | `Phonetic_Ext_Sup
  | `Playing_Cards
  | `Psalter_Pahlavi
  | `Punctuation
  | `Rejang
  | `Rumi
  | `Runic
  | `Samaritan
  | `Saurashtra
  | `Sharada
  | `Shavian
  | `Shorthand_Format_Controls
  | `Siddham
  | `Sinhala
  | `Sinhala_Archaic_Numbers
  | `Small_Forms
  | `Small_Kana_Ext
  | `Sogdian
  | `Sora_Sompeng
  | `Soyombo
  | `Specials
  | `Sundanese
  | `Sundanese_Sup
  | `Sup_Arrows_A
  | `Sup_Arrows_B
  | `Sup_Arrows_C
  | `Sup_Math_Operators
  | `Sup_PUA_A
  | `Sup_PUA_B
  | `Sup_Punctuation
  | `Sup_Symbols_And_Pictographs
  | `Super_And_Sub
  | `Sutton_SignWriting
  | `Syloti_Nagri
  | `Symbols_And_Pictographs_Ext_A
  | `Symbols_For_Legacy_Computing
  | `Syriac
  | `Syriac_Sup
  | `Tagalog
  | `Tagbanwa
  | `Tags
  | `Tai_Le
  | `Tai_Tham
  | `Tai_Viet
  | `Tai_Xuan_Jing
  | `Takri
  | `Tamil
  | `Tamil_Sup
  | `Tangut
  | `Tangut_Components
  | `Tangut_Sup
  | `Telugu
  | `Thaana
  | `Thai
  | `Tibetan
  | `Tifinagh
  | `Tirhuta
  | `Transport_And_Map
  | `UCAS
  | `UCAS_Ext
  | `Ugaritic
  | `VS
  | `VS_Sup
  | `Vai
  | `Vedic_Ext
  | `Vertical_Forms
  | `Wancho
  | `Warang_Citi
  | `Yezidi
  | `Yi_Radicals
  | `Yi_Syllables
  | `Yijing
  | `Zanabazar_Square
]

Unicode code block

val equal_blk : blk -> blk -> bool

Equality

val show_blk : blk -> string

String representation

type utyp +=
  | Typ_block of blk map * blk index

Extend the universal type.

type gc = [
  | `C
  | `Cc
  | `Cf
  | `Cs
  | `Co
  | `Cn
  | `L
  | `LC
  | `Lu
  | `Ll
  | `Lt
  | `Lm
  | `Lo
  | `M
  | `Mn
  | `Mc
  | `Me
  | `N
  | `Nd
  | `Nl
  | `No
  | `P
  | `Pc
  | `Pd
  | `Ps
  | `Pe
  | `Pi
  | `Pf
  | `Po
  | `S
  | `Sm
  | `Sc
  | `Sk
  | `So
  | `Z
  | `Zs
  | `Zl
  | `Zp
]

The general category property value type.
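The gc type lists both specific categories (such as `Lu) and the Unicode groupings that contain them (such as `L and `LC). As an illustration of how the two relate, the hypothetical major_class function below maps each value to its one-letter grouping, following the groupings defined by the Unicode Standard. Whether maps in this module ever yield a grouping value rather than a specific category is not stated here.

```ocaml
(* Hypothetical helper: map every general category value to its
   one-letter grouping. `LC (cased letter) groups `Lu, `Ll and `Lt and is
   itself part of `L. *)
let major_class = function
  | `Lu | `Ll | `Lt | `Lm | `Lo | `LC | `L -> `L
  | `Mn | `Mc | `Me | `M -> `M
  | `Nd | `Nl | `No | `N -> `N
  | `Pc | `Pd | `Ps | `Pe | `Pi | `Pf | `Po | `P -> `P
  | `Sm | `Sc | `Sk | `So | `S -> `S
  | `Zs | `Zl | `Zp | `Z -> `Z
  | `Cc | `Cf | `Cs | `Co | `Cn | `C -> `C

(* Hypothetical predicate built on the grouping. *)
let is_letter c = major_class c = `L

let () =
  assert (is_letter `Lu);
  assert (not (is_letter `Nd))
```

Because the match covers all 38 tags and uses no wildcard, the compiler can report a missing case if tags are added to the input type.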

val equal_gc : gc -> gc -> bool

Equality

val show_gc : gc -> string

String representation

type utyp +=
  | Typ_general_category of gc map * gc index

type qc =
  | QC_yes
  | QC_no
  | QC_maybe

The normalization quick check property type.
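The three values follow the quick check property of Unicode Standard Annex #15, where per-character results are combined to decide whether a string might already be normalized. The sketch below shows only that combination step. The quick_check function is hypothetical, the type is redeclared locally to keep the example self-contained, and the full algorithm in the annex also rejects strings whose canonical combining classes are out of order, which is omitted here.

```ocaml
(* Local copy of the type so that the sketch compiles on its own. *)
type qc = QC_yes | QC_no | QC_maybe

(* Combine per-character quick check values for a whole string: any
   QC_no decides the answer at once, otherwise any QC_maybe makes the
   result QC_maybe, otherwise the result is QC_yes. *)
let quick_check (values : qc list) : qc =
  let rec loop acc = function
    | [] -> acc
    | QC_no :: _ -> QC_no
    | QC_maybe :: rest -> loop QC_maybe rest
    | QC_yes :: rest -> loop acc rest
  in
  loop QC_yes values

let () =
  assert (quick_check [ QC_yes; QC_yes ] = QC_yes);
  assert (quick_check [ QC_yes; QC_maybe; QC_yes ] = QC_maybe);
  assert (quick_check [ QC_maybe; QC_no; QC_yes ] = QC_no)
```

A QC_maybe result means the caller still has to run full normalization to be sure, so the quick check only saves work when it returns QC_yes or QC_no.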

val equal_qc : qc -> qc -> bool

Equality

val show_qc : qc -> string

String representation

type utyp +=
  | Typ_quick_check of qc map * qc index

Extend the universal type.

type script = [
  | `Adlm
  | `Aghb
  | `Ahom
  | `Arab
  | `Armi
  | `Armn
  | `Avst
  | `Bali
  | `Bamu
  | `Bass
  | `Batk
  | `Beng
  | `Bhks
  | `Bopo
  | `Brah
  | `Brai
  | `Bugi
  | `Buhd
  | `Cakm
  | `Cans
  | `Cari
  | `Cham
  | `Cher
  | `Chrs
  | `Copt
  | `Cprt
  | `Cyrl
  | `Deva
  | `Diak
  | `Dogr
  | `Dsrt
  | `Dupl
  | `Egyp
  | `Elba
  | `Elym
  | `Ethi
  | `Geor
  | `Glag
  | `Gong
  | `Gonm
  | `Goth
  | `Gran
  | `Grek
  | `Gujr
  | `Guru
  | `Hang
  | `Hani
  | `Hano
  | `Hatr
  | `Hebr
  | `Hira
  | `Hluw
  | `Hmng
  | `Hmnp
  | `Hrkt
  | `Hung
  | `Ital
  | `Java
  | `Kali
  | `Kana
  | `Khar
  | `Khmr
  | `Khoj
  | `Kits
  | `Knda
  | `Kthi
  | `Lana
  | `Laoo
  | `Latn
  | `Lepc
  | `Limb
  | `Lina
  | `Linb
  | `Lisu
  | `Lyci
  | `Lydi
  | `Mahj
  | `Maka
  | `Mand
  | `Mani
  | `Marc
  | `Medf
  | `Mend
  | `Merc
  | `Mero
  | `Mlym
  | `Modi
  | `Mong
  | `Mroo
  | `Mtei
  | `Mult
  | `Mymr
  | `Nand
  | `Narb
  | `Nbat
  | `Newa
  | `Nkoo
  | `Nshu
  | `Ogam
  | `Olck
  | `Orkh
  | `Orya
  | `Osge
  | `Osma
  | `Palm
  | `Pauc
  | `Perm
  | `Phag
  | `Phli
  | `Phlp
  | `Phnx
  | `Plrd
  | `Prti
  | `Qaai
  | `Rjng
  | `Rohg
  | `Runr
  | `Samr
  | `Sarb
  | `Saur
  | `Sgnw
  | `Shaw
  | `Shrd
  | `Sidd
  | `Sind
  | `Sinh
  | `Sogd
  | `Sogo
  | `Sora
  | `Soyo
  | `Sund
  | `Sylo
  | `Syrc
  | `Tagb
  | `Takr
  | `Tale
  | `Talu
  | `Taml
  | `Tang
  | `Tavt
  | `Telu
  | `Tfng
  | `Tglg
  | `Thaa
  | `Thai
  | `Tibt
  | `Tirh
  | `Ugar
  | `Vaii
  | `Wara
  | `Wcho
  | `Xpeo
  | `Xsux
  | `Yezi
  | `Yiii
  | `Zanb
  | `Zinh
  | `Zyyy
  | `Zzzz
]

Unicode script identifier

val equal_script : script -> script -> bool

Equality

val show_script : script -> string

String representation

type utyp +=
  | Typ_script of script map * script index

Extend the universal type.

module Quick : sig ... end

This module contains internal fast-path functions for property query.