type t
The type for glyphs.
Representing a grapheme in Unicode is more complicated than one might expect: a grapheme is not always a single printable UChar. For example, diacritics are added to IPA (International Phonetic Alphabet) letters to indicate a modified pronunciation, and variation selectors are added to a CJK character to select a specific glyph variant for that character.
Therefore, the logical type definition of Zed_char.t can be seen as:
type Zed_char.t = {
  core : UChar.t;
  combined : UChar.t list;
}
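The logical definition above can be modelled in plain OCaml as a minimal sketch. The names zchar, a_hat and to_uchars are hypothetical and only illustrate the idea; the real Zed_char.t is not necessarily implemented this way.

```ocaml
(* Illustrative model only; not the library's actual representation. *)
type zchar = {
  core : Uchar.t;           (* the base character *)
  combined : Uchar.t list;  (* combining marks attached to it, in order *)
}

(* 'a' followed by U+0302 COMBINING CIRCUMFLEX ACCENT. *)
let a_hat = { core = Uchar.of_char 'a'; combined = [ Uchar.of_int 0x0302 ] }

(* Flatten the model back into its sequence of code points. *)
let to_uchars z = z.core :: z.combined
```

Here a_hat is a single grapheme that is made of two code points.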
type char_prop =
  | Printable of int
  | Other
  | Null
The property of a character. It is either Printable of width, Other (an unprintable character) or Null (code 0).
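As a small sketch of how such a property might be consumed, the type is redeclared locally below so that the example is self-contained. The function display_width is hypothetical, and treating Other and Null as zero width is an assumption made for illustration only.

```ocaml
(* Local copy of the variant, so the sketch compiles on its own. *)
type char_prop =
  | Printable of int  (* carries the display width *)
  | Other             (* unprintable character *)
  | Null              (* code 0 *)

(* Hypothetical helper: number of columns to reserve for a character.
   Assumption: unprintable and null characters take no columns. *)
let display_width = function
  | Printable w -> w
  | Other | Null -> 0
```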
of_utf8 str returns a Zed_char.t built from the UTF-8 encoded string str. This function checks whether str represents a single UChar or a legal grapheme, i.e. a printable core with optional combining marks. It raises Failure "malformed Zed_char sequence" if the validation fails.
parameter indv_combining
allows creating a Zed_char.t from a single combining mark; defaults to true
mix_uChar chr uChar tries to append uChar to chr and returns Ok result. If uChar is not a combining mark, an Error carrying a Zed_char.t that consists of uChar alone is returned.
val of_uChars :
  ?trim:bool -> ?indv_combining:bool -> Uchar.t list -> t option * Uchar.t list
of_uChars uChars transforms uChars into a tuple. The first value is an optional Zed_char.t and the second is the list of remaining uChars. The optional Zed_char.t is either a legal grapheme (a printable core char with optional combining marks) or a wrapper for an arbitrary Uchar.t. All uChars that were not consumed are returned as the second value of the tuple.
parameter trim
trims leading combining marks before transforming; defaults to false
parameter indv_combining
creates a Zed_char.t from an individual, dissociated combining mark; defaults to true
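The splitting behaviour described for of_uChars can be approximated with a self-contained sketch. This is not the library's implementation: is_combining and split_first are hypothetical names, the combining test only covers U+0300..U+036F, the printable check and ?trim are omitted, and a leading combining mark is wrapped on its own, which mirrors the documented default indv_combining = true.

```ocaml
(* Hypothetical predicate: only the Combining Diacritical Marks block. *)
let is_combining u =
  let c = Uchar.to_int u in
  c >= 0x0300 && c <= 0x036F

(* Split a code point list into the first (core, marks) pair and the rest. *)
let split_first = function
  | [] -> (None, [])
  (* A dissociated combining mark is wrapped alone. *)
  | u :: rest when is_combining u -> (Some (u, []), rest)
  | core :: rest ->
    (* Collect the combining marks that directly follow the core. *)
    let rec take acc = function
      | u :: tl when is_combining u -> take (u :: acc) tl
      | tl -> (List.rev acc, tl)
    in
    let marks, remaining = take [] rest in
    (Some (core, marks), remaining)
```

For the input 'a', U+0302, 'b', this sketch yields the pair ('a', [U+0302]) together with the remaining list ['b'].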
val zChars_of_uChars :
  ?trim:bool -> ?indv_combining:bool -> Uchar.t list -> t list * Uchar.t list
zChars_of_uChars uChars transforms uChars into a tuple. The first value is a list of Zed_char.t and the second is the list of remaining uChars.
parameter trim
trims leading combining marks before transforming; defaults to false
parameter indv_combining
creates a Zed_char.t from an individual, dissociated combining mark; defaults to true
The prefix 'unsafe_' in unsafe_of_char and unsafe_of_uChar means that the two functions do not check whether the char or uChar being transformed is a valid grapheme. There is no 'safe_' version, because the scenario in which we deal with a single char or uChar is one where the characters arrive individually and the sequence is still incomplete, for example when reading user input. Even if a user wants to input a legal grapheme, say 'a' with a hat (a combining mark) on top, the user will input 'a' and then '^' separately, and the latter combining mark, taken on its own, is always illegal. What we should do is invoke unsafe_of_uChar user_input and send the result to the edit engine. Other modules in zed, such as Zed_string, Zed_lines and Zed_edit, are designed to deal with this situation: they perform combining mark joining and grapheme validation for you automatically. Using the two 'unsafe_' functions directly is therefore the right way to do it.