Zed_char

The type for glyphs.
Representing a grapheme in Unicode is more complicated than one might expect: it is not always a single printable UChar. For example, diacritics are added to IPA (International Phonetic Alphabet) letters to produce a modified pronunciation, and variation selectors are added to a CJK character to select a specific glyph variant for it.
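Therefore the logical type definition of Zed_char.t can be seen as

    (* logical view of Zed_char.t; not necessarily its actual representation *)
    type t = {
      core : UChar.t;          (* the base character *)
      combined : UChar.t list; (* combining marks attached to the core *)
    }

The property of a character is either Printable of width, Other (an unprintable character) or Null (code 0).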
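To make the logical view concrete, here is a minimal sketch that mirrors it with the standard library's Uchar.t. The names zchar, char_prop, prop_of_uchar and length are hypothetical and not part of zed, and the property function is a deliberate simplification: real display widths come from Unicode tables, which this sketch does not model.

```ocaml
(* Hypothetical mirror of the logical Zed_char.t, using stdlib Uchar.t. *)
type zchar = {
  core : Uchar.t;          (* the base character *)
  combined : Uchar.t list; (* combining marks, in input order *)
}

(* Hypothetical mirror of the character property described above. *)
type char_prop =
  | Printable of int  (* printable, with its display width *)
  | Other             (* unprintable character *)
  | Null              (* code 0 *)

(* 'a' + U+0302 COMBINING CIRCUMFLEX ACCENT, displayed as a single glyph. *)
let a_hat = { core = Uchar.of_int 0x61; combined = [ Uchar.of_int 0x302 ] }

(* Simplified property function (an assumption, not zed's logic):
   NUL -> Null, C0 controls and DEL -> Other,
   marks in U+0300..U+036F -> width 0, everything else -> width 1. *)
let prop_of_uchar u =
  let c = Uchar.to_int u in
  if c = 0 then Null
  else if c < 0x20 || c = 0x7F then Other
  else if c >= 0x300 && c <= 0x36F then Printable 0
  else Printable 1

(* Number of code points held by one grapheme. *)
let length z = 1 + List.length z.combined
```

Here length a_hat is 2 even though the grapheme occupies one column, which is exactly why a glyph cannot be modelled as a single code point.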
unsafe_of_utf8 str returns a Zed_char.t from the UTF-8 encoded string str without any validation.
of_utf8 str returns a Zed_char.t from the UTF-8 encoded string str. This function checks whether str represents a single UChar or a legal grapheme, i.e. a printable core with optional combining marks. It raises Failure "malformed Zed_char sequence" if the validation does not pass.
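The validation rule of of_utf8 can be sketched as follows. To stay self-contained, the sketch skips UTF-8 decoding and validates an already-decoded Uchar.t list held in a hypothetical record that mirrors the logical Zed_char.t. The predicates is_combining_mark and is_printable_core are hypothetical, simplified stand-ins (only U+0300..U+036F and the variation selectors U+FE00..U+FE0F count as marks) for the Unicode-property checks a real implementation needs.

```ocaml
(* Hypothetical mirror of the logical Zed_char.t. *)
type zchar = { core : Uchar.t; combined : Uchar.t list }

(* Simplified combining-mark test: Combining Diacritical Marks block
   and variation selectors only. *)
let is_combining_mark u =
  let c = Uchar.to_int u in
  (c >= 0x300 && c <= 0x36F) || (c >= 0xFE00 && c <= 0xFE0F)

(* Simplified "printable core": not a control character and not a mark. *)
let is_printable_core u =
  let c = Uchar.to_int u in
  c >= 0x20 && c <> 0x7F && not (is_combining_mark u)

(* Validating constructor in the spirit of of_utf8, over decoded code points:
   a single code point is always accepted; a longer sequence must be a
   printable core followed only by combining marks. *)
let of_uchars_checked = function
  | [ u ] -> { core = u; combined = [] }
  | core :: marks
    when is_printable_core core && List.for_all is_combining_mark marks ->
      { core; combined = marks }
  | _ -> failwith "malformed Zed_char sequence"
```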
Returns whether a Uchar.t is a printable character and its width is not zero.
out_of_range ch idx returns whether idx is out of range of ch.
get ch n returns the n-th character of ch as an optional value.
append ch cm appends the combining mark cm to ch and returns the result. If cm is not a combining mark, the original ch is returned.
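A minimal sketch of the documented append rule, written against a hypothetical record that mirrors the logical Zed_char.t and a simplified combining-mark test (an assumption; zed relies on Unicode properties):

```ocaml
(* Hypothetical mirror of the logical Zed_char.t. *)
type zchar = { core : Uchar.t; combined : Uchar.t list }

(* Simplified stand-in for a Unicode combining-mark test. *)
let is_combining_mark u =
  let c = Uchar.to_int u in
  (c >= 0x300 && c <= 0x36F) || (c >= 0xFE00 && c <= 0xFE0F)

(* Append cm after the existing marks; a non-mark returns ch unchanged. *)
let append ch cm =
  if is_combining_mark cm then { ch with combined = ch.combined @ [ cm ] }
  else ch

let a = { core = Uchar.of_int 0x61; combined = [] }
let a_hat = append a (Uchar.of_int 0x302) (* core 'a', combined [U+0302] *)
let same = append a (Uchar.of_int 0x62)   (* 'b' is not a mark: a itself *)
```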
compare_core ch1 ch2 compares the core components of ch1 and ch2.
compare_raw ch1 ch2 compares the internal characters of ch1 and ch2 sequentially.
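One plausible reading of the two comparisons, sketched on a hypothetical record that mirrors the logical Zed_char.t: compare_core looks only at the cores, while compare_raw walks the full code-point sequence (core first, then marks). Ordering a shorter sequence before a longer one with an equal prefix is an assumption of this sketch, not documented zed behaviour.

```ocaml
(* Hypothetical mirror of the logical Zed_char.t. *)
type zchar = { core : Uchar.t; combined : Uchar.t list }

(* Only the base characters take part in the comparison. *)
let compare_core a b = Uchar.compare a.core b.core

(* Lexicographic comparison over core :: combined, element by element. *)
let compare_raw a b =
  let rec go xs ys =
    match xs, ys with
    | [], [] -> 0
    | [], _ -> -1
    | _, [] -> 1
    | x :: xs', y :: ys' ->
        let c = Uchar.compare x y in
        if c <> 0 then c else go xs' ys'
  in
  go (a.core :: a.combined) (b.core :: b.combined)

let a = { core = Uchar.of_int 0x61; combined = [] }
let a_hat = { core = Uchar.of_int 0x61; combined = [ Uchar.of_int 0x302 ] }
```

With these definitions, a and a_hat are equal under compare_core but differ under compare_raw.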
mix_uChar chr uChar tries to append uChar to chr and, on success, returns Ok result, where result is chr with uChar appended. If uChar is not a combining mark, it returns Error z instead, where z is a new Zed_char.t consisting of uChar alone.
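The Ok/Error contract can be sketched with the stdlib result type. The record zchar, the function mix_uchar and the simplified mark test are hypothetical stand-ins, not zed's own definitions:

```ocaml
(* Hypothetical mirror of the logical Zed_char.t. *)
type zchar = { core : Uchar.t; combined : Uchar.t list }

(* Simplified stand-in for a Unicode combining-mark test. *)
let is_combining_mark u =
  let c = Uchar.to_int u in
  (c >= 0x300 && c <= 0x36F) || (c >= 0xFE00 && c <= 0xFE0F)

(* Ok: u is a mark and is now attached to chr.
   Error: u is not a mark, so it is wrapped as a fresh grapheme. *)
let mix_uchar chr u =
  if is_combining_mark u then Ok { chr with combined = chr.combined @ [ u ] }
  else Error { core = u; combined = [] }

let a = { core = Uchar.of_int 0x61; combined = [] }

let () =
  match mix_uchar a (Uchar.of_int 0x62) with
  | Ok _ -> print_endline "joined"
  | Error fresh ->
      (* prints: new grapheme U+0062 *)
      Printf.printf "new grapheme U+%04X\n" (Uchar.to_int fresh.core)
```

Returning the fresh grapheme in Error lets a caller keep going without re-wrapping the code point itself.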
of_uChars uChars transforms uChars into a tuple. The first value is an optional Zed_char.t and the second is the list of remaining uChars. The optional Zed_char.t is either a legal grapheme (a printable core char with optional combining marks) or a wrapper around an arbitrary Uchar.t. All uChars that follow it are returned as the second value of the tuple.
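A sketch of the splitting behaviour described for of_uChars, under the same assumptions as a hypothetical record mirror of Zed_char.t and simplified mark and printability tests: a printable core absorbs the combining marks that follow it, any other code point (including a stray mark) is wrapped on its own, and the rest is returned untouched.

```ocaml
(* Hypothetical mirror of the logical Zed_char.t. *)
type zchar = { core : Uchar.t; combined : Uchar.t list }

(* Simplified stand-ins for Unicode property checks. *)
let is_combining_mark u =
  let c = Uchar.to_int u in
  (c >= 0x300 && c <= 0x36F) || (c >= 0xFE00 && c <= 0xFE0F)

let is_printable_core u =
  let c = Uchar.to_int u in
  c >= 0x20 && c <> 0x7F && not (is_combining_mark u)

(* Split off the first grapheme and return it with the remaining code points. *)
let of_uchars = function
  | [] -> (None, [])
  | u :: rest when is_printable_core u ->
      (* collect the run of combining marks directly after the core *)
      let rec take marks = function
        | m :: tl when is_combining_mark m -> take (m :: marks) tl
        | tl -> (List.rev marks, tl)
      in
      let marks, remaining = take [] rest in
      (Some { core = u; combined = marks }, remaining)
  | u :: rest -> (Some { core = u; combined = [] }, rest)
```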
    val zChars_of_uChars :
      ?trim:bool ->
      ?indv_combining:bool ->
      Uchar.t list ->
      t list * Uchar.t list

zChars_of_uChars uChars transforms uChars into a tuple. The first value is a list of Zed_char.t and the second is a list of remaining uChars.
for_all p zChar checks if all elements of zChar satisfy the predicate p.
The prefix 'unsafe_' in unsafe_of_char and unsafe_of_uChar means that these two functions do not check whether the char or uChar being converted is a valid grapheme. There is no 'safe_' version, because the situation in which we handle a single char or uChar is precisely when the character sequence arrives piece by piece and may be incomplete, for example when reading user input. Even if a user wants to enter a legal grapheme, say 'a' with a hat (a combining mark) on top, they will type 'a' and then '^' separately, and the latter combining mark, taken on its own, is always illegal. What we should do is call unsafe_of_uChar user_input and send the result to the edit engine. Other modules in zed, such as Zed_string, Zed_lines, Zed_edit, ..., are designed to handle this situation: they perform combining-mark joining and grapheme validation for you automatically. Using the two 'unsafe_' functions directly is the right way to do it.
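The input workflow described above can be illustrated with a toy model. This is a hypothetical sketch, not zed's code: zchar, unsafe_of_uchar and insert are made-up names, the mark test is simplified, and the joining step only imitates what the text says Zed_string, Zed_lines and Zed_edit do automatically.

```ocaml
(* Hypothetical mirror of the logical Zed_char.t. *)
type zchar = { core : Uchar.t; combined : Uchar.t list }

(* Simplified stand-in for a Unicode combining-mark test. *)
let is_combining_mark u =
  let c = Uchar.to_int u in
  (c >= 0x300 && c <= 0x36F) || (c >= 0xFE00 && c <= 0xFE0F)

(* Hypothetical counterpart of unsafe_of_uChar: wrap without validation. *)
let unsafe_of_uchar u = { core = u; combined = [] }

(* Toy buffer of graphemes kept in reverse order (last typed first).
   A lone combining mark is joined to the previous grapheme; anything
   else, including a mark typed into an empty buffer, is pushed as is. *)
let insert buf z =
  match buf, z with
  | last :: rest, { core; combined = [] } when is_combining_mark core ->
      { last with combined = last.combined @ [ core ] } :: rest
  | _ -> z :: buf

(* The user types 'a', then U+0302, then 'b', one code point at a time. *)
let typed = List.map Uchar.of_int [ 0x61; 0x302; 0x62 ]

let text =
  List.rev (List.fold_left (fun buf u -> insert buf (unsafe_of_uchar u)) [] typed)
```

Each keystroke is wrapped without validation, and the buffer, not the caller, is responsible for turning the stray mark into part of a legal grapheme.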