package matrix
Module Glyph.String
Measuring
measure ~width_method ~tab_width s is the total display width of s. Control characters contribute 0.
Note. Invalid UTF-8 byte sequences are replaced with U+FFFD, each contributing width 1.
See also measure_sub.
val measure_sub :
width_method:width_method ->
tab_width:int ->
string ->
pos:int ->
len:int ->
int
measure_sub ~width_method ~tab_width s ~pos ~len is like measure but operates on the substring s.[pos] .. s.[pos + len - 1] without allocating. The result is 0 when len <= 0.
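The contract above (control characters count 0, an invalid byte sequence counts as one U+FFFD of width 1, an empty or negative window measures 0) can be sketched with the OCaml standard library's UTF-8 decoder (OCaml 4.14 or later). The names scalar_width, sketch_measure_sub and sketch_measure are hypothetical, and the width model is deliberately crude: every other scalar value is assumed to be 1 column, a tab is assumed to add tab_width columns, and width_method, grapheme clustering and East Asian wide characters are not modelled at all.

```ocaml
(* Hypothetical, simplified model of measure_sub / measure, not the
   library's code. Assumptions: every valid scalar value that is not a
   control character is 1 column wide, a tab adds [tab_width] columns,
   and there is no grapheme clustering or East Asian width handling. *)
let scalar_width ~tab_width u =
  match Uchar.to_int u with
  | 0x09 -> tab_width (* assumption: a tab adds a fixed tab_width columns *)
  | c when c < 0x20 || (c >= 0x7F && c <= 0x9F) -> 0 (* control characters *)
  | _ -> 1

let sketch_measure_sub ~tab_width s ~pos ~len =
  let stop = pos + len in
  let rec go i acc =
    if i >= stop then acc
    else
      (* Invalid UTF-8 decodes to U+FFFD (Uchar.rep), which counts as 1 *)
      let d = String.get_utf_8_uchar s i in
      go (i + Uchar.utf_decode_length d)
        (acc + scalar_width ~tab_width (Uchar.utf_decode_uchar d))
  in
  (* Walks the window in place, so no substring is allocated *)
  if len <= 0 then 0 else go pos 0

let sketch_measure ~tab_width s =
  sketch_measure_sub ~tab_width s ~pos:0 ~len:(String.length s)
```

Because the sub variant walks the window in place, sketch_measure_sub s ~pos ~len gives the same result as sketch_measure (String.sub s pos len) whenever the window starts and ends on character boundaries, without the copy; the description of measure_sub suggests the same relationship for the real pair.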
Counting
grapheme_count s is the number of user-perceived characters (grapheme clusters) in s. Uses full UAX #29 segmentation.
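Grapheme clusters, scalar values and bytes can all differ in number. For example, "e" followed by U+0301 COMBINING ACUTE ACCENT occupies 3 bytes and 2 scalar values but forms a single grapheme cluster, so grapheme_count would be expected to return 1 for it. The hypothetical scalar_count below counts only scalar values, using the standard library decoder (OCaml 4.14 or later), to make that contrast concrete; it is not a substitute for grapheme_count.

```ocaml
(* Counts Unicode scalar values (code points), NOT grapheme clusters.
   Each invalid UTF-8 sequence is counted once, as U+FFFD. *)
let scalar_count s =
  let rec go i n =
    if i >= String.length s then n
    else go (i + Uchar.utf_decode_length (String.get_utf_8_uchar s i)) (n + 1)
  in
  go 0 0

(* "e" followed by U+0301 COMBINING ACUTE ACCENT, encoded as 3 bytes *)
let e_acute = "e\xCC\x81"

let () =
  (* prints: bytes=3 scalars=2 *)
  Printf.printf "bytes=%d scalars=%d\n" (String.length e_acute)
    (scalar_count e_acute)
```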
Iterating
iter_graphemes f s calls f ~offset ~len for each grapheme cluster in s.
The optional argument ignore_zwj defaults to false. When true, a zero-width joiner (ZWJ) does not join emoji sequences, giving the same boundary behaviour as `No_zwj.
Note. Invalid UTF-8 byte sequences are treated as individual replacement characters (U+FFFD).
See also iter_grapheme_info.
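A typical reason to iterate by cluster is to manipulate text without splitting characters. The sketch below reverses a string cluster by cluster using any iterator with the callback shape described above. reverse_by_clusters and stub_iter_graphemes are hypothetical; the stub treats each UTF-8 scalar value as a cluster (OCaml 4.14 or later), which is a simplification of the UAX #29 segmentation the real function performs.

```ocaml
(* Hypothetical helper: reverse a string cluster by cluster, using any
   iterator with iter_graphemes's callback shape (f ~offset ~len). *)
let reverse_by_clusters
    (iter : (offset:int -> len:int -> unit) -> string -> unit) s =
  let acc = ref [] in
  (* Prepending each slice leaves [acc] in reverse order *)
  iter (fun ~offset ~len -> acc := String.sub s offset len :: !acc) s;
  String.concat "" !acc

(* Hypothetical stub for testing: one "cluster" per UTF-8 scalar value.
   This is NOT UAX #29 segmentation; the real iter_graphemes would keep,
   for example, "e" + U+0301 together as one cluster. *)
let stub_iter_graphemes (f : offset:int -> len:int -> unit) s =
  let rec go i =
    if i < String.length s then begin
      let n = Uchar.utf_decode_length (String.get_utf_8_uchar s i) in
      f ~offset:i ~len:n;
      go (i + n)
    end
  in
  go 0
```

Assuming the signature described on this page, the real iterator could be plugged in as reverse_by_clusters (fun f s -> Glyph.String.iter_graphemes f s). Reversing the raw bytes instead would scramble multi-byte sequences.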
val iter_grapheme_info :
width_method:width_method ->
tab_width:int ->
(offset:int -> len:int -> width:int -> unit) ->
string ->
unit
iter_grapheme_info ~width_method ~tab_width f s calls f ~offset ~len ~width for each grapheme cluster in s. Uses the same width calculation and ZWJ handling as Pool.encode. Graphemes whose width resolves to 0 (control and zero-width sequences) are skipped: f is not called for them.
Note. Invalid UTF-8 byte sequences are treated as individual replacement characters (U+FFFD).
See also iter_graphemes.
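Since iter_grapheme_info reports a width for each cluster, it lends itself to truncating text to a column budget. The sketch below, with the hypothetical names fit_prefix and stub_iter_grapheme_info, keeps the longest prefix whose total width fits in max_width and stops at the first cluster that would overflow. The stub is ASCII-only: one column per printable byte, with control bytes skipped to mimic the skipping of zero-width graphemes.

```ocaml
(* Hypothetical helper: longest prefix of [s] whose display width is at
   most [max_width], using any iterator with iter_grapheme_info's callback
   shape (f ~offset ~len ~width). *)
let fit_prefix
    (iter : (offset:int -> len:int -> width:int -> unit) -> string -> unit)
    ~max_width s =
  let used = ref 0 and stop = ref 0 and full = ref false in
  iter
    (fun ~offset ~len ~width ->
      if not !full then begin
        if !used + width <= max_width then begin
          used := !used + width;
          (* the prefix now extends through the end of this cluster *)
          stop := offset + len
        end
        else full := true (* first overflow: ignore everything after it *)
      end)
    s;
  String.sub s 0 !stop

(* Hypothetical ASCII-only stub for testing: printable bytes are 1 column,
   control bytes are skipped like zero-width graphemes. *)
let stub_iter_grapheme_info (f : offset:int -> len:int -> width:int -> unit) s =
  String.iteri
    (fun i c ->
      let code = Char.code c in
      if code >= 0x20 && code <> 0x7F then f ~offset:i ~len:1 ~width:1)
    s
```

With the real module, something like fit_prefix (fun f s -> Glyph.String.iter_grapheme_info ~width_method:`Unicode ~tab_width:4 f s) ~max_width s should work, assuming `Unicode is accepted here as it is by iter_wrap_breaks. A wide cluster that does not fit in the remaining columns is left out rather than split.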
val iter_wrap_breaks :
?width_method:width_method ->
(break_byte_offset:int ->
next_byte_offset:int ->
grapheme_offset:int ->
unit) ->
string ->
unit
iter_wrap_breaks f s calls f ~break_byte_offset ~next_byte_offset ~grapheme_offset for each word-wrap break opportunity in s, in order from start to end, with:
- break_byte_offset: zero-based byte position of the grapheme containing the wrap-break character.
- next_byte_offset: zero-based byte position of the next grapheme after the break (the resume position).
- grapheme_offset: zero-based grapheme index of the grapheme containing the wrap-break character.
Breaks occur after graphemes containing ASCII space, tab, hyphen, path separators, punctuation, brackets, and Unicode NBSP, ZWSP, soft hyphen, and typographic spaces.
width_method controls grapheme boundary detection: with `Unicode (the default), ZWJ sequences are treated as single graphemes; with `No_zwj, they are split apart.
See also iter_line_breaks.
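The break_byte_offset / next_byte_offset pair is enough to cut text into wrap segments, each ending just after a break opportunity, which a greedy wrapper can then pack into lines. The sketch below uses the hypothetical names wrap_segments and stub_iter_wrap_breaks; the stub is ASCII-only and breaks only after a space or hyphen, a small subset of the rules listed above.

```ocaml
(* Hypothetical helper: split [s] into wrap segments, each ending just after
   a break opportunity reported by an iter_wrap_breaks-shaped iterator. *)
let wrap_segments
    (iter :
      (break_byte_offset:int -> next_byte_offset:int -> grapheme_offset:int -> unit) ->
      string -> unit) s =
  let acc = ref [] and start = ref 0 in
  iter
    (fun ~break_byte_offset:_ ~next_byte_offset ~grapheme_offset:_ ->
      (* The segment runs up to the resume position, so it keeps the break char *)
      acc := String.sub s !start (next_byte_offset - !start) :: !acc;
      start := next_byte_offset)
    s;
  let rest = String.length s - !start in
  List.rev (if rest > 0 then String.sub s !start rest :: !acc else !acc)

(* Hypothetical ASCII-only stub for testing: breaks after ' ' and '-'.
   For ASCII text the grapheme index equals the byte index. *)
let stub_iter_wrap_breaks
    (f : break_byte_offset:int -> next_byte_offset:int -> grapheme_offset:int -> unit) s =
  String.iteri
    (fun i c ->
      if c = ' ' || c = '-' then
        f ~break_byte_offset:i ~next_byte_offset:(i + 1) ~grapheme_offset:i)
    s
```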
iter_line_breaks f s calls f ~pos ~kind for each line terminator in s, in order from start to end, with:
- pos: zero-based byte position. For `CRLF this is the position of the LF byte; for `LF and `CR, it is the position of that byte.
- kind: the line_break_kind.
CRLF sequences are reported once as `CRLF, not as separate `CR and `LF breaks.
See also iter_wrap_breaks.
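Splitting text into lines from the reported terminators only needs care for `CRLF, where pos points at the LF byte, so the line's content ends one byte earlier. The sketch assumes line_break_kind is the polymorphic variant [ `LF | `CR | `CRLF ] (only the constructor names appear on this page); split_lines and stub_iter_line_breaks are hypothetical.

```ocaml
(* Assumption: line_break_kind is this polymorphic variant; only the
   constructor names `LF, `CR and `CRLF appear in the documentation. *)
type line_break_kind = [ `LF | `CR | `CRLF ]

(* Hypothetical helper: split [s] into lines (terminators removed) using any
   iterator with iter_line_breaks's callback shape (f ~pos ~kind). *)
let split_lines
    (iter : (pos:int -> kind:line_break_kind -> unit) -> string -> unit) s =
  let acc = ref [] and start = ref 0 in
  iter
    (fun ~pos ~kind ->
      (* For `CRLF, pos is the LF byte, so the line ends at the CR before it *)
      let stop = match kind with `CRLF -> pos - 1 | `LF | `CR -> pos in
      acc := String.sub s !start (stop - !start) :: !acc;
      start := pos + 1)
    s;
  (* Text after the last terminator (possibly empty) is the final line *)
  List.rev (String.sub s !start (String.length s - !start) :: !acc)

(* Hypothetical ASCII stub for testing; reports CRLF once, at the LF byte. *)
let stub_iter_line_breaks (f : pos:int -> kind:line_break_kind -> unit) s =
  let n = String.length s in
  let rec go i =
    if i < n then
      match s.[i] with
      | '\r' when i + 1 < n && s.[i + 1] = '\n' ->
          f ~pos:(i + 1) ~kind:`CRLF;
          go (i + 2)
      | '\r' -> f ~pos:i ~kind:`CR; go (i + 1)
      | '\n' -> f ~pos:i ~kind:`LF; go (i + 1)
      | _ -> go (i + 1)
  in
  go 0
```

As written, text after the final terminator is returned as a last (possibly empty) line, so "x\n" splits into ["x"; ""]; dropping that trailing empty string is a policy choice left to the caller.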