package matrix
Module Glyph
Unicode glyphs for terminal rendering.
A glyph is a packed, unboxed integer representing a visual character in a terminal cell. Glyphs come in two kinds:
- Simple glyphs store a single Unicode scalar (U+0000 – U+10FFFF) directly. Zero allocation, zero lookup.
- Complex glyphs reference a multi-codepoint grapheme cluster interned in a Pool. They carry a pool index, a generation counter, and extent information.
Multi-column characters (wide CJK, emoji) are represented as one start glyph followed by one or more continuation glyphs that reference the same pool entry. Control characters and zero-width sequences map to empty.
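The distinction between the two kinds can be pictured with a small bit-packing sketch. The layout below is an assumption made purely for illustration (one tag bit, payload in the remaining bits); the library's actual layout is not documented here, and the names pack_simple, pack_complex, kind_of, and payload are hypothetical.

```ocaml
(* Hypothetical layout, for illustration only: bit 0 tags the kind, the
   remaining bits hold either a Unicode scalar value or a pool index.
   On 64-bit platforms an OCaml int has 63 bits, which leaves ample room. *)
type kind = Simple | Complex

(* A simple glyph stores the scalar value directly: no allocation. *)
let pack_simple (u : Uchar.t) : int = Uchar.to_int u lsl 1

(* A complex glyph stores an index into some external pool. *)
let pack_complex (pool_index : int) : int = (pool_index lsl 1) lor 1

let kind_of (g : int) : kind = if g land 1 = 0 then Simple else Complex

let payload (g : int) : int = g lsr 1

let () =
  let a = pack_simple (Uchar.of_int 0x41) in
  let c = pack_complex 7 in
  assert (kind_of a = Simple && payload a = 0x41);
  assert (kind_of c = Complex && payload c = 7)
```

Because both kinds fit in an immediate integer, a grid of such values can be scanned without touching the heap; only complex glyphs need a lookup to recover their text.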
Quick start
Create a pool, encode a string, and process glyphs via callback:
let pool = Pool.create () in
Pool.encode pool ~width_method:`Unicode ~tab_width:2
  (fun glyph -> Printf.printf "%s " (Pool.to_string pool glyph))
  "Hello 👋 World"
Memory safety
The Pool uses manual reference counting with automatic slot recycling. Pool-backed glyph IDs include a generation counter so that accessing a glyph whose slot has been recycled returns safe defaults (empty, zero width) rather than stale data. This guarantee holds across normal Pool.incref/Pool.decref cycles. Pool.clear resets the pool and invalidates all previously issued IDs.
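The generation check described above can be sketched with a toy slot pool. This is a minimal model written for this page under stated assumptions, not the library's implementation: the slot and id records, the fixed pool size of 4, and the functions intern, decref, and to_string below are all hypothetical stand-ins.

```ocaml
(* Toy model: each slot stores a payload, a reference count, and a
   generation that is bumped whenever the slot is recycled. *)
type slot = { mutable text : string; mutable refs : int; mutable gen : int }

(* An id remembers the generation that was current when it was issued. *)
type id = { index : int; id_gen : int }

let slots = Array.init 4 (fun _ -> { text = ""; refs = 0; gen = 0 })

(* Claim the first free slot; raises Not_found if the toy pool is full. *)
let intern (s : string) : id =
  let rec find i =
    if i >= Array.length slots then raise Not_found
    else if slots.(i).refs = 0 then i
    else find (i + 1)
  in
  let i = find 0 in
  let slot = slots.(i) in
  slot.text <- s;
  slot.refs <- 1;
  { index = i; id_gen = slot.gen }

(* Dropping the last reference frees the slot and invalidates old ids. *)
let decref (id : id) : unit =
  let slot = slots.(id.index) in
  if slot.gen = id.id_gen && slot.refs > 0 then begin
    slot.refs <- slot.refs - 1;
    if slot.refs = 0 then begin
      slot.text <- "";
      slot.gen <- slot.gen + 1
    end
  end

(* A stale id yields a safe default instead of another cluster's text. *)
let to_string (id : id) : string =
  let slot = slots.(id.index) in
  if slot.gen = id.id_gen then slot.text else ""

let () =
  let a = intern "e\u{0301}" in
  decref a;
  let b = intern "x\u{0302}" in
  assert (b.index = a.index);       (* the slot was recycled *)
  assert (to_string a = "");        (* the stale id reads as empty *)
  assert (to_string b = "x\u{0302}")
```

The point of the design is that a dangling id costs one integer comparison to detect, so lookups stay cheap while still never returning another cluster's data.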
Width calculation
Display width follows UAX #11 and UAX #29, correctly handling ZWJ emoji sequences, regional indicator (flag) pairs, variation selectors, and skin-tone modifiers. See width_method for the available strategies.
Types
The type for glyphs. A packed 63-bit integer, always unboxed.
The type is private to prevent construction of invalid values. Use of_uchar, Pool.intern, Pool.encode, empty, or space to create glyphs. The integer representation is readable (e.g. for storage in Bigarray); use unsafe_of_int when loading from external storage.
Note. The bit layout is not a stable serialization format across major versions.
The type for width calculation methods. Determines how grapheme cluster display widths are computed:
- `Unicode: full UAX #29 segmentation with ZWJ emoji composition. Use for correct emoji and flag rendering.
- `Wcwidth: grapheme boundary segmentation for rendering, but each grapheme's width is the sum of per-codepoint wcwidth-style widths. Use for legacy compatibility.
- `No_zwj: UAX #29 segmentation that forces a break after ZWJ (no emoji ZWJ sequences), but keeps the full grapheme-aware width logic (RI pairs, VS16, Indic virama).
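To see why the choice of method matters, consider a ZWJ emoji sequence such as the family emoji U+1F468 U+200D U+1F469 U+200D U+1F467. The sketch below uses a toy per-codepoint width table, which is an assumption for this example only, and hypothetical helper names (toy_wcwidth, width_sum, width_composed, split_after_zwj); it does not call the library and does not reproduce its width tables.

```ocaml
let zwj = 0x200D

(* Toy per-codepoint width, assumed for this example only: ZWJ and VS16
   are zero-width, the U+1F300..U+1FAFF emoji range is wide. *)
let toy_wcwidth (cp : int) : int =
  if cp = zwj || cp = 0xFE0F then 0
  else if cp >= 0x1F300 && cp <= 0x1FAFF then 2
  else 1

(* Wcwidth-style: sum the widths of every codepoint in the cluster. *)
let width_sum (cluster : int list) : int =
  List.fold_left (fun acc cp -> acc + toy_wcwidth cp) 0 cluster

(* Composition-style: a ZWJ-joined cluster renders as one emoji, so take
   the width of its first codepoint. *)
let width_composed (cluster : int list) : int =
  match cluster with [] -> 0 | first :: _ -> toy_wcwidth first

(* No-ZWJ-style segmentation: force a cluster break after every ZWJ. *)
let split_after_zwj (cps : int list) : int list list =
  let rec go current acc = function
    | [] ->
        let acc = if current = [] then acc else (List.rev current :: acc) in
        List.rev acc
    | cp :: rest when cp = zwj -> go [] (List.rev (cp :: current) :: acc) rest
    | cp :: rest -> go (cp :: current) acc rest
  in
  go [] [] cps

let family = [ 0x1F468; zwj; 0x1F469; zwj; 0x1F467 ]

let () =
  assert (width_composed family = 2);               (* one wide emoji *)
  assert (width_sum family = 6);                    (* three wide codepoints *)
  assert (List.length (split_after_zwj family) = 3) (* three separate clusters *)
```

A terminal that does not compose ZWJ sequences draws three separate emoji, so a renderer that assumes composition would misalign every following column; matching the method to the terminal's behavior avoids that drift.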
The type for line terminator kinds.
- `LF: line feed (U+000A).
- `CR: carriage return (U+000D).
- `CRLF: the two-byte CR LF sequence.
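As an illustration of how the three kinds differ when scanning text, here is a minimal classifier over raw bytes. The type name line_terminator and the function terminator_at are hypothetical, introduced for this sketch only; the library's own type name and scanning functions are not shown on this page.

```ocaml
(* Hypothetical name for the three terminator kinds listed above. *)
type line_terminator = [ `LF | `CR | `CRLF ]

(* Return the terminator starting at byte [i] of [s], with its byte length,
   or None if no terminator starts there. CR followed by LF is reported as
   a single CRLF rather than as CR then LF. *)
let terminator_at (s : string) (i : int) : (line_terminator * int) option =
  if i < 0 || i >= String.length s then None
  else
    match s.[i] with
    | '\n' -> Some (`LF, 1)
    | '\r' when i + 1 < String.length s && s.[i + 1] = '\n' -> Some (`CRLF, 2)
    | '\r' -> Some (`CR, 1)
    | _ -> None

let () =
  assert (terminator_at "a\r\nb" 1 = Some (`CRLF, 2));
  assert (terminator_at "a\rb" 1 = Some (`CR, 1));
  assert (terminator_at "a\nb" 1 = Some (`LF, 1))
```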
Constants
empty is the empty glyph (0). It represents control characters, zero-width sequences, and U+0000. This is the only glyph for which is_empty is true.
space is the space glyph (U+0020, width 1). It is the default blank-cell content in terminal grids.
Creating
of_uchar u is a glyph for the single Unicode scalar u.
The result is empty for control or zero-width codepoints. Simple glyphs are stored directly in the packed integer with no pool allocation.
See also Pool.intern and Pool.encode.
Predicates
is_inline g is true iff g requires no pool lookup. Useful for skipping reference counting on simple glyphs.
is_start g is true iff g is the start of a character (simple or complex start).
is_continuation g is true iff g is a wide-character continuation placeholder. See make_continuation.
is_complex g is true iff g is pool-backed (complex start or complex continuation).
Properties
grapheme_width g is the full display width of the grapheme represented by g. For complex glyphs (start or continuation) the result is the total cluster width (1–4). For tab glyphs the result is tab_width.
tab_width defaults to 2.
See also cell_width.
cell_width g is the display width that g occupies in a single cell. The result is 0 for empty and continuation cells. For start cells, the result is the character's display width (1 for most characters, 2 for wide CJK/emoji). Tab glyphs return 1.
Unlike grapheme_width, cell_width returns 0 for continuation cells, because they occupy no additional columns beyond the start cell.
left_extent g is the distance from a continuation cell to its start cell. The result is 0 for simple and complex-start glyphs.
right_extent g is the distance from a glyph to the rightmost continuation cell. For a complex start glyph this is width - 1.
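The relationship between start cells, continuation cells, and the two extents can be sketched with a toy cell model. The cell type and the span function below are assumptions for illustration (the real glyph packs this information into an integer); the accessor names mirror the documented ones but are reimplemented here over the toy type.

```ocaml
(* Toy cell model for illustration only. *)
type cell =
  | Empty
  | Start of { text : string; width : int }
  | Continuation of { left : int; right : int }

(* Cells covering one grapheme of display width [width] (at least 1):
   one start cell followed by [width - 1] continuation cells. *)
let span (text : string) (width : int) : cell list =
  List.init width (fun i ->
      if i = 0 then Start { text; width }
      else Continuation { left = i; right = width - 1 - i })

(* Distance back to the start cell; 0 for anything but a continuation. *)
let left_extent = function Continuation { left; _ } -> left | _ -> 0

(* Distance forward to the rightmost continuation cell. *)
let right_extent = function
  | Start { width; _ } -> width - 1
  | Continuation { right; _ } -> right
  | Empty -> 0

(* Columns contributed by this single cell: only start cells count. *)
let cell_width = function Start { width; _ } -> width | _ -> 0

let () =
  let cells = span "W" 2 in
  assert (List.map left_extent cells = [ 0; 1 ]);
  assert (List.map right_extent cells = [ 1; 0 ]);
  (* Summing cell_width over a row counts each grapheme exactly once. *)
  assert (List.fold_left (fun acc c -> acc + cell_width c) 0 cells = 2)
```

With both extents available, a renderer that lands on any cell of a wide character can find the whole span in constant time, for example to erase it when one of its cells is overwritten.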
codepoint g is the Unicode codepoint of a simple glyph g (U+0000 – U+10FFFF).
Warning. The result is undefined for complex glyphs.
pool_key g is Some key if g is a pool-backed glyph (complex start or continuation), and None otherwise. The key is a stable, process-local identity for deduplicating interned grapheme references.
The key is only meaningful for glyphs originating from the same pool.
Construction
make_continuation ~code ~left ~right is a continuation cell referencing the same pool entry as code with the given left and right extents. left and right are clamped to [0;3]. If code is a simple glyph the continuation carries no pool reference.
Note. Intended for renderer and grid internals that materialize wide-cell spans.
Converting
to_int g is the raw integer representation of g.
Note. The integer layout is not a stable serialization format across major versions. Use for in-process storage only (e.g. Bigarray).
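A minimal sketch of in-process storage in a Bigarray follows. It uses plain integers as stand-ins for the values that to_int would return, so it does not depend on the library; the helper make_row is hypothetical, and 0 plays the role of the empty glyph as stated under Constants.

```ocaml
(* Stand-in for to_int results: plain ints, with 0 as the empty glyph. *)
let make_row (cols : int) :
    (int, Bigarray.int_elt, Bigarray.c_layout) Bigarray.Array1.t =
  let row = Bigarray.Array1.create Bigarray.int Bigarray.c_layout cols in
  (* Bigarray memory is uninitialized on creation, so clear it first. *)
  Bigarray.Array1.fill row 0;
  row

let () =
  let row = make_row 4 in
  (* Store the raw integer of a simple glyph; 0x41 is only a placeholder. *)
  Bigarray.Array1.set row 1 0x41;
  assert (Bigarray.Array1.get row 0 = 0);
  assert (Bigarray.Array1.get row 1 = 0x41)
```

Values read back from such an array would go through unsafe_of_int, which is why the array should only ever hold integers produced by the same library version within the same process.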
See also unsafe_of_int.
Pool
A Pool.t manages the storage and lifecycle of complex glyphs (multi-codepoint grapheme clusters) through manual reference counting with generation-based use-after-free protection.
Warning. Pools are not thread-safe. Use one pool per thread or provide external synchronization.
String utilities
Pool-free measurement and iteration on raw string values. These functions do not require a Pool.t.