package matrix

Fast, modern terminal toolkit for OCaml

Sources

mosaic-0.1.0.tbz
sha256=9e4e90d17f9b2af1b07071fe425bc2c519c849c4f1d1ab73cde512be2d874849
sha512=06e9c4a741590942e81a27738d0b5c0413fafec8cf3b7dae047ad69f155e7b718aa4223818dc161b7d028efffcfd3365905e264d6fd31d453910ddfa91dcf9b9

Module Glyph

Unicode glyphs for terminal rendering.

A glyph is a packed, unboxed integer representing a visual character in a terminal cell. Glyphs come in two kinds:

  • Simple glyphs store a single Unicode scalar (U+0000 to U+10FFFF) directly. Zero allocation, zero lookup.
  • Complex glyphs reference a multi-codepoint grapheme cluster interned in a Pool. They carry a pool index, a generation counter, and extent information.

Multi-column characters (wide CJK, emoji) are represented as one start glyph followed by one or more continuation glyphs that reference the same pool entry. Control characters and zero-width sequences map to empty.

Quick start

Create a pool, encode a string, and process glyphs via callback:

  let pool = Pool.create () in
  Pool.encode pool ~width_method:`Unicode ~tab_width:2
    (fun glyph -> Printf.printf "%s " (Pool.to_string pool glyph))
    "Hello 👋 World"

Memory safety

The Pool uses manual reference counting with automatic slot recycling. Pool-backed glyph IDs include a generation counter so that accessing a glyph whose slot has been recycled returns safe defaults (empty, zero width) rather than stale data. This guarantee holds across normal Pool.incref/Pool.decref cycles. Pool.clear resets the pool and invalidates all previously issued IDs.
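The generation check can be pictured with a small stand-alone model. The sketch below is not the library's Pool: the record types, the fixed capacity, and all function bodies are assumptions made for illustration, and only the names intern, incref, decref, and to_string are borrowed from the text. It shows how a handle that outlives its slot reads back a safe default instead of the slot's new occupant.

```ocaml
(* Toy model only, not the library implementation. Each slot keeps its text,
   a reference count, and a generation that is bumped when it is recycled. *)
type slot = { mutable text : string; mutable refs : int; mutable gen : int }
type pool = { slots : slot array; mutable free : int list }

(* A handle remembers the generation it saw when the text was interned. *)
type handle = { index : int; generation : int }

let create capacity =
  { slots = Array.init capacity (fun _ -> { text = ""; refs = 0; gen = 0 });
    free = List.init capacity Fun.id }

(* Take a free slot, store the text, and start with one reference. *)
let intern pool text =
  match pool.free with
  | [] -> failwith "toy pool is full"
  | i :: rest ->
      pool.free <- rest;
      let s = pool.slots.(i) in
      s.text <- text;
      s.refs <- 1;
      { index = i; generation = s.gen }

(* A handle is live only while its generation matches the slot generation. *)
let is_live pool h = pool.slots.(h.index).gen = h.generation

let incref pool h =
  if is_live pool h then begin
    let s = pool.slots.(h.index) in
    s.refs <- s.refs + 1
  end

(* Dropping the last reference recycles the slot and bumps its generation,
   which invalidates every handle issued for the old contents. *)
let decref pool h =
  if is_live pool h then begin
    let s = pool.slots.(h.index) in
    s.refs <- s.refs - 1;
    if s.refs = 0 then begin
      s.text <- "";
      s.gen <- s.gen + 1;
      pool.free <- h.index :: pool.free
    end
  end

(* Stale handles yield the safe default (the empty string). *)
let to_string pool h = if is_live pool h then pool.slots.(h.index).text else ""

let () =
  let pool = create 1 in
  let stale = intern pool "first" in
  decref pool stale;                      (* slot 0 is recycled *)
  let fresh = intern pool "second" in     (* slot 0 is reused *)
  assert (fresh.index = stale.index);
  assert (to_string pool stale = "");     (* safe default, not "second" *)
  assert (to_string pool fresh = "second")
```

In this model the old and new handles share a slot index, so the generation alone tells them apart.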

Width calculation

Display width follows UAX #11 and UAX #29, correctly handling ZWJ emoji sequences, regional indicator (flag) pairs, variation selectors, and skin-tone modifiers. See width_method for the available strategies.

Types

type t = private int

The type for glyphs. A packed 63-bit integer, always unboxed.

The type is private to prevent construction of invalid values. Use of_uchar, Pool.intern, Pool.encode, empty, or space to create glyphs. The integer representation is readable (e.g. for storage in Bigarray); use unsafe_of_int when loading from external storage.

Note. The bit layout is not a stable serialization format across major versions.
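As an illustration of what a private packed integer allows, here is a minimal sketch. The module Toy_glyph, its make function, and its bit layout (scalar in bits 0 to 20, width minus one in bits 21 and 22) are hypothetical and chosen only for the example; the real layout is not documented here. The point is that clients can read the integer through a coercion but cannot forge a value without going through an explicit function.

```ocaml
(* Hypothetical layout for illustration only; the real bit layout may differ.
   Bits 0-20 hold the scalar value, bits 21-22 hold (display width - 1). *)
module Toy_glyph : sig
  type t = private int
  val empty : t
  val make : Uchar.t -> width:int -> t
  val codepoint : t -> int
  val width : t -> int
  val unsafe_of_int : int -> t
end = struct
  type t = int
  let empty = 0
  let make u ~width =
    let w = max 1 (min 4 width) in
    Uchar.to_int u lor ((w - 1) lsl 21)
  let codepoint g = g land 0x1FFFFF
  let width g = if g = 0 then 0 else ((g lsr 21) land 0x3) + 1
  let unsafe_of_int n = n
end

let () =
  let g = Toy_glyph.make (Uchar.of_int 0x6F22) ~width:2 in
  (* Reading the integer needs only a coercion, no function call. *)
  let raw = (g :> int) in
  (* let bad : Toy_glyph.t = 42   <- rejected by the type checker *)
  let g2 = Toy_glyph.unsafe_of_int raw in
  assert (Toy_glyph.codepoint g2 = 0x6F22);
  assert (Toy_glyph.width g2 = 2)
```

Because the signature exposes `private int`, the only way back from a raw integer is the function marked unsafe, which mirrors the division of labour between to_int and unsafe_of_int described in this module.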

type width_method = [
  | `Unicode
  | `Wcwidth
  | `No_zwj
]

The type for width calculation methods. Determines how grapheme cluster display widths are computed:

  • `Unicode: full UAX #29 segmentation with ZWJ emoji composition. Use for correct emoji and flag rendering.
  • `Wcwidth: grapheme boundary segmentation for rendering, but each grapheme's width is the sum of per-codepoint wcwidth-style widths. Use for legacy compatibility.
  • `No_zwj: UAX #29 segmentation that forces a break after ZWJ (no emoji ZWJ sequences), but keeps the full grapheme-aware width logic (RI pairs, VS16, Indic virama).
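The difference between the three methods is easiest to see on a ZWJ emoji sequence. The following is a rough model only: cp_width is a tiny hand-written table standing in for a real wcwidth, cluster_width is a crude stand-in for grapheme-aware width, and none of these helper names exist in the library. Input is given as a list of code points to keep the sketch free of UTF-8 decoding.

```ocaml
(* Toy per-codepoint width table, standing in for a real wcwidth. *)
let cp_width cp =
  if cp = 0x200D || cp = 0xFE0F then 0            (* ZWJ, VS16 *)
  else if cp >= 0x1F300 && cp <= 0x1FAFF then 2   (* a common emoji range *)
  else if cp >= 0x4E00 && cp <= 0x9FFF then 2     (* CJK unified ideographs *)
  else 1

(* `Wcwidth-style: sum the widths of every codepoint in the cluster. *)
let sum_width cluster =
  List.fold_left (fun acc cp -> acc + cp_width cp) 0 cluster

(* Cluster-aware stand-in: the cluster takes the width of its widest member. *)
let cluster_width cluster =
  List.fold_left (fun acc cp -> max acc (cp_width cp)) 0 cluster

(* `No_zwj-style segmentation of one cluster: force a break after every ZWJ. *)
let split_after_zwj cps =
  let flush cur acc = if cur = [] then acc else List.rev cur :: acc in
  let rec go cur acc = function
    | [] -> List.rev (flush cur acc)
    | cp :: rest when cp = 0x200D -> go [] (flush (cp :: cur) acc) rest
    | cp :: rest -> go (cp :: cur) acc rest
  in
  go [] [] cps

let () =
  (* man, ZWJ, woman, ZWJ, girl: one family emoji cluster *)
  let family = [0x1F468; 0x200D; 0x1F469; 0x200D; 0x1F467] in
  Printf.printf "sum=%d cluster=%d pieces=%d\n"
    (sum_width family) (cluster_width family)
    (List.length (split_after_zwj family))
```

Under these toy rules the family emoji measures 6 columns when widths are summed, 2 columns when treated as one cluster, and splits into 3 separate pieces when a break is forced after each ZWJ. The library's actual results depend on its own Unicode tables.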
type line_break_kind = [
  | `LF
  | `CR
  | `CRLF
]

The type for line terminator kinds.

  • `LF: line feed (U+000A).
  • `CR: carriage return (U+000D).
  • `CRLF: the two-byte CR LF sequence.

Constants

val empty : t

empty is the empty glyph (0). It represents control characters, zero-width sequences, and U+0000. This is the only glyph for which is_empty is true.

val space : t

space is the space glyph (U+0020, width 1). It is the default blank-cell content in terminal grids.

Creating

val of_uchar : Uchar.t -> t

of_uchar u is a glyph for the single Unicode scalar u.

The result is empty for control or zero-width codepoints. Simple glyphs are stored directly in the packed integer with no pool allocation.

See also Pool.intern and Pool.encode.

Predicates

val is_empty : t -> bool

is_empty g is true iff g is empty.

val is_inline : t -> bool

is_inline g is true iff g requires no pool lookup. Useful for skipping reference counting on simple glyphs.

val is_start : t -> bool

is_start g is true iff g is the start of a character (simple or complex start).

val is_continuation : t -> bool

is_continuation g is true iff g is a wide-character continuation placeholder. See make_continuation.

val is_complex : t -> bool

is_complex g is true iff g is pool-backed (complex start or complex continuation).

Properties

val grapheme_width : ?tab_width:int -> t -> int

grapheme_width g is the full display width of the grapheme represented by g. For complex glyphs (start or continuation) the result is the total cluster width (1 to 4). For tab glyphs the result is tab_width.

tab_width defaults to 2.

See also cell_width.

val cell_width : t -> int

cell_width g is the display width that g occupies in a single cell. The result is 0 for empty and continuation cells. For start cells, the result is the character's display width (1 for most characters, 2 for wide CJK/emoji). Tab glyphs return 1.

Unlike grapheme_width, cell_width returns 0 for continuation cells, because they occupy no additional columns beyond the start cell.

val left_extent : t -> int

left_extent g is the distance from a continuation cell to its start cell. The result is 0 for simple and complex-start glyphs.

val right_extent : t -> int

right_extent g is the distance from a glyph to the rightmost continuation cell. For a complex start glyph this is width - 1.

val codepoint : t -> int

codepoint g is the Unicode codepoint of a simple glyph g (U+0000 to U+10FFFF).

Warning. The result is undefined for complex glyphs.

val pool_key : t -> int option

pool_key g is Some key if g is a pool-backed glyph (complex start or continuation), and None otherwise. The key is a stable, process-local identity for deduplicating interned grapheme references.

The key is only meaningful for glyphs originating from the same pool.

Construction

val make_continuation : code:t -> left:int -> right:int -> t

make_continuation ~code ~left ~right is a continuation cell referencing the same pool entry as code with the given left and right extents. left and right are clamped to [0;3]. If code is a simple glyph the continuation carries no pool reference.

Note. Intended for renderer and grid internals that materialize wide-cell spans.
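To make the extents concrete, here is a hedged sketch of how a grid might lay out one wide cluster as a start cell plus continuation cells. The cell variant and the span and clamp helpers are hypothetical stand-ins for the packed representation; only the meaning of the left and right extents, the [0;3] clamping, and the cell-width rule are taken from the descriptions in this module.

```ocaml
(* Toy cell model, not the packed representation used by the library. *)
type cell =
  | Start of { text : string; width : int }
  | Cont of { left : int; right : int }

let clamp lo hi x = max lo (min hi x)

(* Emit one start cell followed by (width - 1) continuation cells.
   Extents are clamped to [0;3], as described for make_continuation. *)
let span text width =
  let width = clamp 1 4 width in
  let cont i =
    Cont { left = clamp 0 3 i; right = clamp 0 3 (width - 1 - i) }
  in
  Start { text; width } :: List.init (width - 1) (fun k -> cont (k + 1))

(* Distance back to the start cell; 0 on the start cell itself. *)
let left_extent = function Start _ -> 0 | Cont c -> c.left

(* Distance forward to the rightmost continuation cell. *)
let right_extent = function Start s -> s.width - 1 | Cont c -> c.right

(* Each cluster is counted once: continuation cells contribute 0. *)
let cell_width = function Start s -> s.width | Cont _ -> 0

let () =
  let cells = span "wide" 3 in
  assert (List.map left_extent cells = [0; 1; 2]);
  assert (List.map right_extent cells = [2; 1; 0]);
  (* Summing cell widths over the span gives the columns it covers. *)
  assert (List.fold_left (fun acc c -> acc + cell_width c) 0 cells = 3)
```

From any cell in the span, subtracting the left extent from its column gives the start column, which is what lets a renderer redraw or erase a whole wide character when only one of its cells is touched.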

Converting

val to_int : t -> int

to_int g is the raw integer representation of g.

Note. The integer layout is not a stable serialization format across major versions. Use for in-process storage only (e.g. Bigarray).

See also unsafe_of_int.

val unsafe_of_int : int -> t

unsafe_of_int n is n interpreted as a glyph without validation.

Warning. The caller must ensure n was produced by to_int or read from trusted storage. An invalid integer causes undefined behaviour in pool operations.

See also to_int.

Pool

A Pool.t manages the storage and lifecycle of complex glyphs (multi-codepoint grapheme clusters) through manual reference counting with generation-based use-after-free protection.

Warning. Pools are not thread-safe. Use one pool per thread or provide external synchronization.

module Pool : sig ... end

String utilities

Pool-free measurement and iteration on raw string values. These functions do not require a Pool.t.

module String : sig ... end