package quickjs

On This Page

Normalization
Case Conversion
Single Character Operations
Character Classification
Regex Support

Module `Quickjs.Unicode`

Unicode utilities from QuickJS's libunicode

This module provides Unicode character classification, case conversion, and normalization functions. It uses the same battle-tested Unicode tables as QuickJS's ES2023-compliant JavaScript engine.

Normalization

type normalization =

| NFC (* Canonical Decomposition, followed by Canonical Composition *)
| NFD (* Canonical Decomposition *)
| NFKC (* Compatibility Decomposition, followed by Canonical Composition *)
| NFKD (* Compatibility Decomposition *)

Unicode normalization forms

val normalize : normalization -> string -> string option

normalize form str normalizes a UTF-8 string to the specified form. Returns None on memory allocation failure or invalid input.

Example:

  normalize NFC "café" (* composed form *)
  normalize NFD "café" (* decomposed form *)
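Because normalize returns an option, callers have to decide what to do on None. The following is a minimal sketch, assuming the quickjs package is linked into the build; normalize_or_original is a hypothetical helper name, not part of the library.

```ocaml
(* Assumption: the quickjs library is available to the build,
   e.g. (libraries quickjs) in a dune stanza. *)
open Quickjs.Unicode

(* Hypothetical helper: fall back to the input string when
   normalization reports failure (None). *)
let normalize_or_original form s =
  match normalize form s with
  | Some normalized -> normalized
  | None -> s

let () =
  (* "e" followed by U+0301 COMBINING ACUTE ACCENT (decomposed input) *)
  let composed = normalize_or_original NFC "cafe\u{0301}" in
  (* precomposed U+00E9 (composed input) *)
  let decomposed = normalize_or_original NFD "caf\u{00E9}" in
  Printf.printf "NFC: %d bytes, NFD: %d bytes\n"
    (String.length composed) (String.length decomposed)
```

Falling back to the original string is only one possible policy; code that must distinguish invalid input from success should match on the option directly.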

Case Conversion

val lowercase : string -> string

lowercase str converts a UTF-8 string to lowercase. Handles Unicode characters like "ÉCOLE" → "école".

val uppercase : string -> string

uppercase str converts a UTF-8 string to uppercase. Handles special cases like "ß" → "SS".
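As a short usage sketch of the two string-level functions, assuming the quickjs package is linked: the inputs are the ones mentioned in the descriptions above, and the binding names lower and upper are illustrative only.

```ocaml
(* Assumption: the quickjs library is available to the build. *)
let lower = Quickjs.Unicode.lowercase "ÉCOLE"
let upper = Quickjs.Unicode.uppercase "ß"

let () =
  (* The byte length can change: per the description above, "ß" expands
     to two letters when uppercased. *)
  Printf.printf "%s (%d bytes)\n" lower (String.length lower);
  Printf.printf "%s (%d bytes)\n" upper (String.length upper)
```

Since the result may be longer or shorter than the input in bytes, byte offsets computed on the original string should not be reused on the converted one.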

Single Character Operations

val lowercase_char : Uchar.t -> Uchar.t list

lowercase_char c returns the lowercase form of a code point. Returns a list because some characters expand (though lowercase rarely does).

val uppercase_char : Uchar.t -> Uchar.t list

uppercase_char c returns the uppercase form of a code point. Returns a list because some characters expand to several code points, e.g., 'ß' → ['S'; 'S'].
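The list result usually needs to be re-encoded before it can be printed or concatenated. A minimal sketch, assuming the quickjs package is linked and OCaml 4.06 or later for Buffer.add_utf_8_uchar; string_of_uchars is a hypothetical helper, not part of the library.

```ocaml
(* Hypothetical helper: encode a list of code points as a UTF-8 string. *)
let string_of_uchars (cs : Uchar.t list) : string =
  let buf = Buffer.create 8 in
  List.iter (Buffer.add_utf_8_uchar buf) cs;
  Buffer.contents buf

let () =
  (* U+00DF LATIN SMALL LETTER SHARP S *)
  let sharp_s = Uchar.of_int 0x00DF in
  let upper = Quickjs.Unicode.uppercase_char sharp_s in
  Printf.printf "%d code point(s): %s\n"
    (List.length upper) (string_of_uchars upper)
```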

Character Classification

val is_cased : Uchar.t -> bool

is_cased c returns true if the character has uppercase/lowercase forms. Examples: 'a', 'A', 'é' are cased; '1', '!' are not.

val is_case_ignorable : Uchar.t -> bool

is_case_ignorable c returns true if the character is ignored during case mapping operations (e.g., combining marks).

val is_id_start : Uchar.t -> bool

is_id_start c returns true if the character can start a JavaScript/Unicode identifier (letters, $, _).

val is_id_continue : Uchar.t -> bool

is_id_continue c returns true if the character can continue a JavaScript/Unicode identifier (letters, digits, $, _, combining marks).

val is_whitespace : Uchar.t -> bool

is_whitespace c returns true if the character is Unicode whitespace. Includes ASCII space, tab, newline, and Unicode spaces like U+00A0 (NBSP).
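The identifier predicates can be combined into a whole-string check. The following is a sketch only, assuming the quickjs package is linked and OCaml 4.14 or later for String.get_utf_8_uchar; uchars_of_string and is_identifier are hypothetical helpers, and the check ignores reserved words and escape sequences.

```ocaml
(* Hypothetical helper: decode a UTF-8 string into code points.
   Invalid byte sequences decode to U+FFFD (Uchar.rep). *)
let uchars_of_string (s : string) : Uchar.t list =
  let rec go i acc =
    if i >= String.length s then List.rev acc
    else
      let d = String.get_utf_8_uchar s i in
      go (i + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
  in
  go 0 []

(* Hypothetical helper: true when the first code point satisfies
   is_id_start and every following one satisfies is_id_continue. *)
let is_identifier (s : string) : bool =
  match uchars_of_string s with
  | [] -> false
  | first :: rest ->
    Quickjs.Unicode.is_id_start first
    && List.for_all Quickjs.Unicode.is_id_continue rest

let () =
  List.iter
    (fun s -> Printf.printf "%S -> %b\n" s (is_identifier s))
    [ "foo_1"; "1foo"; "$scope"; "" ]
```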

Regex Support

val canonicalize : ?unicode:bool -> Uchar.t -> Uchar.t

canonicalize ?unicode c returns the canonical form of a character for case-insensitive regex matching.

unicode: if true (default), use full Unicode case folding; if false, only ASCII case folding.
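One way to use canonicalize is to compare two code points the way a case-insensitive regex would, by canonicalizing both sides first. A minimal sketch, assuming the quickjs package is linked; equal_ignoring_case is a hypothetical helper, not part of the library.

```ocaml
(* Hypothetical helper: two code points match case-insensitively when
   their canonical forms are equal. The optional flag is forwarded
   unchanged to canonicalize. *)
let equal_ignoring_case ?(unicode = true) (a : Uchar.t) (b : Uchar.t) : bool =
  Uchar.equal
    (Quickjs.Unicode.canonicalize ~unicode a)
    (Quickjs.Unicode.canonicalize ~unicode b)

let () =
  let a = Uchar.of_char 'a' and upper_a = Uchar.of_char 'A' in
  Printf.printf "a ~ A: %b\n" (equal_ignoring_case a upper_a);
  Printf.printf "a ~ A (unicode:false): %b\n"
    (equal_ignoring_case ~unicode:false a upper_a)
```

Canonicalizing both operands with the same flag keeps the comparison symmetric; mixing the two modes in one comparison would not be meaningful.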
