package higlo


Syntax highlighting

type token_text = string * int

UTF-8 text and its length in code points, or a negative number if the length was not computed.
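To make the "length in code points" convention concrete, here is a minimal sketch of how a `token_text` value could be built. It assumes OCaml 4.14 or later for `String.get_utf_8_uchar`. The helpers `utf8_length`, `make_token_text` and `unknown_length` are hypothetical and are not part of higlo.

```ocaml
(* Same shape as higlo's token_text: UTF-8 text and its length. *)
type token_text = string * int

(* Hypothetical helper: count Unicode code points by decoding the string
   with the standard library's UTF-8 decoder (OCaml >= 4.14). An invalid
   byte decodes to U+FFFD, advances by at least one byte, and counts as
   one code point. *)
let utf8_length (s : string) : int =
  let rec loop i n =
    if i >= String.length s then n
    else
      let d = String.get_utf_8_uchar s i in
      loop (i + Uchar.utf_decode_length d) (n + 1)
  in
  loop 0 0

(* Hypothetical: pair the text with its computed code-point length. *)
let make_token_text (s : string) : token_text = (s, utf8_length s)

(* Hypothetical: a length that was not computed is any negative number. *)
let unknown_length (s : string) : token_text = (s, -1)
```

For example, `make_token_text "h\xc3\xa9llo"` ("héllo") is 6 bytes long but yields a length of 5, because "é" is a single code point encoded on two bytes.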

type token =
  | Bcomment of token_text  (* block comment *)
  | Constant of token_text
  | Directive of token_text
  | Escape of token_text  (* escape sequence like \123 *)
  | Id of token_text
  | Keyword of int * token_text
  | Lcomment of token_text  (* one-line comment *)
  | Numeric of token_text
  | String of token_text
  | Symbol of int * token_text
  | Text of token_text  (* used for everything else *)
  | Title of int * token_text

Tokens read from the given code, each with the corresponding text and its length (in code points). The constructor names are inspired by the highlight tool. Keyword and Symbol are parameterized by an integer so that different families of keywords and symbols can be distinguished, like kwa, kwb, ... in highlight.
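The integer carried by Keyword and Symbol can be turned into highlight-style family names. The sketch below is only an illustration and copies the token type so that it compiles on its own. It assumes that families are numbered from 0, so family 0 maps to "kwa". The class names and the `css_class` function are hypothetical and are not what higlo itself emits.

```ocaml
(* Local copy of the documented types so the sketch is self-contained. *)
type token_text = string * int

type token =
  | Bcomment of token_text
  | Constant of token_text
  | Directive of token_text
  | Escape of token_text
  | Id of token_text
  | Keyword of int * token_text
  | Lcomment of token_text
  | Numeric of token_text
  | String of token_text
  | Symbol of int * token_text
  | Text of token_text
  | Title of int * token_text

(* Hypothetical: map a family index to a letter suffix, assuming families
   are numbered from 0 (0 -> "a", 1 -> "b", ...). *)
let family_suffix n =
  if n >= 0 && n < 26 then String.make 1 (Char.chr (Char.code 'a' + n))
  else string_of_int n

(* Hypothetical class names, loosely modelled on highlight's kwa, kwb, ... *)
let css_class = function
  | Bcomment _ | Lcomment _ -> "com"
  | Constant _ -> "const"
  | Directive _ -> "dir"
  | Escape _ -> "esc"
  | Id _ -> "id"
  | Keyword (n, _) -> "kw" ^ family_suffix n
  | Numeric _ -> "num"
  | String _ -> "str"
  | Symbol (n, _) -> "sym" ^ family_suffix n
  | Text _ -> "text"
  | Title (n, _) -> "title" ^ string_of_int n
```

Under these assumptions, `css_class (Keyword (1, ("match", 5)))` returns `"kwb"`, so a lexer can place, for example, control-flow keywords and type keywords in separate families and style them differently.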

val string_of_token : token -> string

For debug printing.

type error =
  | Unknown_lang of string  (* the requested language is not found *)
  | Lex_error of Location.t * string
exception Error of error
val string_of_error : error -> string
val pp : Stdlib.Format.formatter -> error -> unit
type lexer = Sedlexing.lexbuf -> token list

Lexers are based on Sedlex. A lexer returns a list of tokens in the order in which they appear in the input string. Consecutive Text tokens are merged by the parse function.
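The merging of consecutive Text tokens can be sketched as a single left-to-right pass. This is not higlo's implementation. It works on a trimmed-down token type and assumes that when either length is unknown (negative), the merged length is also unknown.

```ocaml
type token_text = string * int

(* Trimmed-down token type: just enough constructors to show the merge. *)
type token =
  | Id of token_text
  | Keyword of int * token_text
  | Text of token_text

(* Assumption: the sum is unknown (-1) if either length is unknown. *)
let add_len a b = if a < 0 || b < 0 then -1 else a + b

(* Merge runs of adjacent Text tokens while preserving token order.
   [acc] holds the output in reverse, so its head is the previous token. *)
let merge_text (tokens : token list) : token list =
  let rec go acc = function
    | [] -> List.rev acc
    | Text (s, n) :: rest -> (
        match acc with
        | Text (s0, n0) :: acc' ->
            go (Text (s0 ^ s, add_len n0 n) :: acc') rest
        | _ -> go (Text (s, n) :: acc) rest)
    | t :: rest -> go (t :: acc) rest
  in
  go [] tokens
```

For instance, `merge_text [Text ("a", 1); Text (" ", 1); Keyword (0, ("let", 3))]` gives `[Text ("a ", 2); Keyword (0, ("let", 3))]`.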

val get_lexer : string -> lexer

get_lexer lang returns the lexer registered for language lang, or raises Error (Unknown_lang lang) if no lexer was registered for that language.

val registered_langs : unit -> (string * lexer) list

registered_langs returns the list of registered pairs (name, lexer).

val register_lang : string -> lexer -> unit

register_lang lang lexer registers lexer for language lang. If a lexer was already registered for the same language, it is replaced and is no longer available.
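The registry behaviour described above (replacement on re-registration, and an error for unknown languages) can be modelled with a hash table. The following is a hypothetical, self-contained model and not higlo's code. To avoid depending on Sedlex, it uses a simplified `lexer` type that takes a plain string instead of a `Sedlexing.lexbuf`.

```ocaml
type token_text = string * int
type token = Text of token_text
type error = Unknown_lang of string
exception Error of error

(* Simplified stand-in: the real lexer type consumes a Sedlexing.lexbuf. *)
type lexer = string -> token list

(* Hypothetical global registry keyed by language name. *)
let registry : (string, lexer) Hashtbl.t = Hashtbl.create 16

(* Hashtbl.replace drops any previous binding for [lang], so an earlier
   lexer for the same language is no longer reachable. *)
let register_lang (lang : string) (lx : lexer) : unit =
  Hashtbl.replace registry lang lx

(* Unknown languages are reported through the Error exception. *)
let get_lexer (lang : string) : lexer =
  match Hashtbl.find_opt registry lang with
  | Some lx -> lx
  | None -> raise (Error (Unknown_lang lang))

(* Hashtbl.fold visits bindings in an unspecified order. *)
let registered_langs () : (string * lexer) list =
  Hashtbl.fold (fun name lx acc -> (name, lx) :: acc) registry []
```

Using `Hashtbl.replace` instead of `Hashtbl.add` matters here. `add` would shadow the old binding and keep it in the table, which would make `registered_langs` return the same name twice.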

val parse : ?raise_exn:bool -> lang:string -> string -> token list

parse ?raise_exn ~lang code gets the lexer associated with lang and uses it to build a list of tokens. Consecutive Text tokens are merged. If no lexer is associated with the given language, the function returns [Text code].

  • parameter raise_exn

    defaults to false. If true, exceptions are raised instead of returning [Text code].
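The fallback controlled by raise_exn can be illustrated with a small self-contained model. This is a sketch and not higlo's parse. The toy `words` lexer, the simplified string-based lexer type, and the choice to also catch errors raised while lexing are all assumptions. The fallback token uses a negative length to mark it as "not computed".

```ocaml
type token_text = string * int
type token = Id of token_text | Text of token_text
type error = Unknown_lang of string
exception Error of error

(* Simplified lexer type; the real one consumes a Sedlexing.lexbuf. *)
type lexer = string -> token list

(* Hypothetical lookup standing in for get_lexer: only "words" is known,
   and its toy lexer just splits on spaces. *)
let get_lexer : string -> lexer = function
  | "words" ->
      fun code ->
        String.split_on_char ' ' code
        |> List.map (fun w -> Id (w, String.length w))
  | lang -> raise (Error (Unknown_lang lang))

(* With raise_exn = false (the default), any Error falls back to a single
   Text token holding the whole input. Its length is marked as not
   computed (-1). With raise_exn = true, the exception propagates. *)
let parse ?(raise_exn = false) ~lang code =
  try (get_lexer lang) code
  with Error _ as e -> if raise_exn then raise e else [ Text (code, -1) ]
```

With this model, `parse ~lang:"cobol" "x"` returns `[Text ("x", -1)]`, while `parse ~raise_exn:true ~lang:"cobol" "x"` raises `Error (Unknown_lang "cobol")`.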

val parse_lexbuf : ?on_exn:string -> lang:string -> Sedlexing.lexbuf -> token list