package higlo
Syntax highlighting
UTF-8 text and its length, or a negative number if the length was not computed.
type token =
| Bcomment of token_text (* block comment *)
| Constant of token_text
| Directive of token_text
| Escape of token_text (* Escape sequence like \123 *)
| Id of token_text
| Keyword of int * token_text
| Lcomment of token_text (* one line comment *)
| Numeric of token_text
| String of token_text
| Symbol of int * token_text
| Text of token_text (* Used for everything else *)
| Title of int * token_text
Tokens read in the given code, with the corresponding text and the length of that text (in number of codepoints). These names are inspired by the highlight tool. Keyword and Symbol are parameterized by an integer so that different families of keywords and symbols can be distinguished, like kwa, kwb, ... in highlight.
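As an illustration of how the integer parameter of Keyword and Symbol might be used, here is a minimal sketch that maps each token to a short class name and extracts its text. The types are mirrored locally so that the snippet compiles on its own. The definition of token_text as a (text, length) pair is an assumption based on the description above, and class_of_token, text_of_token, and the class names they produce are hypothetical helpers, not part of the library.

```ocaml
(* Local mirror of the documented types, so this sketch is self-contained.
   Assumption: token_text is a (text, length) pair. *)
type token_text = string * int

type token =
  | Bcomment of token_text
  | Constant of token_text
  | Directive of token_text
  | Escape of token_text
  | Id of token_text
  | Keyword of int * token_text
  | Lcomment of token_text
  | Numeric of token_text
  | String of token_text
  | Symbol of int * token_text
  | Text of token_text
  | Title of int * token_text

(* Hypothetical helper: a short class name for each token.
   The integer family is appended for Keyword, Symbol and Title. *)
let class_of_token = function
  | Bcomment _ -> "bcomment"
  | Constant _ -> "constant"
  | Directive _ -> "directive"
  | Escape _ -> "escape"
  | Id _ -> "id"
  | Keyword (n, _) -> "kw" ^ string_of_int n
  | Lcomment _ -> "lcomment"
  | Numeric _ -> "numeric"
  | String _ -> "string"
  | Symbol (n, _) -> "sym" ^ string_of_int n
  | Text _ -> "text"
  | Title (n, _) -> "title" ^ string_of_int n

(* Hypothetical helper: the raw text carried by a token. *)
let text_of_token = function
  | Bcomment (s, _) | Constant (s, _) | Directive (s, _) | Escape (s, _)
  | Id (s, _) | Lcomment (s, _) | Numeric (s, _) | String (s, _)
  | Text (s, _) -> s
  | Keyword (_, (s, _)) | Symbol (_, (s, _)) | Title (_, (s, _)) -> s
```

A printer could combine the two helpers, for instance to wrap each token's text in an element carrying the class name.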
val string_of_token : token -> string
For debug printing.
exception Error of error
val string_of_error : error -> string
val pp : Stdlib.Format.formatter -> error -> unit
type lexer = Sedlexing.lexbuf -> token list
Lexers are based on Sedlex. A lexer returns a list of tokens, in the order in which they appear in the input string. Text tokens are merged by the parse function.
val get_lexer : string -> lexer
get_lexer lang returns the lexer registered for the given language lang or raises Unknown_lang if no such language was registered.
val registered_langs : unit -> (string * lexer) list
registered_langs returns the list of registered pairs (name, lexer).
val register_lang : string -> lexer -> unit
If a lexer was already registered for the same language, it is replaced and is no longer available.
val parse : ?raise_exn:bool -> lang:string -> string -> token list
parse ?raise_exn ~lang code gets the lexer associated with lang and uses it to build a list of tokens. Consecutive Text tokens are merged. If no lexer is associated with the given language, the function returns [Text code].
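The merging of consecutive Text tokens can be pictured with the following sketch. It is not the library's implementation: the token type is a reduced local mirror, token_text is assumed to be a (text, length) pair, and merge_text and merge_len are hypothetical names. The rule used for lengths (the result is negative as soon as one of the two lengths was not computed) is also an assumption.

```ocaml
(* Reduced local mirror of the documented types; only two constructors
   are needed to illustrate merging.
   Assumption: token_text is a (text, length) pair. *)
type token_text = string * int

type token =
  | Keyword of int * token_text
  | Text of token_text

(* Assumed rule: a merged length is unknown (-1) if either part is unknown. *)
let merge_len a b = if a < 0 || b < 0 then -1 else a + b

(* Hypothetical helper: concatenate runs of adjacent Text tokens,
   leaving every other token untouched and preserving order. *)
let merge_text tokens =
  let rec go acc = function
    | [] -> List.rev acc
    | Text (s2, l2) :: rest ->
      (match acc with
       | Text (s1, l1) :: acc' ->
         go (Text (s1 ^ s2, merge_len l1 l2) :: acc') rest
       | _ -> go (Text (s2, l2) :: acc) rest)
    | tok :: rest -> go (tok :: acc) rest
  in
  go [] tokens
```

With this sketch, two adjacent Text tokens separated from a third by a Keyword are merged into one, while the third stays on its own.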
val parse_lexbuf :
?on_exn:string ->
lang:string ->
Sedlexing.lexbuf ->
token list