package batteries

Module BatLexing

Simple lexing using ocaml conventions

This module extends Stdlib's Lexing module; see that module's documentation for the rest of the functions and types.

The run-time library for lexers generated by ocamllex.

Positions
type position = Lexing.position = {
  pos_fname : string;
  pos_lnum : int;
  pos_bol : int;
  pos_cnum : int;
}

A value of type position describes a point in a source file. pos_fname is the file name; pos_lnum is the line number; pos_bol is the offset of the beginning of the line (number of characters between the beginning of the file and the beginning of the line); pos_cnum is the offset of the position (number of characters between the beginning of the file and the position).

See the documentation of type lexbuf for information about how the lexing engine will manage positions.
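As a minimal sketch of how these fields combine, the column of a position can be derived as pos_cnum - pos_bol. The helper name string_of_position, the file name, and the offsets below are made up for illustration; the code uses Stdlib's Lexing, whose position type BatLexing re-exports.

```ocaml
(* Hypothetical helper: render a position as "file:line:column". *)
let string_of_position (p : Lexing.position) : string =
  (* Column (0-based) = offset of the position minus offset of its line start. *)
  let column = p.Lexing.pos_cnum - p.Lexing.pos_bol in
  Printf.sprintf "%s:%d:%d" p.Lexing.pos_fname p.Lexing.pos_lnum column

(* Illustrative values only: line 3 starts at offset 20, position is at 27. *)
let example : Lexing.position =
  { Lexing.pos_fname = "input.txt"; pos_lnum = 3; pos_bol = 20; pos_cnum = 27 }

let rendered = string_of_position example
```

The resulting column is 0-based; add 1 if your error messages use 1-based columns.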

val dummy_pos : position

Lexer buffers
type lexbuf = Lexing.lexbuf = {
  refill_buff : lexbuf -> unit;
  mutable lex_buffer : Bytes.t;
  mutable lex_buffer_len : int;
  mutable lex_abs_pos : int;
  mutable lex_start_pos : int;
  mutable lex_curr_pos : int;
  mutable lex_last_pos : int;
  mutable lex_last_action : int;
  mutable lex_eof_reached : bool;
  mutable lex_mem : int array;
  mutable lex_start_p : position;
  mutable lex_curr_p : position;
}

The type of lexer buffers. A lexer buffer is the argument passed to the scanning functions defined by the generated scanners. The lexer buffer holds the current state of the scanner, plus a function to refill the buffer from the input.

At each token, the lexing engine copies lex_curr_p to lex_start_p, then sets the pos_cnum field of lex_curr_p to the number of characters read since the start of the lexbuf. The other fields are left unchanged by the lexing engine. To keep them accurate, they must be initialised before the first use of the lexbuf and updated by the relevant lexer actions (i.e. at each end of line; see also new_line).

Note: Batteries does not currently support the ~with_positions:false mode available since OCaml 4.08 to disable position tracking. If you need this, please get in touch with the Batteries maintainers.

val from_string : ?with_positions:bool -> string -> lexbuf

Create a lexer buffer which reads from the given string. Reading starts from the first character in the string. An end-of-input condition is generated when the end of the string is reached.
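A small sketch of creating a buffer from a string and inspecting its initial state. It uses Stdlib's Lexing.from_string, which BatLexing is expected to mirror; the input text is arbitrary.

```ocaml
(* Build a lexer buffer over an in-memory string. *)
let lexbuf = Lexing.from_string "let x = 42"

(* Nothing has been scanned yet: the read cursor is at the first character. *)
let cursor = lexbuf.Lexing.lex_curr_pos

(* The whole string is already available in the buffer. *)
let available = lexbuf.Lexing.lex_buffer_len

(* The tracked position starts at offset 0. *)
let start_offset = lexbuf.Lexing.lex_curr_p.Lexing.pos_cnum
```

In a real program the buffer would then be passed to an entry point generated by ocamllex.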

val from_function : ?with_positions:bool -> (bytes -> int -> int) -> lexbuf

Create a lexer buffer with the given function as its reading method. When the scanner needs more characters, it will call the given function, giving it a byte sequence s and a byte count n. The function should put n bytes or fewer in s, starting at index 0, and return the number of bytes provided. A return value of 0 means end of input.
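The contract for the reading function can be sketched as follows. reader_of_string is a hypothetical helper (not part of the library) that serves a string in chunks of at most n bytes and returns 0 once it is exhausted.

```ocaml
(* Hypothetical helper: a reading function that serves [src] chunk by chunk. *)
let reader_of_string (src : string) : bytes -> int -> int =
  let offset = ref 0 in
  fun buf n ->
    (* Provide at most [n] bytes, and no more than what remains. *)
    let len = min n (String.length src - !offset) in
    (* Write the chunk into [buf] starting at index 0, as required. *)
    Bytes.blit_string src !offset buf 0 len;
    offset := !offset + len;
    (* Returning 0 signals end of input. *)
    len

(* The scanner calls the reader lazily, whenever it needs more characters. *)
let lexbuf = Lexing.from_function (reader_of_string "hello")
```

The same shape works for sockets or interactive input: only the body that fills buf changes.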

val set_position : lexbuf -> position -> unit

Set the initial tracked input position for lexbuf to a custom value. Ignores pos_fname. See set_filename for changing this field.

  • since 4.11

val set_filename : lexbuf -> string -> unit

Set filename in the initial tracked position to file in lexbuf.

  • since 4.11
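A sketch of the two setters used together, assuming OCaml 4.11 or later and Stdlib's Lexing. The file names and offsets are invented for illustration; note that the pos_fname passed to set_position is ignored.

```ocaml
let lexbuf = Lexing.from_string "x + y"

let () =
  (* Record the file name first. *)
  Lexing.set_filename lexbuf "input.txt";
  (* Pretend the text was extracted from line 10 of a larger file.
     The numbers are illustrative; pos_fname here is ignored. *)
  Lexing.set_position lexbuf
    { Lexing.pos_fname = "ignored.txt";
      pos_lnum = 10;
      pos_bol = 200;
      pos_cnum = 204 }

(* The initial tracked position now combines both settings. *)
let p = lexbuf.Lexing.lex_curr_p
```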
val with_positions : lexbuf -> bool

Tell whether the lexer buffer keeps track of position fields lex_curr_p / lex_start_p, as determined by the corresponding optional argument for functions that create lexer buffers (whose default value is true).

When with_positions is false, lexer actions should not modify position fields. Doing so could re-enable the with_positions mode and degrade performance.
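A brief sketch of querying the flag, written against Stdlib's Lexing and assuming OCaml 4.11 or later. As noted earlier on this page, Batteries may not honour ~with_positions:false, so the second buffer should be read as Stdlib behaviour only.

```ocaml
(* Default: position tracking is enabled. *)
let tracked = Lexing.from_string "abc"

(* Stdlib only: explicitly disable position tracking. *)
let untracked = Lexing.from_string ~with_positions:false "abc"

let tracked_flag = Lexing.with_positions tracked
let untracked_flag = Lexing.with_positions untracked
```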

Functions for lexer semantic actions

The following functions can be called from the semantic actions of lexer definitions (the ML code enclosed in braces that computes the value returned by lexing functions). They give access to the character string matched by the regular expression associated with the semantic action. These functions must be applied to the argument lexbuf, which, in the code generated by ocamllex, is bound to the lexer buffer passed to the parsing function.

val lexeme : lexbuf -> string

Lexing.lexeme lexbuf returns the string matched by the regular expression.

val lexeme_char : lexbuf -> int -> char

Lexing.lexeme_char lexbuf i returns character number i in the matched string.

val lexeme_start : lexbuf -> int

Lexing.lexeme_start lexbuf returns the offset in the input stream of the first character of the matched string. The first character of the stream has offset 0.

val lexeme_end : lexbuf -> int

Lexing.lexeme_end lexbuf returns the offset in the input stream of the character following the last character of the matched string. The first character of the stream has offset 0.

val lexeme_start_p : lexbuf -> position

Like lexeme_start, but return a complete position instead of an offset.

val lexeme_end_p : lexbuf -> position

Like lexeme_end, but return a complete position instead of an offset.
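The accessors above are normally called from actions in an ocamllex-generated lexer. So that the example runs as plain OCaml, the sketch below uses scan_word, a hypothetical hand-written stand-in for a generated rule: it matches a run of lowercase letters and updates the buffer in the way the lexbuf documentation describes for the lexing engine. It is an illustration only, not how generated lexers are implemented.

```ocaml
(* Hypothetical stand-in for a generated rule matching ['a'-'z']*. *)
let scan_word (lexbuf : Lexing.lexbuf) : unit =
  let open Lexing in
  (* The match starts where the previous one ended. *)
  lexbuf.lex_start_pos <- lexbuf.lex_curr_pos;
  while
    lexbuf.lex_curr_pos < lexbuf.lex_buffer_len
    && (match Bytes.get lexbuf.lex_buffer lexbuf.lex_curr_pos with
        | 'a' .. 'z' -> true
        | _ -> false)
  do
    lexbuf.lex_curr_pos <- lexbuf.lex_curr_pos + 1
  done;
  (* Mimic the engine: copy lex_curr_p to lex_start_p, then update pos_cnum. *)
  lexbuf.lex_start_p <- lexbuf.lex_curr_p;
  lexbuf.lex_curr_p <-
    { lexbuf.lex_curr_p with
      pos_cnum = lexbuf.lex_abs_pos + lexbuf.lex_curr_pos }

let lexbuf = Lexing.from_string "hello world"
let () = scan_word lexbuf

let word = Lexing.lexeme lexbuf               (* the matched string *)
let first = Lexing.lexeme_char lexbuf 0       (* its first character *)
let start_off = Lexing.lexeme_start lexbuf    (* offset of first character *)
let end_off = Lexing.lexeme_end lexbuf        (* offset just past the match *)
let end_pos = Lexing.lexeme_end_p lexbuf      (* same, as a full position *)
```

Here the match is "hello", so the start offset is 0 and the end offset is 5, the index of the character following the match.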

val new_line : lexbuf -> unit

Update the lex_curr_p field of the lexbuf to reflect the start of a new line. You can call this function in the semantic action of the rule that matches the end-of-line character.

  • since 3.11.0
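A sketch of the effect of new_line on the tracked position, using Stdlib's Lexing. Since no generated lexer is involved, the example sets pos_cnum by hand to simulate the engine having just consumed the two characters "a\n".

```ocaml
let lexbuf = Lexing.from_string "a\nb"

let () =
  (* Simulate the engine having consumed "a\n" (2 characters). *)
  lexbuf.Lexing.lex_curr_p <-
    { lexbuf.Lexing.lex_curr_p with Lexing.pos_cnum = 2 };
  (* What the end-of-line rule's action would call. *)
  Lexing.new_line lexbuf

(* The line number is incremented and the line now begins at offset 2. *)
let p = lexbuf.Lexing.lex_curr_p
```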
Miscellaneous functions
val flush_input : lexbuf -> unit

Discard the contents of the buffer and reset the current position to 0. The next use of the lexbuf will trigger a refill.
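A minimal sketch of flushing a buffer, for instance to drop the rest of a bad line in an interactive loop. The cursor value is set by hand to simulate partial consumption; the example uses Stdlib's Lexing.

```ocaml
let lexbuf = Lexing.from_string "stale input"

let () =
  (* Pretend five characters were already consumed. *)
  lexbuf.Lexing.lex_curr_pos <- 5;
  (* Drop whatever is buffered and reset the current position. *)
  Lexing.flush_input lexbuf

let cursor_after = lexbuf.Lexing.lex_curr_pos
let buffered_after = lexbuf.Lexing.lex_buffer_len
```

For a buffer built with from_string there is nothing left to refill from, so flushing is mostly useful with from_function or channel-based buffers.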