# package omod


Strings.

`include module type of String`

## Strings

`make n c` is a string of length `n` with each index holding the character `c`.

`init n f` is a string of length `n` with index `i` holding the character `f i` (called in increasing index order).

`get s i` is the character at index `i` in `s`. This is the same as writing `s.[i]`.
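
As a small illustration of the three functions above, the following sketch builds two strings and reads a character back. The value names (`dashes`, `letters`, `third`) are only for this example.

```ocaml
(* make repeats one character; init computes each character from its index. *)
let dashes = String.make 3 '-'
let letters = String.init 5 (fun i -> Char.chr (Char.code 'a' + i))

(* get s i and s.[i] denote the same operation. *)
let third = String.get letters 2

let () =
  assert (dashes = "---");
  assert (letters = "abcde");
  assert (third = letters.[2])
```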

`of_bytes b` is a new string that contains the same bytes as the byte sequence `b`.

`to_bytes s` is a new byte sequence that contains the same bytes as the string `s`.

`blit` is the same as `Bytes.blit_string`, which should be preferred.

## Concatenating

**Note.** The `Stdlib.(^)` binary operator concatenates two strings.

`concat sep ss` concatenates the list of strings `ss`, inserting the separator string `sep` between each.
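
A short sketch of both ways to concatenate. The value names are illustrative only.

```ocaml
(* The separator is placed between elements, never at the ends. *)
let path = String.concat "/" ["usr"; "local"; "bin"]

(* An empty list yields the empty string. *)
let nothing = String.concat ", " []

(* The (^) operator joins exactly two strings. *)
let greeting = "Hello, " ^ "world"

let () =
  assert (path = "usr/local/bin");
  assert (nothing = "");
  assert (greeting = "Hello, world")
```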

## Predicates and comparisons

`compare s0 s1` sorts `s0` and `s1` in lexicographical order. `compare` behaves like `Stdlib.compare` on strings but may be more efficient.

`ends_with ~suffix s` is `true` if and only if `s` ends with `suffix`.

`contains_from s start c` is `true` if and only if `c` appears in `s` at or after position `start`.

`rcontains_from s stop c` is `true` if and only if `c` appears in `s` before position `stop+1`.

`contains s c` is `String.contains_from s 0 c`.
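
The sketch below exercises the predicates on one sample string. It assumes a standard library recent enough to provide `ends_with` (OCaml 4.13 or later); the value names are illustrative.

```ocaml
(* Indices: h=0 e=1 l=2 l=3 o=4 .=5 m=6 l=7 *)
let s = "hello.ml"

let has_ml_suffix = String.ends_with ~suffix:".ml" s
let has_dot = String.contains s '.'

(* The search of contains_from includes the start position itself. *)
let dot_from_5 = String.contains_from s 5 '.'
let dot_from_6 = String.contains_from s 6 '.'

(* rcontains_from looks at positions 0 .. stop. *)
let dot_up_to_4 = String.rcontains_from s 4 '.'
let dot_up_to_5 = String.rcontains_from s 5 '.'

let () =
  assert has_ml_suffix;
  assert has_dot;
  assert (dot_from_5 && not dot_from_6);
  assert (dot_up_to_5 && not dot_up_to_4);
  assert (String.compare "apple" "banana" < 0)
```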

## Extracting substrings

`sub s pos len` is a string of length `len`, containing the substring of `s` that starts at position `pos` and has length `len`.
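
A minimal sketch of `sub`. In the standard library, `String.sub` raises `Invalid_argument` when `pos` and `len` do not designate a valid range; the wrapper `sub_opt` is a hypothetical helper for this example, not part of the module.

```ocaml
let s = "hello world"

(* Five characters starting at index 6. *)
let word = String.sub s 6 5

(* Hypothetical helper: turn the exception into an option. *)
let sub_opt s pos len =
  try Some (String.sub s pos len) with Invalid_argument _ -> None

let () =
  assert (word = "world");
  assert (sub_opt s 6 5 = Some "world");
  assert (sub_opt s 6 50 = None)
```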

`split_on_char sep s` is the list of all (possibly empty) substrings of `s` that are delimited by the character `sep`.

The function's result is specified by the following invariants:

- The list is not empty.
- Concatenating its elements using `sep` as a separator returns a string equal to the input (`concat (make 1 sep) (split_on_char sep s) = s`).
- No string in the result contains the `sep` character.
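
The three invariants can be checked directly on a sample input. This is only a sketch; the input string and value names are arbitrary.

```ocaml
let sep = ','
let s = "a,,b,"

(* Adjacent and trailing separators produce empty substrings. *)
let parts = String.split_on_char sep s

(* Joining with the separator gives back the input. *)
let roundtrip = String.concat (String.make 1 sep) parts

let () =
  assert (parts = ["a"; ""; "b"; ""]);
  assert (parts <> []);
  assert (roundtrip = s);
  assert (List.for_all (fun p -> not (String.contains p sep)) parts);
  (* Even the empty string yields a non-empty list. *)
  assert (String.split_on_char sep "" = [""])
```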

## Transforming

`map f s` is the string resulting from applying `f` to all the characters of `s` in increasing order.

`mapi f s` is like `map` but the index of the character is also passed to `f`.

`fold_left f x s` computes `f (... (f (f x s.[0]) s.[1]) ...) s.[n-1]`, where `n` is the length of the string `s`.

`fold_right f s x` computes `f s.[0] (f s.[1] ( ... (f s.[n-1] x) ...))`, where `n` is the length of the string `s`.

`for_all p s` checks if all characters in `s` satisfy the predicate `p`.

`exists p s` checks if at least one character of `s` satisfies the predicate `p`.
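
A sketch of the mapping, folding and predicate functions above. It assumes a standard library that provides `fold_left`, `fold_right`, `for_all` and `exists` on strings (OCaml 4.13 or later); `is_digit` is a helper defined only for this example.

```ocaml
let is_digit c = '0' <= c && c <= '9'

let shouted = String.map Char.uppercase_ascii "abc"

(* mapi also receives the index: uppercase the even positions. *)
let zigzag =
  String.mapi (fun i c -> if i mod 2 = 0 then Char.uppercase_ascii c else c) "abcd"

(* fold_left threads an accumulator from the first to the last character. *)
let count_a = String.fold_left (fun n c -> if c = 'a' then n + 1 else n) 0 "banana"

(* fold_right starts from the last character, so consing keeps the order. *)
let chars = String.fold_right (fun c acc -> c :: acc) "abc" []

let () =
  assert (shouted = "ABC");
  assert (zigzag = "AbCd");
  assert (count_a = 3);
  assert (chars = ['a'; 'b'; 'c']);
  assert (String.for_all is_digit "2024");
  assert (String.exists is_digit "v2");
  assert (not (String.exists is_digit "none"))
```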

`trim s` is `s` without leading and trailing whitespace. Whitespace characters are: `' '`, `'\x0C'` (form feed), `'\n'`, `'\r'`, and `'\t'`.
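
A brief sketch showing that only the ends are trimmed. The sample strings are arbitrary.

```ocaml
(* Inner whitespace is kept; only leading and trailing whitespace goes. *)
let trimmed = String.trim "  \t hi there \n"

(* A string made only of whitespace trims to the empty string. *)
let blank = String.trim "\x0C\r\n"

let () =
  assert (trimmed = "hi there");
  assert (blank = "")
```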

`escaped s` is `s` with special characters represented by escape sequences, following the lexical conventions of OCaml.

All characters outside the US-ASCII printable range [0x20;0x7E] are escaped, as well as backslash (0x5C) and double-quote (0x22).

The function `Scanf.unescaped` is a left inverse of `escaped`, i.e. `Scanf.unescaped (escaped s) = s` for any string `s` (unless `escaped s` fails).
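
The round trip through `Scanf.unescaped` can be sketched as follows. The sample string is arbitrary and contains a tab, two double quotes and one backslash.

```ocaml
let raw = "tab\there \"quoted\" back\\slash"

(* Each of the four special characters becomes a two-character escape. *)
let esc = String.escaped raw

(* Scanf.unescaped undoes the escaping. *)
let back = Scanf.unescaped esc

let () =
  assert (String.length esc = String.length raw + 4);
  assert (back = raw);
  (* Strings without special characters are returned unchanged. *)
  assert (String.escaped "plain" = "plain")
```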

`uppercase_ascii s` is `s` with all lowercase letters translated to uppercase, using the US-ASCII character set.

`lowercase_ascii s` is `s` with all uppercase letters translated to lowercase, using the US-ASCII character set.

`capitalize_ascii s` is `s` with the first character set to uppercase, using the US-ASCII character set.

`uncapitalize_ascii s` is `s` with the first character set to lowercase, using the US-ASCII character set.
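
A sketch of the four case functions. The last assertion illustrates that bytes outside US-ASCII (here the UTF-8 encoding of "é", written with byte escapes) are left untouched.

```ocaml
let upper = String.uppercase_ascii "OCaml"
let lower = String.lowercase_ascii "OCaml"
let cap = String.capitalize_ascii "ocaml"
let uncap = String.uncapitalize_ascii "OCaml"

let () =
  assert (upper = "OCAML");
  assert (lower = "ocaml");
  assert (cap = "Ocaml");
  assert (uncap = "oCaml");
  (* Non-ASCII bytes are not translated. *)
  assert (String.uppercase_ascii "caf\xC3\xA9" = "CAF\xC3\xA9")
```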

## Traversing

`iter f s` applies function `f` in turn to all the characters of `s`. It is equivalent to `f s.[0]; f s.[1]; ...; f s.[length s - 1]; ()`.

`iteri` is like `iter`, but the function is also given the corresponding character index.
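
To make the traversal order visible, this sketch records each call in a `Buffer`. The buffer and the trace format are choices made for the example only.

```ocaml
let buf = Buffer.create 16

(* iteri visits indices 0, 1, 2, ... in order. *)
let () =
  String.iteri
    (fun i c -> Buffer.add_string buf (Printf.sprintf "%d:%c " i c))
    "abc"

let trace = Buffer.contents buf

(* iter works the same way, without the index. *)
let count = ref 0
let () = String.iter (fun _ -> incr count) "abc"

let () =
  assert (trace = "0:a 1:b 2:c ");
  assert (!count = 3)
```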

## Searching

`index_from s i c` is the index of the first occurrence of `c` in `s` at or after position `i`.

`index_from_opt s i c` is the index of the first occurrence of `c` in `s` at or after position `i` (if any).

`rindex_from s i c` is the index of the last occurrence of `c` in `s` before position `i+1`.

`rindex_from_opt s i c` is the index of the last occurrence of `c` in `s` before position `i+1` (if any).

`index s c` is `String.index_from s 0 c`.

`index_opt s c` is `String.index_from_opt s 0 c`.

`rindex s c` is `String.rindex_from s (length s - 1) c`.

`rindex_opt s c` is `String.rindex_from_opt s (length s - 1) c`.
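
A sketch of the search functions on one sample string. In the standard library the non-`_opt` variants raise `Not_found` when the character is absent, which is why the `_opt` variants are used for the failing case here.

```ocaml
(* Indices: a=0 /=1 b=2 /=3 c=4 *)
let s = "a/b/c"

let first = String.index s '/'
let last = String.rindex s '/'

(* The search of index_from includes position i itself. *)
let from_1 = String.index_from s 1 '/'
let from_2 = String.index_from s 2 '/'

(* rindex_from searches positions 0 .. i. *)
let up_to_2 = String.rindex_from_opt s 2 '/'

let missing = String.index_opt s '?'

let () =
  assert (first = 1 && last = 3);
  assert (from_1 = 1 && from_2 = 3);
  assert (up_to_2 = Some 1);
  assert (missing = None)
```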

## Strings and Sequences

`to_seq s` is a sequence made of the string's characters in increasing order. In `"unsafe-string"` mode, modifications of the string during iteration will be reflected in the sequence.

`to_seqi s` is like `to_seq` but also tuples the corresponding index.
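
A sketch of both conversions. It also uses `String.of_seq` and `Seq.filter` from the standard library, which are not described in this section, to show a sequence pipeline.

```ocaml
let chars = List.of_seq (String.to_seq "abc")
let indexed = List.of_seq (String.to_seqi "ab")

(* Filter the characters lazily, then rebuild a string. *)
let no_dashes =
  String.to_seq "a-b-c" |> Seq.filter (fun c -> c <> '-') |> String.of_seq

let () =
  assert (chars = ['a'; 'b'; 'c']);
  assert (indexed = [(0, 'a'); (1, 'b')]);
  assert (no_dashes = "abc")
```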

## UTF decoding and validations

### UTF-8

`val get_utf_8_uchar : t -> int -> Uchar.utf_decode`

`get_utf_8_uchar b i` decodes a UTF-8 character at index `i` in `b`.

`val is_valid_utf_8 : t -> bool`

`is_valid_utf_8 b` is `true` if and only if `b` contains valid UTF-8 data.
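
A sketch of walking a string one decode at a time. It assumes a standard library that provides these functions and the `Uchar.utf_decode` accessors (OCaml 4.14 or later); `count_uchars` is a hypothetical helper written for this example.

```ocaml
(* "café" with the é written as its two UTF-8 bytes. *)
let s = "caf\xC3\xA9"

(* Hypothetical helper: count decodes by advancing by each decode's length.
   An invalid decode still reports a length of at least one byte,
   so the loop always makes progress. *)
let count_uchars s =
  let rec loop i n =
    if i >= String.length s then n
    else
      let d = String.get_utf_8_uchar s i in
      loop (i + Uchar.utf_decode_length d) (n + 1)
  in
  loop 0 0

let d = String.get_utf_8_uchar s 3

let () =
  assert (String.length s = 5);
  assert (count_uchars s = 4);
  assert (Uchar.utf_decode_is_valid d);
  assert (Uchar.to_int (Uchar.utf_decode_uchar d) = 0xE9);
  assert (String.is_valid_utf_8 s);
  assert (not (String.is_valid_utf_8 "\xFF"))
```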

### UTF-16BE

`val get_utf_16be_uchar : t -> int -> Uchar.utf_decode`

`get_utf_16be_uchar b i` decodes a UTF-16BE character at index `i` in `b`.

`val is_valid_utf_16be : t -> bool`

`is_valid_utf_16be b` is `true` if and only if `b` contains valid UTF-16BE data.

### UTF-16LE

`val get_utf_16le_uchar : t -> int -> Uchar.utf_decode`

`get_utf_16le_uchar b i` decodes a UTF-16LE character at index `i` in `b`.

`val is_valid_utf_16le : t -> bool`

`is_valid_utf_16le b` is `true` if and only if `b` contains valid UTF-16LE data.

## Binary decoding of integers

The functions in this section binary decode integers from strings.

All following functions raise `Invalid_argument` if the characters needed at index `i` to decode the integer are not available.

Little-endian (resp. big-endian) encoding means that least (resp. most) significant bytes are stored first. Big-endian is also known as network byte order. Native-endian encoding is either little-endian or big-endian depending on `Sys.big_endian`.

32-bit and 64-bit integers are represented by the `int32` and `int64` types, which can be interpreted either as signed or unsigned numbers.

8-bit and 16-bit integers are represented by the `int` type, which has more bits than the binary encoding. These extra bits are sign-extended (or zero-extended) for functions which decode 8-bit or 16-bit integers and represent them with `int` values.

`get_uint8 b i` is `b`'s unsigned 8-bit integer starting at character index `i`.

`get_int8 b i` is `b`'s signed 8-bit integer starting at character index `i`.

`get_uint16_ne b i` is `b`'s native-endian unsigned 16-bit integer starting at character index `i`.

`get_uint16_be b i` is `b`'s big-endian unsigned 16-bit integer starting at character index `i`.

`get_uint16_le b i` is `b`'s little-endian unsigned 16-bit integer starting at character index `i`.

`get_int16_ne b i` is `b`'s native-endian signed 16-bit integer starting at character index `i`.

`get_int16_be b i` is `b`'s big-endian signed 16-bit integer starting at character index `i`.

`get_int16_le b i` is `b`'s little-endian signed 16-bit integer starting at character index `i`.

`get_int32_ne b i` is `b`'s native-endian 32-bit integer starting at character index `i`.

`val hash : t -> int`

An unseeded hash function for strings, with the same output value as `Hashtbl.hash`. This function allows this module to be passed as argument to the functor `Hashtbl.Make`.

`val seeded_hash : int -> t -> int`

A seeded hash function for strings, with the same output value as `Hashtbl.seeded_hash`. This function allows this module to be passed as argument to the functor `Hashtbl.MakeSeeded`.

`get_int32_be b i` is `b`'s big-endian 32-bit integer starting at character index `i`.

`get_int32_le b i` is `b`'s little-endian 32-bit integer starting at character index `i`.

`get_int64_ne b i` is `b`'s native-endian 64-bit integer starting at character index `i`.

`get_int64_be b i` is `b`'s big-endian 64-bit integer starting at character index `i`.
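
The effect of endianness and sign extension on the binary decoding functions can be sketched on a four-byte string. This assumes a standard library that provides these functions on strings (OCaml 4.13 or later); `uint16_be_opt` is a hypothetical helper for this example.

```ocaml
let b = "\x01\x02\xFF\xFE"

(* The same two bytes read in both byte orders. *)
let u16_be = String.get_uint16_be b 0
let u16_le = String.get_uint16_le b 0

(* The byte 0xFF is 255 when zero-extended and -1 when sign-extended. *)
let u8 = String.get_uint8 b 2
let i8 = String.get_int8 b 2

(* 0xFFFE sign-extended is -2. *)
let i16_be = String.get_int16_be b 2

(* All four bytes as a big-endian int32. *)
let i32_be = String.get_int32_be b 0

(* Hypothetical helper: None when the needed bytes are not available. *)
let uint16_be_opt b i =
  try Some (String.get_uint16_be b i) with Invalid_argument _ -> None

let () =
  assert (u16_be = 0x0102 && u16_le = 0x0201);
  assert (u8 = 255 && i8 = -1);
  assert (i16_be = -2);
  assert (i32_be = 0x0102FFFEl);
  assert (uint16_be_opt b 3 = None)
```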

`cut ~sep s` cuts `s` on the left and right of the first char `sep` starting from the left.

`rev_cut ~sep s` cuts `s` on the left and right of the first char `sep` starting from the right.
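
The behaviour described above can be approximated with standard library functions. This is only a sketch, not this module's implementation: the names `cut_sketch` and `rev_cut_sketch` are hypothetical, and the `(string * string) option` result, with `None` when `sep` is absent, is an assumption about the return type.

```ocaml
(* Assumed result shape: Some (left, right) without the separator,
   or None if the separator does not occur. *)
let cut_sketch ~sep s =
  match String.index_opt s sep with
  | None -> None
  | Some i ->
    Some (String.sub s 0 i, String.sub s (i + 1) (String.length s - i - 1))

(* Same idea, but locating the separator from the right. *)
let rev_cut_sketch ~sep s =
  match String.rindex_opt s sep with
  | None -> None
  | Some i ->
    Some (String.sub s 0 i, String.sub s (i + 1) (String.length s - i - 1))

let () =
  assert (cut_sketch ~sep:'=' "k=v=w" = Some ("k", "v=w"));
  assert (rev_cut_sketch ~sep:'=' "k=v=w" = Some ("k=v", "w"));
  assert (cut_sketch ~sep:'=' "none" = None)
```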

`starts_with ~prefix s` is `true` iff `prefix` is a prefix of `s`. **Note.** Available in 4.13.

`edit_distance s0 s1` is the number of single character edits (insertion, deletion, substitution) that are needed to change `s0` into `s1`.
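
For intuition, a distance with these three edit operations can be computed with the classic two-row dynamic programming scheme (Levenshtein distance). The function `edit_distance_sketch` below is a hypothetical illustration under that assumption, not this module's implementation, and it compares bytes rather than Unicode characters.

```ocaml
let edit_distance_sketch s0 s1 =
  let n0 = String.length s0 and n1 = String.length s1 in
  (* prev.(j): distance between the first i-1 bytes of s0 and the first j of s1. *)
  let prev = Array.init (n1 + 1) (fun j -> j) in
  let cur = Array.make (n1 + 1) 0 in
  for i = 1 to n0 do
    cur.(0) <- i;
    for j = 1 to n1 do
      let cost = if s0.[i - 1] = s1.[j - 1] then 0 else 1 in
      let deletion = prev.(j) + 1 in
      let insertion = cur.(j - 1) + 1 in
      let substitution = prev.(j - 1) + cost in
      cur.(j) <- min (min deletion insertion) substitution
    done;
    (* The current row becomes the previous row for the next iteration. *)
    Array.blit cur 0 prev 0 (n1 + 1)
  done;
  prev.(n1)

let () =
  assert (edit_distance_sketch "kitten" "sitting" = 3);
  assert (edit_distance_sketch "" "abc" = 3);
  assert (edit_distance_sketch "abc" "abc" = 0)
```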

`suggest ~dist candidates s` are the elements of `candidates` whose edit distance to `s` is the smallest and at most `dist` (defaults to `2`). If multiple results are returned, the order of `candidates` is preserved.