package stdune
include module type of struct include StringLabels end with type t := t
Strings
make n c is a string of length n with each index holding the character c.
init n ~f is a string of length n with index i holding the character f i (called in increasing index order).
get s i is the character at index i in s. This is the same as writing s.[i].
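A minimal sketch of make, init and get, written against StringLabels from the OCaml standard library (which this module includes). The value names are illustrative only.

```ocaml
(* Sketch using the standard library's StringLabels, which this module includes. *)

(* Three copies of the same character. *)
let dashes = StringLabels.make 3 '-'

(* f receives each index, in increasing order. *)
let letters = StringLabels.init 5 ~f:(fun i -> Char.chr (Char.code 'a' + i))

(* get s i and s.[i] read the same character. *)
let third = StringLabels.get letters 2
let same = third = letters.[2]
```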
Return a new string that contains the same bytes as the given byte sequence.
Return a new byte sequence that contains the same bytes as the given string.
Same as Bytes.blit_string, which should be preferred.
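The function names are missing from the three descriptions above. The sketch below assumes they correspond to the standard library's of_bytes, to_bytes and blit, and uses Bytes.blit_string directly as the text recommends.

```ocaml
(* Assumption: the conversions described above are StringLabels.to_bytes
   and StringLabels.of_bytes from the standard library. *)
let s = "dune"

(* to_bytes returns a fresh copy, so mutating it leaves s unchanged. *)
let b = StringLabels.to_bytes s
let () = Bytes.set b 0 'D'

(* of_bytes copies again, producing an immutable string. *)
let s' = StringLabels.of_bytes b

(* Bytes.blit_string src src_pos dst dst_pos len copies part of a string
   into an existing byte sequence. *)
let buf = Bytes.make 6 '.'
let () = Bytes.blit_string "abc" 0 buf 2 3
```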
Concatenating
Note. The Stdlib.(^) binary operator concatenates two strings.
concat ~sep ss concatenates the list of strings ss, inserting the separator string sep between each.
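A short sketch of both ways to concatenate, using StringLabels from the standard library. The file names are made up for illustration.

```ocaml
(* Join a list with a separator; the separator only appears between elements. *)
let path = StringLabels.concat ~sep:"/" [ "src"; "dune_rules"; "main.ml" ]

(* The binary operator joins exactly two strings. *)
let joined = "foo" ^ "bar"

(* An empty list yields the empty string. *)
let empty = StringLabels.concat ~sep:", " []
```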
Predicates and comparisons
starts_with ~prefix s is true if and only if s starts with prefix.
ends_with ~suffix s is true if and only if s ends with suffix.
contains_from s start c is true if and only if c appears in s at or after position start.
rcontains_from s stop c is true if and only if c appears in s before position stop+1.
contains s c is String.contains_from s 0 c.
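The position arguments of contains_from and rcontains_from are inclusive, which is easy to get wrong. The sketch below, written against the standard library's StringLabels, probes both sides of the boundary on a three-character string.

```ocaml
let s = "a.b"

let has_prefix = StringLabels.starts_with ~prefix:"a." s
let has_suffix = StringLabels.ends_with ~suffix:".b" s

(* The '.' sits at index 1: searching forward from 1 finds it, from 2 does not. *)
let from_1 = StringLabels.contains_from s 1 '.'
let from_2 = StringLabels.contains_from s 2 '.'

(* Searching backward from 1 finds it, from 0 does not. *)
let upto_1 = StringLabels.rcontains_from s 1 '.'
let upto_0 = StringLabels.rcontains_from s 0 '.'

let has_z = StringLabels.contains s 'z'
```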
Extracting substrings
sub s ~pos ~len is a string of length len, containing the substring of s that starts at position pos and has length len.
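A small sketch of sub using the standard library's StringLabels. The out-of-range case is standard library behaviour (Invalid_argument) that the text above does not spell out.

```ocaml
let s = "hello world"

(* Five characters starting at index 6. *)
let word = StringLabels.sub s ~pos:6 ~len:5

(* pos and len must designate a valid range of s; otherwise the standard
   library raises Invalid_argument. *)
let out_of_range =
  match StringLabels.sub s ~pos:8 ~len:5 with
  | _ -> false
  | exception Invalid_argument _ -> true
```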
split_on_char ~sep s is the list of all (possibly empty) substrings of s that are delimited by the character sep.
The function's result is specified by the following invariants:
- The list is not empty.
- Concatenating its elements using sep as a separator returns a string equal to the input (concat ~sep:(make 1 sep) (split_on_char ~sep s) = s).
- No string in the result contains the sep character.
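The three invariants above can be checked mechanically. The helper check below is a hypothetical name introduced for illustration; it uses only StringLabels and List from the standard library.

```ocaml
(* Returns true when all three documented invariants hold for s and sep. *)
let check ~sep s =
  let parts = StringLabels.split_on_char ~sep s in
  parts <> []
  && StringLabels.concat ~sep:(StringLabels.make 1 sep) parts = s
  && List.for_all (fun p -> not (StringLabels.contains p sep)) parts

(* Adjacent separators produce an empty substring between them. *)
let parts = StringLabels.split_on_char ~sep:',' "a,,b"

(* The empty string splits into a single empty substring, not an empty list. *)
let of_empty = StringLabels.split_on_char ~sep:',' ""
```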
Transforming
map ~f s is the string resulting from applying f to all the characters of s in increasing order.
mapi ~f s is like map but the index of the character is also passed to f.
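A brief sketch of map and mapi with the labelled ~f argument, using the standard library's StringLabels.

```ocaml
(* Apply the same function to every character. *)
let shout = StringLabels.map ~f:Char.uppercase_ascii "dune"

(* mapi also receives the index: uppercase only the even positions. *)
let alternating =
  StringLabels.mapi "abcd" ~f:(fun i c ->
      if i mod 2 = 0 then Char.uppercase_ascii c else c)
```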
fold_left ~f ~init:x s computes f (... (f (f x s.[0]) s.[1]) ...) s.[n-1], where n is the length of the string s.
fold_right ~f s ~init:x computes f s.[0] (f s.[1] ( ... (f s.[n-1] x) ...)), where n is the length of the string s.
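The difference between the two folds is the nesting order shown in the formulas above. As a sketch (function names are illustrative, and the labelled fold functions require OCaml 4.13 or later), consing each character onto a list makes the order visible:

```ocaml
(* fold_left visits s.[0] first, so it suits left-to-right accumulation. *)
let digits_to_int s =
  StringLabels.fold_left s ~init:0 ~f:(fun acc c ->
      (acc * 10) + (Char.code c - Char.code '0'))

(* fold_right applies f to s.[n-1] first, so consing preserves order... *)
let chars s = StringLabels.fold_right s ~init:[] ~f:(fun c acc -> c :: acc)

(* ...whereas consing in a fold_left reverses it. *)
let rev_chars s = StringLabels.fold_left s ~init:[] ~f:(fun acc c -> c :: acc)
```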
trim s is s without leading and trailing whitespace. Whitespace characters are: ' ', '\x0C' (form feed), '\n', '\r', and '\t'.
escaped s is s with special characters represented by escape sequences, following the lexical conventions of OCaml.
All characters outside the US-ASCII printable range [0x20;0x7E] are escaped, as well as backslash (0x5C) and double-quote (0x22).
The function Scanf.unescaped is a left inverse of escaped, i.e. Scanf.unescaped (escaped s) = s for any string s (unless escaped s fails).
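A sketch of trim and escaped using the standard library's StringLabels. The sample string contains a double-quote, a backslash, a newline and a forward slash, so it shows which of them are rewritten.

```ocaml
(* Only leading and trailing whitespace is removed; the inner space stays. *)
let trimmed = StringLabels.trim " \t dune build\r\n"

(* Double-quote, backslash and newline are escaped; '/' is left alone. *)
let raw = "a\"b\\c\nd/e"
let esc = StringLabels.escaped raw

(* Scanf.unescaped undoes the escaping. *)
let round_trip = Scanf.unescaped esc = raw
```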
uppercase_ascii s is s with all lowercase letters translated to uppercase, using the US-ASCII character set.
lowercase_ascii s is s with all uppercase letters translated to lowercase, using the US-ASCII character set.
capitalize_ascii s is s with the first character set to uppercase, using the US-ASCII character set.
uncapitalize_ascii s is s with the first character set to lowercase, using the US-ASCII character set.
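A sketch of the four case functions from the standard library's StringLabels. The last binding illustrates that only US-ASCII letters are affected: the two bytes of a UTF-8 encoded "é" pass through unchanged.

```ocaml
let upper = StringLabels.uppercase_ascii "Dune-3.x"
let lower = StringLabels.lowercase_ascii "Dune-3.x"
let cap = StringLabels.capitalize_ascii "stdune"
let uncap = StringLabels.uncapitalize_ascii "Stdune"

(* Bytes outside US-ASCII are left untouched. *)
let non_ascii_kept = StringLabels.uppercase_ascii "caf\xC3\xA9" = "CAF\xC3\xA9"
```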
Traversing
iter ~f s applies function f in turn to all the characters of s. It is equivalent to f s.[0]; f s.[1]; ...; f s.[length s - 1]; ().
iteri is like iter, but the function is also given the corresponding character index.
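Both traversals exist for their side effects. A minimal sketch with the standard library's StringLabels follows; count_char and positions are hypothetical helper names.

```ocaml
(* Count occurrences of a character by mutating a counter from iter. *)
let count_char s target =
  let n = ref 0 in
  StringLabels.iter s ~f:(fun c -> if c = target then incr n);
  !n

(* Record "index:char" pairs in visiting order using iteri. *)
let positions s =
  let buf = Buffer.create 16 in
  StringLabels.iteri s ~f:(fun i c ->
      Buffer.add_string buf (Printf.sprintf "%d:%c " i c));
  Buffer.contents buf
```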
Searching
index_from_opt s i c is the index of the first occurrence of c in s at or after position i (if any).
rindex_from_opt s i c is the index of the last occurrence of c in s before position i+1 (if any).
index_opt s c is String.index_from_opt s 0 c.
rindex_opt s c is String.rindex_from_opt s (length s - 1) c.
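A sketch of the four option-returning searches, written against the standard library's StringLabels. In the sample string the dots sit at indices 1 and 3.

```ocaml
let s = "a.b.c"

let first = StringLabels.index_opt s '.'
let last = StringLabels.rindex_opt s '.'

(* The starting index is inclusive in both directions. *)
let at_1 = StringLabels.index_from_opt s 1 '.'
let after_first = StringLabels.index_from_opt s 2 '.'
let before_last = StringLabels.rindex_from_opt s 2 '.'

(* A missing character yields None rather than raising Not_found. *)
let missing = StringLabels.index_opt s 'z'
```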
Strings and Sequences
to_seq s is a sequence made of the string's characters in increasing order. In "unsafe-string" mode, modifications of the string during iteration will be reflected in the sequence.
to_seqi s is like to_seq but also tuples the corresponding index.
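A short sketch that materialises both sequences as lists, using the standard library's StringLabels and List.of_seq.

```ocaml
(* Characters in increasing index order. *)
let chars = List.of_seq (StringLabels.to_seq "abc")

(* The same characters paired with their indices. *)
let indexed = List.of_seq (StringLabels.to_seqi "abc")
```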
UTF decoding and validations
UTF-8
val get_utf_8_uchar : t -> int -> Uchar.utf_decode
get_utf_8_uchar b i decodes a UTF-8 character at index i in b.
val is_valid_utf_8 : t -> bool
is_valid_utf_8 b is true if and only if b contains valid UTF-8 data.
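A sketch of a decoding loop, assuming OCaml 4.14 or later (where these functions and the Uchar.utf_decode accessors were introduced). The helpers count_decodes and first_scalar are hypothetical names; the loop advances by the number of bytes each decode consumed, so it also terminates on invalid input.

```ocaml
(* Walk the string one decode at a time; invalid bytes count as one decode each. *)
let count_decodes s =
  let rec loop i n =
    if i >= String.length s then n
    else
      let d = StringLabels.get_utf_8_uchar s i in
      loop (i + Uchar.utf_decode_length d) (n + 1)
  in
  loop 0 0

(* Scalar value of the first character, or None if s is empty or the decode is invalid. *)
let first_scalar s =
  if s = "" then None
  else
    let d = StringLabels.get_utf_8_uchar s 0 in
    if Uchar.utf_decode_is_valid d then
      Some (Uchar.to_int (Uchar.utf_decode_uchar d))
    else None
```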
UTF-16BE
val get_utf_16be_uchar : t -> int -> Uchar.utf_decode
get_utf_16be_uchar b i decodes a UTF-16BE character at index i in b.
val is_valid_utf_16be : t -> bool
is_valid_utf_16be b is true if and only if b contains valid UTF-16BE data.
UTF-16LE
val get_utf_16le_uchar : t -> int -> Uchar.utf_decode
get_utf_16le_uchar b i decodes a UTF-16LE character at index i in b.
val is_valid_utf_16le : t -> bool
is_valid_utf_16le b is true if and only if b contains valid UTF-16LE data.
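The two UTF-16 variants differ only in byte order. A minimal sketch, assuming OCaml 4.14 or later and using hypothetical helper names, decodes the letter A (U+0041) from both encodings and a surrogate pair from UTF-16BE.

```ocaml
(* Scalar value and byte length of a valid decode, or None if it is invalid. *)
let describe d =
  if Uchar.utf_decode_is_valid d then
    Some (Uchar.to_int (Uchar.utf_decode_uchar d), Uchar.utf_decode_length d)
  else None

(* U+0041 is the 16-bit unit 0x0041: high byte first in BE, low byte first in LE. *)
let a_be = describe (StringLabels.get_utf_16be_uchar "\x00\x41" 0)
let a_le = describe (StringLabels.get_utf_16le_uchar "\x41\x00" 0)

(* U+1F600 needs a surrogate pair (0xD83D, 0xDE00), so the decode spans 4 bytes. *)
let pair_be = describe (StringLabels.get_utf_16be_uchar "\xD8\x3D\xDE\x00" 0)

(* A single byte cannot hold a complete 16-bit unit. *)
let odd_is_valid = StringLabels.is_valid_utf_16be "\x00"
```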
Binary decoding of integers
The functions in this section binary decode integers from strings.
All following functions raise Invalid_argument if the characters needed at index i to decode the integer are not available.
Little-endian (resp. big-endian) encoding means that least (resp. most) significant bytes are stored first. Big-endian is also known as network byte order. Native-endian encoding is either little-endian or big-endian depending on Sys.big_endian.
32-bit and 64-bit integers are represented by the int32 and int64 types, which can be interpreted either as signed or unsigned numbers.
8-bit and 16-bit integers are represented by the int type, which has more bits than the binary encoding. These extra bits are sign-extended (or zero-extended) for functions which decode 8-bit or 16-bit integers and represent them with int values.
get_uint8 b i is b's unsigned 8-bit integer starting at character index i.
get_int8 b i is b's signed 8-bit integer starting at character index i.
get_uint16_ne b i is b's native-endian unsigned 16-bit integer starting at character index i.
get_uint16_be b i is b's big-endian unsigned 16-bit integer starting at character index i.
get_uint16_le b i is b's little-endian unsigned 16-bit integer starting at character index i.
get_int16_ne b i is b's native-endian signed 16-bit integer starting at character index i.
get_int16_be b i is b's big-endian signed 16-bit integer starting at character index i.
get_int16_le b i is b's little-endian signed 16-bit integer starting at character index i.
get_int32_ne b i is b's native-endian 32-bit integer starting at character index i.
val seeded_hash : int -> t -> int
A seeded hash function for strings, with the same output value as Hashtbl.seeded_hash. This function allows this module to be passed as argument to the functor Hashtbl.MakeSeeded.
get_int32_be b i is b's big-endian 32-bit integer starting at character index i.
get_int32_le b i is b's little-endian 32-bit integer starting at character index i.
get_int64_ne b i is b's native-endian 64-bit integer starting at character index i.
get_int64_be b i is b's big-endian 64-bit integer starting at character index i.
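A sketch of how endianness and sign extension affect the decoded value, using the standard library's StringLabels on a four-byte string. The last binding shows the Invalid_argument case when too few bytes remain.

```ocaml
(* Four bytes: 0x01 0x02 0xFF 0xFE *)
let s = "\x01\x02\xFF\xFE"

(* The same byte read as unsigned (zero-extended) and signed (sign-extended). *)
let u8 = StringLabels.get_uint8 s 2
let i8 = StringLabels.get_int8 s 2

(* The same two bytes in both byte orders. *)
let u16_be = StringLabels.get_uint16_be s 0
let u16_le = StringLabels.get_uint16_le s 0

(* Native-endian picks one of the two depending on the platform. *)
let u16_ne = StringLabels.get_uint16_ne s 0

(* 0xFFFE as a signed 16-bit value is sign-extended into the int. *)
let i16_be = StringLabels.get_int16_be s 2

(* All four bytes as a big-endian int32. *)
let i32_be = StringLabels.get_int32_be s 0

(* Decoding 8 bytes from a 4-byte string raises Invalid_argument. *)
let too_short =
  match StringLabels.get_int64_be s 0 with
  | _ -> false
  | exception Invalid_argument _ -> true
```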
val compare : t -> t -> Ordering.t
val hash : t -> int
val is_empty : t -> bool
val of_list : char list -> t
module Caseless : sig ... end
Case-insensitive matching semantics.
val index : t -> char -> int option
val index_from : t -> int -> char -> int option
val rindex : t -> char -> int option
val rindex_from : t -> int -> char -> int option
Escape ONLY one character. escaped also escapes '\n',... and transforms all chars above '~' into '\xxx', which is not suitable for UTF-8 strings.
val exists : t -> f:(char -> bool) -> bool
val for_all : t -> f:(char -> bool) -> bool
maybe_quoted s is s if s doesn't need escaping according to OCaml lexing conventions and sprintf "%S" s otherwise.
(* CR-someday aalekseyev: this function is not great: barely anything "needs escaping according to OCaml lexing conventions", so the condition for whether to add the quote characters ends up being quite arbitrary. *)
quote_for_shell s quotes s using Filename.quote if need_quoting s is true; otherwise it is s.
quote_list_for_shell l is List.map l ~f:quote_for_shell |> concat ~sep:" "
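The two definitions above can be sketched with the standard library alone. The need_quoting predicate here is a hypothetical stand-in, since the text does not show the real one, and the set of characters it checks is an assumption. ListLabels.map replaces the labelled List.map used in the text.

```ocaml
(* Hypothetical stand-in: the real need_quoting is not shown in the text.
   Assumes empty strings and a few shell metacharacters need quoting. *)
let need_quoting s =
  s = ""
  || StringLabels.exists s ~f:(function
       | ' ' | '"' | '\'' | '(' | ')' | ';' | '#' | '$' -> true
       | _ -> false)

(* Quote only when needed, so simple arguments stay readable. *)
let quote_for_shell s = if need_quoting s then Filename.quote s else s

(* Quote each argument, then join with single spaces. *)
let quote_list_for_shell l =
  ListLabels.map l ~f:quote_for_shell |> StringLabels.concat ~sep:" "
```

With this sketch, a list such as [ "ls"; "-l" ] is joined unchanged, while an argument containing a space is passed through Filename.quote, whose exact output depends on the platform.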