package regenerate
include module type of struct include CCString end
Basic String Utils
type 'a klist = unit -> [ `Nil | `Cons of 'a * 'a klist ]
Common Signature
module type S = CCString.S
Strings
include module type of struct include String end
Strings
make n c is a string of length n with each index holding the character c.
Return a new string that contains the same bytes as the given byte sequence.
Return a new byte sequence that contains the same bytes as the given string.
get s i is the character at index i in s. This is the same as writing s.[i].
Concatenating
Note. The Stdlib.(^) binary operator concatenates two strings.
concat sep ss concatenates the list of strings ss, inserting the separator string sep between each.
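A minimal sketch of both concatenation forms, using only the standard library. The names greeting and csv are illustrative and do not come from this module.

```ocaml
(* (^) joins exactly two strings. *)
let greeting = "Hello, " ^ "world"

(* String.concat joins a whole list; the separator goes between elements only. *)
let csv = String.concat "," [ "a"; "b"; "c" ]

let () =
  assert (greeting = "Hello, world");
  assert (csv = "a,b,c");
  (* An empty list yields the empty string; no separator is emitted. *)
  assert (String.concat "," [] = "")
```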
Predicates and comparisons
starts_with ~prefix s is true if and only if s starts with prefix.
ends_with ~suffix s is true if and only if s ends with suffix.
contains_from s start c is true if and only if c appears in s at or after position start.
rcontains_from s stop c is true if and only if c appears in s before position stop+1.
contains s c is String.contains_from s 0 c.
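The predicates can be checked on a small input. This sketch calls the Stdlib String functions directly and assumes OCaml 4.13 or later, where starts_with and ends_with are available. Note that contains_from begins its search at index start itself.

```ocaml
let s = "hello"

let () =
  (* Prefix and suffix tests. *)
  assert (String.starts_with ~prefix:"he" s);
  assert (String.ends_with ~suffix:"lo" s);
  (* The search from index 2 sees "llo", so 'l' is found but 'h' is not. *)
  assert (String.contains_from s 2 'l');
  assert (not (String.contains_from s 1 'h'));
  (* The backward search up to index 2 sees "hel", so 'o' is not found. *)
  assert (not (String.rcontains_from s 2 'o'));
  assert (String.contains s 'o')
```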
Extracting substrings
sub s pos len is a string of length len, containing the substring of s that starts at position pos and has length len.
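A short sketch of substring extraction with the Stdlib String.sub. The helper sub_opt is hypothetical, not part of this module; it relies on String.sub raising Invalid_argument when pos and len do not designate a valid range.

```ocaml
let s = "hello world"

(* Extract 5 characters starting at index 6. *)
let word = String.sub s 6 5

(* Hypothetical helper: return None instead of raising on an invalid range. *)
let sub_opt s pos len =
  try Some (String.sub s pos len) with Invalid_argument _ -> None

let () =
  assert (word = "world");
  assert (sub_opt s 6 5 = Some "world");
  (* 8 + 10 exceeds the length of s, so the range is rejected. *)
  assert (sub_opt s 8 10 = None)
```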
Transforming
fold_left f x s computes f (... (f (f x s.[0]) s.[1]) ...) s.[n-1], where n is the length of the string s.
fold_right f s x computes f s.[0] (f s.[1] ( ... (f s.[n-1] x) ...)), where n is the length of the string s.
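The difference in traversal order shows up as soon as the accumulator is order-sensitive. The sketch below uses the Stdlib String.fold_left and String.fold_right (OCaml 4.13 or later is assumed); count_a, to_list and rev_list are illustrative names.

```ocaml
(* fold_left visits characters left to right; here it counts the 'a's. *)
let count_a s = String.fold_left (fun n c -> if c = 'a' then n + 1 else n) 0 s

(* fold_right starts from the last character, so consing rebuilds the order. *)
let to_list s = String.fold_right (fun c acc -> c :: acc) s []

(* Consing under fold_left reverses the order instead. *)
let rev_list s = String.fold_left (fun acc c -> c :: acc) [] s

let () =
  assert (count_a "banana" = 3);
  assert (to_list "abc" = [ 'a'; 'b'; 'c' ]);
  assert (rev_list "abc" = [ 'c'; 'b'; 'a' ])
```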
trim s is s without leading and trailing whitespace. Whitespace characters are: ' ', '\x0C' (form feed), '\n', '\r', and '\t'.
escaped s is s with special characters represented by escape sequences, following the lexical conventions of OCaml.
All characters outside the US-ASCII printable range [0x20;0x7E] are escaped, as well as backslash (0x5C) and double-quote (0x22).
The function Scanf.unescaped is a left inverse of escaped, i.e. Scanf.unescaped (escaped s) = s for any string s (unless escaped s fails).
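A small round-trip check of trim and escaped, using the Stdlib functions directly. The input raw is an arbitrary example containing a tab, a double-quote and a backslash.

```ocaml
(* Six characters: a, TAB, b, double-quote, c, backslash. *)
let raw = "a\tb\"c\\"
let esc = String.escaped raw

let () =
  (* trim strips whitespace at both ends but not in the middle. *)
  assert (String.trim "  a b\n" = "a b");
  (* TAB, double-quote and backslash each become a two-character escape. *)
  assert (esc = "a\\tb\\\"c\\\\");
  assert (String.length raw = 6 && String.length esc = 9);
  (* Scanf.unescaped undoes the escaping. *)
  assert (Scanf.unescaped esc = raw)
```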
Traversing
Searching
index_from s i c is the index of the first occurrence of c in s at or after position i.
index_from_opt s i c is the index of the first occurrence of c in s at or after position i (if any).
rindex_from s i c is the index of the last occurrence of c in s before position i+1.
rindex_from_opt s i c is the index of the last occurrence of c in s before position i+1 (if any).
index s c is String.index_from s 0 c.
index_opt s c is String.index_from_opt s 0 c.
rindex s c is String.rindex_from s (length s - 1) c.
rindex_opt s c is String.rindex_from_opt s (length s - 1) c.
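The search functions can be compared on one input. This sketch uses the Stdlib String functions; it also shows that the non-_opt variants raise Not_found when the character is absent, while the _opt variants return None.

```ocaml
let s = "hello"

let () =
  (* The forward search starts at index i itself: s.[3] = 'l' is found at 3. *)
  assert (String.index_from s 3 'l' = 3);
  assert (String.index_from_opt s 4 'l' = None);
  (* The backward search from index 2 finds the 'l' at index 2. *)
  assert (String.rindex_from s 2 'l' = 2);
  (* index and rindex cover the whole string. *)
  assert (String.index s 'l' = 2);
  assert (String.rindex s 'l' = 3);
  (* Absent character: exception versus option. *)
  assert (match String.index s 'z' with _ -> false | exception Not_found -> true);
  assert (String.index_opt s 'z' = None)
```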
Strings and Sequences
to_seqi s is like to_seq but also tuples the corresponding index.
UTF decoding and validations
UTF-8
val get_utf_8_uchar : t -> int -> Uchar.utf_decode
get_utf_8_uchar b i decodes a UTF-8 character at index i in b.
val is_valid_utf_8 : t -> bool
is_valid_utf_8 b is true if and only if b contains valid UTF-8 data.
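A hedged sketch of how the two UTF-8 functions combine, assuming OCaml 4.14 or later, where String.get_utf_8_uchar and the Uchar.utf_decode accessors exist in the standard library. The helper count_uchars is hypothetical; it counts decoded characters rather than bytes.

```ocaml
(* "cafe" with an acute accent on the e: U+00E9 is the two bytes C3 A9. *)
let s = "caf\xC3\xA9"

(* Hypothetical helper: walk the string one decode at a time. *)
let count_uchars s =
  let rec go i n =
    if i >= String.length s then n
    else
      let d = String.get_utf_8_uchar s i in
      (* utf_decode_length is the number of bytes consumed, always >= 1. *)
      go (i + Uchar.utf_decode_length d) (n + 1)
  in
  go 0 0

let () =
  assert (String.is_valid_utf_8 s);
  (* A lone lead byte is not valid UTF-8. *)
  assert (not (String.is_valid_utf_8 "\xC3"));
  let d = String.get_utf_8_uchar s 3 in
  assert (Uchar.utf_decode_is_valid d);
  assert (Uchar.to_int (Uchar.utf_decode_uchar d) = 0xE9);
  (* Five bytes, four characters. *)
  assert (String.length s = 5 && count_uchars s = 4)
```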
UTF-16BE
val get_utf_16be_uchar : t -> int -> Uchar.utf_decode
get_utf_16be_uchar b i decodes a UTF-16BE character at index i in b.
val is_valid_utf_16be : t -> bool
is_valid_utf_16be b is true if and only if b contains valid UTF-16BE data.
UTF-16LE
val get_utf_16le_uchar : t -> int -> Uchar.utf_decode
get_utf_16le_uchar b i decodes a UTF-16LE character at index i in b.
val is_valid_utf_16le : t -> bool
is_valid_utf_16le b is true if and only if b contains valid UTF-16LE data.
Deprecated functions
create n returns a fresh byte sequence of length n. The sequence is uninitialized and contains arbitrary bytes.
fill s pos len c modifies byte sequence s in place, replacing len bytes by c, starting at pos.
Return a copy of the argument, with all lowercase letters translated to uppercase, including accented letters of the ISO Latin-1 (8859-1) character set.
Return a copy of the argument, with all uppercase letters translated to lowercase, including accented letters of the ISO Latin-1 (8859-1) character set.
Return a copy of the argument, with the first character set to uppercase, using the ISO Latin-1 (8859-1) character set.
Return a copy of the argument, with the first character set to lowercase, using the ISO Latin-1 (8859-1) character set.
Binary decoding of integers
The functions in this section binary decode integers from strings.
All following functions raise Invalid_argument if the characters needed at index i to decode the integer are not available.
Little-endian (resp. big-endian) encoding means that least (resp. most) significant bytes are stored first. Big-endian is also known as network byte order. Native-endian encoding is either little-endian or big-endian depending on Sys.big_endian.
32-bit and 64-bit integers are represented by the int32 and int64 types, which can be interpreted either as signed or unsigned numbers.
8-bit and 16-bit integers are represented by the int type, which has more bits than the binary encoding. These extra bits are sign-extended (or zero-extended) for functions which decode 8-bit or 16-bit integers and represent them with int values.
get_uint8 b i is b's unsigned 8-bit integer starting at character index i.
get_int8 b i is b's signed 8-bit integer starting at character index i.
get_uint16_ne b i is b's native-endian unsigned 16-bit integer starting at character index i.
get_uint16_be b i is b's big-endian unsigned 16-bit integer starting at character index i.
get_uint16_le b i is b's little-endian unsigned 16-bit integer starting at character index i.
get_int16_ne b i is b's native-endian signed 16-bit integer starting at character index i.
get_int16_be b i is b's big-endian signed 16-bit integer starting at character index i.
get_int16_le b i is b's little-endian signed 16-bit integer starting at character index i.
get_int32_ne b i is b's native-endian 32-bit integer starting at character index i.
get_int32_be b i is b's big-endian 32-bit integer starting at character index i.
get_int32_le b i is b's little-endian 32-bit integer starting at character index i.
get_int64_ne b i is b's native-endian 64-bit integer starting at character index i.
get_int64_be b i is b's big-endian 64-bit integer starting at character index i.
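The effect of signedness and byte order can be seen on a four-byte string. This sketch calls the Stdlib String decoders (OCaml 4.13 or later is assumed); the input bytes are arbitrary.

```ocaml
(* Bytes: 01 02 FF FE. *)
let s = "\x01\x02\xFF\xFE"

let () =
  (* The same byte 0xFF reads as 255 unsigned and -1 signed. *)
  assert (String.get_uint8 s 2 = 255);
  assert (String.get_int8 s 2 = -1);
  (* Big-endian stores the most significant byte first, little-endian the least. *)
  assert (String.get_uint16_be s 0 = 0x0102);
  assert (String.get_uint16_le s 0 = 0x0201);
  (* 0xFFFE is sign-extended to -2 when decoded as a signed 16-bit integer. *)
  assert (String.get_int16_be s 2 = -2);
  assert (String.get_int32_be s 0 = 0x0102FFFEl);
  (* Only one byte is left at index 3, so a 16-bit read fails. *)
  assert (
    match String.get_uint16_be s 3 with
    | _ -> false
    | exception Invalid_argument _ -> true)
```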
pad n str ensures that str is at least n bytes long, padding it on the chosen side with the character c if it is shorter.
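The described behaviour can be sketched with the standard library alone. pad_sketch below is a hypothetical stand-in, not this module's implementation; the optional arguments side and c, and their defaults (left side, space character), are assumptions made for the illustration.

```ocaml
(* Hypothetical re-implementation of the described padding behaviour. *)
let pad_sketch ?(side = `Left) ?(c = ' ') n s =
  let len = String.length s in
  if len >= n then s (* already long enough: returned unchanged *)
  else
    let fill = String.make (n - len) c in
    match side with
    | `Left -> fill ^ s
    | `Right -> s ^ fill

let () =
  assert (pad_sketch 5 "ab" = "   ab");
  assert (pad_sketch ~side:`Right ~c:'0' 4 "7" = "7000");
  assert (pad_sketch 1 "abc" = "abc")
```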
val of_gen : char gen -> string
Convert a gen of characters to a string.
val of_iter : char iter -> string
Convert an iter of characters to a string.
val of_std_seq : char Seq.t -> string
Convert a sequence of characters to a string.
val of_seq : char sequence -> string
val of_klist : char klist -> string
Find sub in string, returns its first index or -1.
val find_all : ?start:int -> sub:string -> string -> int gen
find_all ~sub s finds all occurrences of sub in s, even overlapping instances.
find_all_l ~sub s finds all occurrences of sub in s and returns them in a list.
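To make "even overlapping instances" concrete, here is a naive standard-library sketch that returns the start index of every occurrence. find_all_sketch is a hypothetical helper, not this module's algorithm, and returning the empty list for an empty sub is a choice made for the sketch.

```ocaml
(* Hypothetical naive search: try every start index in turn. *)
let find_all_sketch ~sub s =
  let n = String.length s and m = String.length sub in
  let rec go i acc =
    if i + m > n then List.rev acc
    else if String.sub s i m = sub then go (i + 1) (i :: acc)
    else go (i + 1) acc
  in
  if m = 0 then [] else go 0 []

let () =
  (* "aa" occurs at 0, 1 and 2 in "aaaa": the matches overlap. *)
  assert (find_all_sketch ~sub:"aa" "aaaa" = [ 0; 1; 2 ]);
  assert (find_all_sketch ~sub:"ab" "abcab" = [ 0; 3 ]);
  assert (find_all_sketch ~sub:"x" "abc" = [])
```

Advancing by one index after each match, rather than by the length of sub, is what allows overlapping matches to be reported.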
Find sub in string from the right, returns its first index or -1. Should only be used with very small sub.
replace ~sub ~by s replaces some occurrences of sub by by in s.
is_sub ~sub i s j ~sub_len returns true iff the substring of sub starting at position i and of length sub_len is a substring of s starting at position j.
chop_prefix ~pre s removes pre from s if pre really is a prefix of s, returns None otherwise.
chop_suffix ~suf s removes suf from s if suf really is a suffix of s, returns None otherwise.
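The option-returning behaviour can be sketched with the standard library. chop_prefix_sketch and chop_suffix_sketch are hypothetical helpers written for illustration; they follow the description above and are not this module's code.

```ocaml
(* Hypothetical helper: Some rest when pre is a prefix of s, None otherwise. *)
let chop_prefix_sketch ~pre s =
  let lp = String.length pre and ls = String.length s in
  if lp <= ls && String.sub s 0 lp = pre then Some (String.sub s lp (ls - lp))
  else None

(* Hypothetical helper: Some rest when suf is a suffix of s, None otherwise. *)
let chop_suffix_sketch ~suf s =
  let lf = String.length suf and ls = String.length s in
  if lf <= ls && String.sub s (ls - lf) lf = suf then Some (String.sub s 0 (ls - lf))
  else None

let () =
  assert (chop_prefix_sketch ~pre:"foo" "foobar" = Some "bar");
  assert (chop_prefix_sketch ~pre:"bar" "foobar" = None);
  assert (chop_suffix_sketch ~suf:".ml" "main.ml" = Some "main");
  assert (chop_suffix_sketch ~suf:".mli" "main.ml" = None)
```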
val lines_gen : string -> string gen
lines_gen s returns a generator of the lines of s (splits along '\n').
val concat_gen : sep:string -> string gen -> string
concat_gen ~sep g concatenates all strings of g, separated with sep.
val unlines_gen : string gen -> string
unlines_gen g concatenates all strings of g, separated with '\n'.
set s i c creates a new string which is a copy of s, except for index i, which becomes c.
Alias to String.iter.
filter_map f s calls (f a0) (f a1) ... (f an) where a0 ... an are the characters of s. It returns the string of characters ci such that f ai = Some ci (when f returns None, the corresponding element of s is discarded).
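A standard-library sketch of the same idea, under the assumption that results are collected in order. filter_map_sketch is a hypothetical helper built on Buffer, not this module's implementation.

```ocaml
(* Hypothetical helper: keep the characters for which f returns Some. *)
let filter_map_sketch f s =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match f c with
      | Some c' -> Buffer.add_char buf c'
      | None -> () (* the character is discarded *))
    s;
  Buffer.contents buf

(* Drop spaces and uppercase everything else. *)
let shout = filter_map_sketch (fun c ->
  if c = ' ' then None else Some (Char.uppercase_ascii c))

let () =
  assert (shout "a b c" = "ABC");
  assert (shout "   " = "")
```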
Map each char to a string, then concatenate them all.
include S with type t := string
val blit : string -> int -> Bytes.t -> int -> int -> unit
Like String.blit. Compatible with the -safe-string option.
Conversions
val to_gen : string -> char CCString.gen
Return the gen of characters contained in the string.
val to_iter : string -> char CCString.iter
Return the iter of characters contained in the string.
val to_std_seq : string -> char Seq.t
to_std_seq s returns a Seq.t of the bytes in s.
val to_klist : string -> char CCString.klist
val pp_buf : Buffer.t -> string -> unit
Renamed from pp since 2.0.
drop_while f s discards any characters starting from the left, up to the first character c not satisfying f c.
rdrop_while f s discards any characters starting from the right, up to the first character c not satisfying f c.
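Both directions can be sketched with the standard library. drop_while_sketch and rdrop_while_sketch are hypothetical helpers that follow the description above; the first character that fails the predicate is kept.

```ocaml
(* Hypothetical helper: skip leading characters while f holds. *)
let drop_while_sketch f s =
  let n = String.length s in
  let i = ref 0 in
  while !i < n && f s.[!i] do incr i done;
  String.sub s !i (n - !i)

(* Hypothetical helper: skip trailing characters while f holds. *)
let rdrop_while_sketch f s =
  let j = ref (String.length s) in
  while !j > 0 && f s.[!j - 1] do decr j done;
  String.sub s 0 !j

let is_zero c = c = '0'

let () =
  assert (drop_while_sketch is_zero "00120" = "120");
  assert (rdrop_while_sketch is_zero "00120" = "0012");
  (* Every character satisfies the predicate: nothing is left. *)
  assert (drop_while_sketch is_zero "000" = "")
```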
Trim space on the left (see String.trim for more details).
Trim space on the right (see String.trim for more details).
Operations on 2 strings
Iterate on pairs of chars with their index.
Do all pairs of chars satisfy the predicate?
Ascii functions
These functions are deprecated in String since 4.03, so we provide a stable alias for them even in older versions.
See String.
See String.
See String.
See String.
Finding
A relatively efficient algorithm for finding sub-strings.
module Find = CCString.Find
Splitting
module Split = CCString.Split
Alias to Split.list_cpy.
Utils
compare_versions a b compares version strings a and b, considering that numbers are above text.
Natural Sort Order, comparing chunks of digits as natural numbers. https://en.wikipedia.org/wiki/Natural_sort_order
Edit distance between two strings. This satisfies the classical distance axioms: it is always non-negative, symmetric, and satisfies the triangle inequality distance a b + distance b c >= distance a c.
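For illustration, here is a standard dynamic-programming sketch of the Levenshtein distance, assuming that is the edit distance meant here. The function levenshtein is a hypothetical stand-in and makes no claim about how this module computes its result.

```ocaml
(* Hypothetical helper: Levenshtein distance with two rolling rows. *)
let levenshtein a b =
  let la = String.length a and lb = String.length b in
  (* prev.(j) is the distance between the first i-1 chars of a and the first j of b. *)
  let prev = Array.init (lb + 1) (fun j -> j) in
  let cur = Array.make (lb + 1) 0 in
  for i = 1 to la do
    cur.(0) <- i;
    for j = 1 to lb do
      let cost = if a.[i - 1] = b.[j - 1] then 0 else 1 in
      cur.(j) <-
        min (min (prev.(j) + 1) (cur.(j - 1) + 1)) (prev.(j - 1) + cost)
    done;
    Array.blit cur 0 prev 0 (lb + 1)
  done;
  prev.(lb)

let () =
  assert (levenshtein "kitten" "sitting" = 3);
  (* Symmetry. *)
  assert (levenshtein "sitting" "kitten" = 3);
  (* Triangle inequality on one sample: 1 + 2 >= 3. *)
  assert (
    levenshtein "kitten" "sitten" + levenshtein "sitten" "sitting"
    >= levenshtein "kitten" "sitting")
```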
Slices
A contiguous part of a string
module Sub = CCString.Sub
val pp : Format.formatter -> string -> unit