Text
UTF-8 encoded strings
This module is intended for "text" manipulation. By text we mean a sequence of Unicode characters.
For compatibility and simplicity, text is represented by UTF-8 encoded strings. There is no special type for Unicode characters: a character is simply a text of length 1.
All functions of this module expect to be applied to valid UTF-8 encoded strings, and may raise Invalid if this is not the case.
Invalid(error, text) Exception raised when an invalid UTF-8 encoded string is encountered. text is the faulty text and error is a description of the first error in text.
check str checks that str is a valid UTF-8 encoded string. Returns None if it is the case, or Some error otherwise.
Same as check but raises an exception in case the argument is not a valid text.
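To make the idea of "valid UTF-8 encoded string" concrete, here is a minimal sketch of a structural validity check written with the standard library only. The name check_sketch and its error messages are hypothetical and this is not how the module implements check. It only verifies lead bytes and continuation bytes, so it assumes that overlong encodings, surrogates, and code points above U+10FFFF do not need to be rejected.

```ocaml
(* Hypothetical helper, not the check function of the module.
   Returns None when s is structurally valid UTF-8, or Some message
   describing the first problem found. Overlong encodings and surrogates
   are NOT detected by this simplified version. *)
let check_sketch (s : string) : string option =
  let n = String.length s in
  (* A continuation byte has the bit pattern 10xxxxxx. *)
  let is_cont i = i < n && Char.code s.[i] land 0xC0 = 0x80 in
  let rec go i =
    if i >= n then None
    else
      let b = Char.code s.[i] in
      (* Expected sequence length according to the lead byte, 0 if invalid. *)
      let len =
        if b < 0x80 then 1
        else if b land 0xE0 = 0xC0 then 2
        else if b land 0xF0 = 0xE0 then 3
        else if b land 0xF8 = 0xF0 then 4
        else 0
      in
      if len = 0 then
        Some (Printf.sprintf "invalid start byte at offset %d" i)
      else
        let rec conts k = k >= len || (is_cont (i + k) && conts (k + 1)) in
        if conts 1 then go (i + len)
        else Some (Printf.sprintf "malformed sequence at offset %d" i)
  in
  go 0

let () =
  match check_sketch "h\xc3\xa9llo" with
  | None -> print_endline "valid"
  | Some msg -> print_endline msg
```

A validate-style function could be layered on top by raising an exception in the Some case.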
val encode : ?encoding:Encoding.t -> t -> string
encode ?encoding txt encodes the given text with encoding, which defaults to Encoding.system plus transliteration.
val decode : ?encoding:Encoding.t -> string -> t
decode ?encoding str decodes the given string, which is encoded in encoding. encoding defaults to Encoding.system.
to_ascii txt returns an approximate ASCII version of txt. This is the same as encode ~encoding:"ASCII//TRANSLIT" txt.
val length : t -> int
Returns the number of Unicode characters contained in the given text.
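The difference between the number of characters and the number of bytes can be illustrated with a small standard-library sketch. The name length_sketch is hypothetical, and the code assumes its input is already valid UTF-8: it counts every byte that is not a continuation byte.

```ocaml
(* Hypothetical helper: counts Unicode characters in a valid UTF-8 string
   by counting the bytes that do not match the pattern 10xxxxxx. *)
let length_sketch (s : string) : int =
  let count = ref 0 in
  String.iter (fun c -> if Char.code c land 0xC0 <> 0x80 then incr count) s;
  !count

let () =
  let txt = "a\xc3\xa9b" in (* the three characters a, e-acute, b *)
  Printf.printf "characters: %d, bytes: %d\n"
    (length_sketch txt) (String.length txt)
```

Here the character count is 3 while String.length reports 4, because the accented letter occupies two bytes.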
val code : t -> int
code text returns the Unicode code point of the first character of text.
For example:
code "A" = 65
code "é" = 0xe9
val char : int -> t
char code returns the character corresponding to the given Unicode code point.
For example:
char 65 = "A"
char 0xe9 = "é"
get text n returns the n-th character of text. n counts Unicode characters, not bytes. A negative value is interpreted as a position from the end of the text.
For example:
get "abc" 0 = "a"
get "abc" 2 = "c"
get "aéb" 1 = "é"
get "aéb" 2 = "b"
get "abc" (-1) = "c"
sub text pos len returns the sub-text of text starting at position pos and of length len. pos and/or len may be negative.
For example:
sub "ocaml" 1 2 = "ca"
sub "ocaml" 3 (-2) = "ca"
sub "ocaml" (-2) 1 = "m"
slice text a b returns the part of text between a and b (exclusive). a and/or b may be negative.
For example:
slice "abc" 0 1 = "a"
slice "abcdef" 1 (-1) = "bcde"
splice text a b repl replaces the part of text between a and b (exclusive) with repl.
For example:
splice "abcd" 1 2 "plop" = "aplopcd"
splice "abcd" 1 2 "" = "acd"
transform str transforms str so that comparing two transformed strings str1 and str2 with Pervasives.compare gives the same result as comparing the original strings with compare.
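The negative-position conventions of get, sub, slice and splice above can be sketched as follows. This is an illustration inferred from the documented examples only, not the module's implementation. All names are hypothetical, and the code indexes bytes with the standard String module, so it matches the documented behaviour only for ASCII texts. What the real functions do for out-of-range or crossed positions is not stated above; here String.sub simply raises Invalid_argument.

```ocaml
(* Hypothetical helpers, ASCII-only: positions are byte indexes here,
   whereas the documented functions count Unicode characters. *)

(* Map a possibly negative position to an index counted from the start. *)
let normalize len i = if i < 0 then len + i else i

let get_sketch (s : string) (n : int) : string =
  String.make 1 s.[normalize (String.length s) n]

(* A negative length selects the characters located before pos. *)
let sub_sketch (s : string) (pos : int) (len : int) : string =
  let pos = normalize (String.length s) pos in
  if len >= 0 then String.sub s pos len else String.sub s (pos + len) (-len)

let slice_sketch (s : string) (a : int) (b : int) : string =
  let n = String.length s in
  let a = normalize n a and b = normalize n b in
  String.sub s a (b - a)

let splice_sketch (s : string) (a : int) (b : int) (repl : string) : string =
  let n = String.length s in
  let a = normalize n a and b = normalize n b in
  String.sub s 0 a ^ repl ^ String.sub s b (n - b)

let () =
  print_endline (slice_sketch "abcdef" 1 (-1));
  print_endline (sub_sketch "ocaml" 3 (-2))
```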
rev t returns the sequence of characters of t in reverse order.
For example:
rev "ocaml" = "lmaco"
rev "héhé" = "éhéh"
concat sep l returns the concatenation of all texts contained in l, separated by sep.
For example:
concat "/" ["a"; "b"; "c"] = "a/b/c"
rev_concat sep l returns the concatenation of all texts contained in l, in reverse order, separated by sep.
For example:
rev_concat "/" ["a"; "b"; "c"] = "c/b/a"
explode txt returns the list of all characters of txt.
For example:
explode "" = []
explode "abé" = ["a"; "b"; "é"]
rev_explode txt returns the list of all characters of txt, in reverse order.
For example:
rev_explode "ocaml" = ["l"; "m"; "a"; "c"; "o"]
implode l returns the concatenation of all texts contained in l. This is the same as concat "" l, but a bit more efficient.
For example:
implode ["o"; "c"; "a"; "m"; "l"] = "ocaml"
implode ["abc"; "def"] = "abcdef"
rev_implode l returns the concatenation of all texts contained in l, in reverse order.
For example:
rev_implode ["o"; "c"; "a"; "m"; "l"] = "lmaco"
rev_implode ["abc"; "def"] = "defabc"
The following functions test whether all characters of the given text satisfy a property:
val is_ascii : t -> bool
val is_alnum : t -> bool
val is_alpha : t -> bool
val is_blank : t -> bool
val is_cntrl : t -> bool
val is_digit : t -> bool
val is_graph : t -> bool
val is_lower : t -> bool
val is_print : t -> bool
val is_punct : t -> bool
val is_space : t -> bool
val is_upper : t -> bool
val is_xdigit : t -> bool
For each of the following functions we give an equivalent implementation and examples. They have the same semantics as the equivalent implementation but are more efficient.
map f text ~ implode (List.map f (explode text))
map (function "a" -> "x" | t -> t) "abc" = "xbc"
rev_map f text ~ implode (List.rev_map f (explode text))
rev_map (function "a" -> "x" | t -> t) "abc" = "cbx"
fold f x text ~ List.fold_left f x (explode text)
fold (fun acc t -> acc + code t) 0 "ABC" = 198
rev_fold f text x ~ List.fold_right f (explode text) x
rev_fold (fun t acc -> acc + code t) "ABC" 0 = 198
filter f text ~ implode (List.filter f (explode text))
filter is_alpha "1a2E" = "aE"
rev_filter f text ~ implode (List.filter f (rev_explode text))
rev_filter is_alpha "1a2E" = "Ea"
for_all f text returns whether all characters of text satisfy the predicate f.
exists f text returns whether at least one character of text satisfies f.
count f text returns the number of characters of text satisfying f.
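The "equivalent implementations" given above can be tried out with a self-contained sketch built on the standard library. Every name ending in _sketch is hypothetical, as is is_ascii_alpha, which stands in for is_alpha on ASCII letters only. The code assumes valid UTF-8 input and makes no attempt to be as efficient as the real functions.

```ocaml
(* Byte length of the UTF-8 sequence introduced by lead byte c.
   Assumes the input is valid UTF-8. *)
let seq_len (c : char) : int =
  let b = Char.code c in
  if b < 0x80 then 1 else if b < 0xE0 then 2 else if b < 0xF0 then 3 else 4

(* Split a text into its characters, each one a text of length 1. *)
let explode_sketch (s : string) : string list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let len = min (seq_len s.[i]) (n - i) in
      go (i + len) (String.sub s i len :: acc)
  in
  go 0 []

let implode_sketch (l : string list) : string = String.concat "" l

(* Direct transcriptions of the equivalences listed in the text. *)
let map_sketch f text = implode_sketch (List.map f (explode_sketch text))
let filter_sketch f text = implode_sketch (List.filter f (explode_sketch text))
let fold_sketch f x text = List.fold_left f x (explode_sketch text)
let count_sketch f text = List.length (List.filter f (explode_sketch text))

(* Stand-in predicate for ASCII letters only. *)
let is_ascii_alpha (t : string) : bool =
  String.length t = 1
  && (match t.[0] with 'a' .. 'z' | 'A' .. 'Z' -> true | _ -> false)

let () =
  print_endline (map_sketch (function "a" -> "x" | t -> t) "abc");
  print_endline (filter_sketch is_ascii_alpha "1a2E")
```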
Returns all words of the given text. Words are sequences of non-space, non-punctuation characters.
Returns all lines of the given text, without end of line characters. Both "\r\n" and "\n" are recognized as end of line delimiters.
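The end-of-line handling described above can be approximated with the standard library. The sketch below uses the hypothetical name lines_sketch and is not the module's implementation. In particular, it assumes that a trailing newline produces a final empty line, which may differ from the real behaviour.

```ocaml
(* Hypothetical helper: split on "\n", then drop one trailing "\r" from
   each piece so that "\r\n" is also treated as an end of line. *)
let lines_sketch (s : string) : string list =
  String.split_on_char '\n' s
  |> List.map (fun l ->
         let n = String.length l in
         if n > 0 && l.[n - 1] = '\r' then String.sub l 0 (n - 1) else l)

let () = List.iter print_endline (lines_sketch "first\r\nsecond\nthird")
```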
split ?max ?sep text splits text according to sep. If max is specified, returns at most max splits. sep defaults to " ".
For example:
split ~sep:"/" "a/b/c" = ["a"; "b"; "c"]
split ~sep:".." "a..b..c" = ["a"; "b"; "c"]
split ~max:1 "a b c" = ["a b c"]
split ~max:2 "a b c" = ["a"; "b c"]
rev_split ?max ?sep text splits text according to sep in reverse order.
For example:
rev_split ~sep:"/" "a/b/c" = ["c"; "b"; "a"]
rev_split ~max:1 "a b c" = ["a b c"]
rev_split ~max:2 "a b c" = ["a b"; "c"]
rev_split ~max:2 ~sep:"." "toto.mli" = ["toto"; "mli"]
replace text ~patt ~repl replaces all occurrences of patt in text with repl.
For example:
replace "abcd" ~patt:"b" ~repl:"x" = "axcd"
replace "Hello world!" ~patt:"world" ~repl:"you" = "Hello you!"
starts_with text prefix returns true iff text starts with prefix.
For example:
starts_with "abcd" "ab" = true
starts_with "abcd" "af" = false
starts_with "ab" "abcd" = false
ends_with s suffix returns true iff s ends with suffix.
For example:
ends_with "abcd" "cd" = true
ends_with "abcd" "hd" = false
ends_with "ab" "abc" = false
strip ?chars text removes all characters of text which are part of chars at the right and left. chars defaults to whitespace.
rstrip ?chars text removes all characters of text which are part of chars at the right.
lstrip ?chars text removes all characters of text which are part of chars at the left.
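As an illustration of the three stripping functions above, here is a byte-level sketch using the standard library. The names are hypothetical and the default set of whitespace characters (space, tab, carriage return, newline) is an assumption, since the text does not list them. Because it compares single bytes, the sketch is only meaningful when chars contains ASCII characters.

```ocaml
(* Assumed default set: the text only says that chars defaults to whitespace. *)
let default_chars = " \t\r\n"

(* Remove leading bytes that belong to chars. *)
let lstrip_sketch ?(chars = default_chars) (s : string) : string =
  let n = String.length s in
  let rec first i =
    if i < n && String.contains chars s.[i] then first (i + 1) else i
  in
  let i = first 0 in
  String.sub s i (n - i)

(* Remove trailing bytes that belong to chars. *)
let rstrip_sketch ?(chars = default_chars) (s : string) : string =
  let rec last i =
    if i > 0 && String.contains chars s.[i - 1] then last (i - 1) else i
  in
  String.sub s 0 (last (String.length s))

let strip_sketch ?(chars = default_chars) (s : string) : string =
  lstrip_sketch ~chars (rstrip_sketch ~chars s)

let () = print_endline (strip_sketch ~chars:"/" "//a/b//")
```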
Since characters are not encoded by a fixed number of bytes, accessing them by character position is not efficient. The following functions allow you to iterate over a string efficiently.
pointer_at txt n returns a pointer to the character at position n in txt.
next ptr if ptr is at the end of text, returns None, otherwise, returns Some(ch, ptr') where ch is the character at current position and ptr' is the pointer to the next character of the text.
prev ptr if ptr is at the beginning of text, returns None, otherwise, returns Some(ch, ptr') where ptr' points to the previous character and ch is the character at ptr'.
move n ptr moves ptr by n unicode characters. If n < 0 then ptr is moved to the left. Raises Invalid_argument if the result is outside the text.
chunk a b returns the chunk of text between a and b. Raises Invalid_argument if a or b.
val offset : pointer -> int
offset ptr returns the position of ptr in bytes.
val position : pointer -> int
position ptr returns the position of ptr in Unicode characters.
equal_at ptr str returns whether ptr points to a substring equal to str.
find ?from text patt returns a pointer to the first occurrence of patt in text.
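To show why pointers make iteration efficient, the following sketch models a pointer as a record holding the text, a byte offset and a character position, so that moving to the next character costs one step instead of a scan from the start. The record type and all function names are hypothetical. The actual pointer type of the module is abstract and may be represented differently. Valid UTF-8 input is assumed.

```ocaml
(* Hypothetical pointer representation: the text, the byte offset of the
   current character, and its position counted in characters. *)
type pointer_sketch = { text : string; offset : int; position : int }

(* Byte length of the UTF-8 sequence introduced by lead byte c. *)
let seq_len (c : char) : int =
  let b = Char.code c in
  if b < 0x80 then 1 else if b < 0xE0 then 2 else if b < 0xF0 then 3 else 4

(* Pointer to the first character of a text. *)
let start_sketch (text : string) : pointer_sketch =
  { text; offset = 0; position = 0 }

(* Returns None at the end of the text, otherwise the current character
   and a pointer to the following one. *)
let next_sketch (p : pointer_sketch) : (string * pointer_sketch) option =
  let n = String.length p.text in
  if p.offset >= n then None
  else
    let len = min (seq_len p.text.[p.offset]) (n - p.offset) in
    let ch = String.sub p.text p.offset len in
    Some (ch, { p with offset = p.offset + len; position = p.position + 1 })

(* Walk the whole text once, collecting (character, byte offset) pairs. *)
let chars_with_offsets (text : string) : (string * int) list =
  let rec go p acc =
    match next_sketch p with
    | None -> List.rev acc
    | Some (ch, p') -> go p' ((ch, p.offset) :: acc)
  in
  go (start_sketch text) []

let () =
  List.iter
    (fun (ch, off) -> Printf.printf "%s at byte %d\n" ch off)
    (chars_with_offsets "a\xc3\xa9b")
```

The whole traversal is linear in the number of bytes, whereas calling a position-based accessor for each index would rescan the text every time.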