package inquire
Module Zed_utf8
UTF-8 encoded strings
Invalid(error, text) Exception raised when an invalid UTF-8 encoded string is encountered. text is the faulty text and error is a description of the first error in text.
Exception raised when trying to access a character which is outside the bounds of a string.
Validation
val check : t -> check_result
check str checks that str is a valid UTF-8 encoded string.
val validate : t -> int
Same as check, but raises an exception if the argument is not valid text; otherwise returns the length of the string.
val next_error : t -> int -> int * int * string
next_error str ofs returns (ofs', count, msg) where ofs' is the offset of the start of the first invalid sequence after ofs (inclusive) in str, count is the number of Unicode characters between ofs and ofs' (exclusive), and msg is an error message. If there is no error before the end of the string, then ofs' is String.length str and msg is the empty string.
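The validation behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, assuming OCaml 4.14 or later (for String.get_utf_8_uchar). check_utf8 is a hypothetical helper, not part of Zed_utf8, and it reports only the byte offset of the first invalid sequence rather than an error message.

```ocaml
(* Illustrative only: returns [Ok n] with the number of code points when
   [s] is valid UTF-8, or [Error ofs] with the byte offset of the first
   invalid sequence. Uses the standard library, not Zed_utf8. *)
let check_utf8 (s : string) : (int, int) result =
  let len = String.length s in
  let rec loop ofs count =
    if ofs >= len then Ok count
    else
      let d = String.get_utf_8_uchar s ofs in
      if Uchar.utf_decode_is_valid d then
        loop (ofs + Uchar.utf_decode_length d) (count + 1)
      else Error ofs
  in
  loop 0 0
```

Returning a result keeps the sketch exception-free; a validate-style wrapper could raise on the Error case instead.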
Construction
singleton ch creates a string of length 1 containing only the given character.
init n f returns the concatenation of singleton (f 0), singleton (f 1), ..., singleton (f (n - 1)).
rev_init n f returns the concatenation of singleton (f (n - 1)), ..., singleton (f 1), singleton (f 0).
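To make the relationship between singleton, init and rev_init concrete, here is a minimal sketch built on Buffer.add_utf_8_uchar (OCaml 4.06 or later). These are local, illustrative re-implementations that assume characters are values of type Uchar.t; they are not the library's code.

```ocaml
(* Illustrative re-implementations using only the standard library. *)

(* A string holding the UTF-8 encoding of a single code point. *)
let singleton (ch : Uchar.t) : string =
  let buf = Buffer.create 4 in
  Buffer.add_utf_8_uchar buf ch;
  Buffer.contents buf

(* Encodes f 0, f 1, ..., f (n - 1) in that order. *)
let init (n : int) (f : int -> Uchar.t) : string =
  let buf = Buffer.create (max 1 n) in
  for i = 0 to n - 1 do
    Buffer.add_utf_8_uchar buf (f i)
  done;
  Buffer.contents buf

(* Encodes f (n - 1), ..., f 1, f 0 in that order. *)
let rev_init (n : int) (f : int -> Uchar.t) : string =
  let buf = Buffer.create (max 1 n) in
  for i = n - 1 downto 0 do
    Buffer.add_utf_8_uchar buf (f i)
  done;
  Buffer.contents buf
```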
Information
val length : t -> int
Returns the length of the given string.
Comparison
Random access
String manipulation
sub str ofs len returns the sub-string of str starting at ofs and of length len.
break str pos returns the sub-strings before and after pos in str. It is more efficient than creating two sub-strings with sub.
remove str pos len removes the len characters at position pos in str.
replace str pos len repl replaces the len characters at position pos in str by repl.
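The following sketch illustrates how these four operations relate to one another. It assumes that positions and lengths count code points rather than bytes, which the text above does not state explicitly, and it assumes OCaml 4.14 or later. decode and encode are hypothetical helpers, and the whole block is a simple (not efficient) stand-in for the library functions.

```ocaml
(* Illustrative only: code-point-indexed operations via an intermediate array. *)

(* Decode a UTF-8 string into an array of code points. *)
let decode (s : string) : Uchar.t array =
  let rec loop ofs acc =
    if ofs >= String.length s then Array.of_list (List.rev acc)
    else
      let d = String.get_utf_8_uchar s ofs in
      loop (ofs + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
  in
  loop 0 []

(* Encode an array of code points back into a UTF-8 string. *)
let encode (chars : Uchar.t array) : string =
  let buf = Buffer.create 16 in
  Array.iter (Buffer.add_utf_8_uchar buf) chars;
  Buffer.contents buf

let sub s pos len = encode (Array.sub (decode s) pos len)

let break s pos =
  let a = decode s in
  (encode (Array.sub a 0 pos), encode (Array.sub a pos (Array.length a - pos)))

(* remove and replace both split around the span [pos, pos + len). *)
let remove s pos len =
  let before, rest = break s pos in
  let _, after = break rest len in
  before ^ after

let replace s pos len repl =
  let before, rest = break s pos in
  let _, after = break rest len in
  before ^ repl ^ after
```

In this sketch an out-of-range position surfaces as the Invalid_argument raised by Array.sub; the library may signal such errors differently.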
Transformation
concat sep l returns the concatenation of all strings of l separated by sep.
rev_concat sep l returns the concatenation of all strings of l in reverse order separated by sep.
rev_explode str returns the list of all characters of str in reverse order.
rev_implode l is the same as implode (List.rev l) but more efficient.
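The reverse-order variants can be understood through a small standard-library sketch (OCaml 4.14 or later assumed). The definitions below are illustrative stand-ins that treat characters as Uchar.t values; they favour clarity over the efficiency the library aims for.

```ocaml
(* Illustrative only: reverse-order helpers built on the standard library. *)

(* Code points of s, last character first. *)
let rev_explode (s : string) : Uchar.t list =
  let rec loop ofs acc =
    if ofs >= String.length s then acc
    else
      let d = String.get_utf_8_uchar s ofs in
      loop (ofs + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
  in
  loop 0 []

(* UTF-8 encoding of the code points of l, in list order. *)
let implode (l : Uchar.t list) : string =
  let buf = Buffer.create 16 in
  List.iter (Buffer.add_utf_8_uchar buf) l;
  Buffer.contents buf

(* Same result as implode (List.rev l); written naively here. *)
let rev_implode (l : Uchar.t list) : string = implode (List.rev l)

(* Joins the strings of l in reverse order, separated by sep. *)
let rev_concat (sep : string) (l : string list) : string =
  String.concat sep (List.rev l)
```

Because rev_explode accumulates characters as it scans left to right, it needs no final List.rev, which is one reason reversed variants can be cheaper.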
Text traversals
iter f str applies f on all characters of str starting from the left.
rev_iter f str applies f on all characters of str starting from the right.
fold f str acc applies f on all characters of str starting from the left, accumulating a value.
rev_fold f str acc applies f on all characters of str starting from the right, accumulating a value.
rev_map f str maps all characters of str with f in reverse order.
map_concat f str maps all characters of str with f and concatenates the results.
rev_map_concat f str maps all characters of str with f in reverse order and concatenates the results.
rev_filter f str filters characters of str with f in reverse order.
filter_map f str filters and maps characters of str with f.
rev_filter_map f str filters and maps characters of str with f in reverse order.
filter_map_concat f str filters and maps characters of str with f and concatenates the results.
rev_filter_map_concat f str filters and maps characters of str with f in reverse order and concatenates the results.
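The traversal functions share one pattern: decode a character, hand it to f, and move to the next offset. The sketch below illustrates that pattern with the standard library only (OCaml 4.14 or later). The argument order of f (character first, accumulator second) and the use of Uchar.t option in filter_map are assumptions, and the definitions are not the library's implementation.

```ocaml
(* Illustrative only: left-to-right and right-to-left folds over code points. *)

let fold (f : Uchar.t -> 'a -> 'a) (s : string) (acc : 'a) : 'a =
  let rec loop ofs acc =
    if ofs >= String.length s then acc
    else
      let d = String.get_utf_8_uchar s ofs in
      loop (ofs + Uchar.utf_decode_length d) (f (Uchar.utf_decode_uchar d) acc)
  in
  loop 0 acc

(* Collect the characters last-first, then apply f from the right end. *)
let rev_fold (f : Uchar.t -> 'a -> 'a) (s : string) (acc : 'a) : 'a =
  let reversed = fold (fun ch l -> ch :: l) s [] in
  List.fold_left (fun acc ch -> f ch acc) acc reversed

(* Keep and transform the characters for which f returns Some. *)
let filter_map (f : Uchar.t -> Uchar.t option) (s : string) : string =
  let buf = Buffer.create (String.length s + 1) in
  fold
    (fun ch () ->
      match f ch with
      | Some ch' -> Buffer.add_utf_8_uchar buf ch'
      | None -> ())
    s ();
  Buffer.contents buf
```

For example, folding with a function that conses each code point onto a list yields the characters in reverse order with fold and in original order with rev_fold.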
Scanning
for_all f text returns whether all characters of text satisfy the predicate f.
exists f text returns whether at least one character of text satisfies f.
count f text returns the number of characters of text satisfying f.
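The three scanning functions can all be expressed in terms of a fold over code points. This is a minimal sketch with the standard library (OCaml 4.14 or later); fold_chars is a hypothetical helper, and unlike a production implementation these versions do not stop early once the answer is known.

```ocaml
(* Illustrative only: predicates over the code points of a UTF-8 string. *)

let fold_chars (f : Uchar.t -> 'a -> 'a) (s : string) (acc : 'a) : 'a =
  let rec loop ofs acc =
    if ofs >= String.length s then acc
    else
      let d = String.get_utf_8_uchar s ofs in
      loop (ofs + Uchar.utf_decode_length d) (f (Uchar.utf_decode_uchar d) acc)
  in
  loop 0 acc

(* Number of characters satisfying p. *)
let count (p : Uchar.t -> bool) (s : string) : int =
  fold_chars (fun ch n -> if p ch then n + 1 else n) s 0

(* True when at least one character satisfies p. *)
let exists (p : Uchar.t -> bool) (s : string) : bool = count p s > 0

(* True when no character fails p (so it holds for the empty string). *)
let for_all (p : Uchar.t -> bool) (s : string) : bool =
  not (exists (fun ch -> not (p ch)) s)
```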
Tests
Stripping
strip ?predicate text returns text without its leading and trailing characters that match predicate. predicate defaults to testing whether the given character has the `White_Space Unicode property. For example:
strip "\n foo\n " = "foo"
lstrip ?predicate text is the same as strip but it only removes characters at the left of text.
rstrip ?predicate text is the same as strip but it only removes characters at the right of text.
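The role of the optional predicate can be illustrated with a standard-library sketch (OCaml 4.14 or later). Since the standard library has no Unicode property tables, the default predicate here is a hypothetical is_space that recognises ASCII whitespace only, a deliberate simplification of the White_Space property; the rest is an illustrative re-implementation, not the library's code.

```ocaml
(* Illustrative only: stripping with a customisable predicate. *)

let decode (s : string) : Uchar.t list =
  let rec loop ofs acc =
    if ofs >= String.length s then List.rev acc
    else
      let d = String.get_utf_8_uchar s ofs in
      loop (ofs + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
  in
  loop 0 []

let encode (l : Uchar.t list) : string =
  let buf = Buffer.create 16 in
  List.iter (Buffer.add_utf_8_uchar buf) l;
  Buffer.contents buf

(* Assumption: ASCII whitespace stands in for the White_Space property. *)
let is_space (ch : Uchar.t) : bool =
  match Uchar.to_int ch with
  | 0x20 | 0x09 | 0x0A | 0x0B | 0x0C | 0x0D -> true
  | _ -> false

let rec drop_while p = function
  | x :: rest when p x -> drop_while p rest
  | l -> l

let lstrip ?(predicate = is_space) s = encode (drop_while predicate (decode s))

let rstrip ?(predicate = is_space) s =
  encode (List.rev (drop_while predicate (List.rev (decode s))))

let strip ?(predicate = is_space) s = lstrip ~predicate (rstrip ~predicate s)
```

Passing ~predicate lets a caller strip something other than whitespace, for instance leading and trailing hyphens.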
Buffers
add buf ch is the same as Buffer.add_string buf (singleton ch) but is more efficient.
Escaping
escaped_char ch returns a string containing ch, or an escaped version of ch if:
ch is a control character (code < 32)
ch is the character with code 127
ch is a non-ASCII, non-alphabetic character
It uses the syntax \xXX, \uXXXX, \UXXXXXX or a specific escape sequence \n, \r, ....
add_escaped_char buf ch is the same as Buffer.add_string buf (escaped_char ch) but a bit more efficient.
add_escaped buf text is the same as Buffer.add_string buf (escaped text) but a bit more efficient.
val escaped_string : Uutf.encoding -> string -> t
escaped_string enc str escapes the string str, which is encoded with encoding enc. If decoding str with enc fails, it escapes all non-printable bytes of str with the syntax \yAB.
val add_escaped_string : Buffer.t -> Uutf.encoding -> string -> unit
add_escaped_string buf enc text is the same as Buffer.add_string buf (escaped_string enc text) but a bit more efficient.
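The per-character escaping rules can be sketched as a single pattern match. This version makes several assumptions that the text above does not settle: every non-ASCII character is escaped (the standard library cannot test whether a character is alphabetic), \xXX is used for codes below 128, \uXXXX up to U+FFFF and \UXXXXXX beyond that, hexadecimal digits are lowercase, and \n, \r and \t are the only named sequences handled. It is an illustration, not the library's implementation.

```ocaml
(* Illustrative only: a simplified per-character escaping function. *)
let escaped_char (ch : Uchar.t) : string =
  match Uchar.to_int ch with
  | 0x0A -> "\\n"
  | 0x0D -> "\\r"
  | 0x09 -> "\\t"
  (* Remaining control characters and DEL use the two-digit form. *)
  | c when c < 32 || c = 127 -> Printf.sprintf "\\x%02x" c
  (* Printable ASCII is returned unchanged. *)
  | c when c < 128 -> String.make 1 (Char.chr c)
  (* Simplification: all non-ASCII code points are escaped. *)
  | c when c <= 0xFFFF -> Printf.sprintf "\\u%04x" c
  | c -> Printf.sprintf "\\U%06x" c
```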
Safe offset API
val next : t -> int -> int
next str ofs returns the offset of the next character in str.
val prev : t -> int -> int
prev str ofs returns the offset of the previous character in str.
extract_next str ofs returns the code-point at offset ofs in str and the offset of the next character.
extract_prev str ofs returns the code-point at the previous offset in str and this offset.
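Offset-based navigation relies on the structure of UTF-8: a lead byte determines the length of its sequence, and continuation bytes have the bit pattern 10xxxxxx. The sketch below illustrates bounds-checked versions of next, prev and extract_next with the standard library (OCaml 4.14 or later). The local Out_of_bounds exception is a stand-in declared for this example, offsets are assumed to be byte offsets, and the input is assumed to be valid UTF-8.

```ocaml
(* Illustrative only: bounds-checked offset navigation over valid UTF-8. *)

exception Out_of_bounds

(* Byte offset of the character following the one at ofs. *)
let next (s : string) (ofs : int) : int =
  if ofs < 0 || ofs >= String.length s then raise Out_of_bounds;
  let d = String.get_utf_8_uchar s ofs in
  ofs + Uchar.utf_decode_length d

(* Byte offset of the character preceding ofs. *)
let prev (s : string) (ofs : int) : int =
  if ofs <= 0 || ofs > String.length s then raise Out_of_bounds;
  (* Step back over continuation bytes (10xxxxxx) to reach the lead byte. *)
  let rec back i =
    if i > 0 && Char.code s.[i] land 0xC0 = 0x80 then back (i - 1) else i
  in
  back (ofs - 1)

(* Code point at ofs, paired with the offset of the next character. *)
let extract_next (s : string) (ofs : int) : Uchar.t * int =
  if ofs < 0 || ofs >= String.length s then raise Out_of_bounds;
  let d = String.get_utf_8_uchar s ofs in
  (Uchar.utf_decode_uchar d, ofs + Uchar.utf_decode_length d)
```

Repeatedly calling extract_next from offset 0 until the returned offset equals String.length s visits every character exactly once.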
Unsafe offset API
These functions do not check that the given offset is inside the bounds of the given string.
val unsafe_next : t -> int -> int
unsafe_next str ofs returns the offset of the next character in str.
val unsafe_prev : t -> int -> int
unsafe_prev str ofs returns the offset of the previous character in str.
unsafe_extract str ofs returns the code-point at offset ofs in str.
unsafe_extract_next str ofs returns the code-point at offset ofs in str and the offset of the next character.