Bitvector -- an integer with modular arithmentics.
Overview
A numeric value with the 2-complement binary representation. It is good for representing addresses, offsets and other arithmetic values.
Each value is attributed by a bitwidth and signedness. All arithmetic operations over values are done modulo their widths. It is an error to apply arithmetic operation to values with different widths. Default implementations will raise a an exception, however there exists a family of modules that provide arithmetic operations lifted to an Or_error.t
monad. It is suggested to use them, if you know what kind of operands you're expecting.
Clarification on signs
By default, all are numbers represented with bitvectors are considered unsigned. This includes the ordering, e.g., of_int
(-1) ~width:32
is greater than of_int 0 ~width:32
. If you need to perform a signed operation, you can use the signed
operator create a signed word with the same value.
If any operand of a binary operation is signed, then a signed version of an operation is used, i.e., the other operand is upcasted to the signed kind.
Remember to use explicit casts, whenever you really need a signed representation. Examples:
let x = of_int ~-6 ~width:8
let y = to_int x (* y = 250 *)
let z = to_int (signed x) (* z = ~-6 *)
let zero = of_int 0 ~width:8
let p = x < zero (* p = false *)
let q = signed x < zero (* p = true *)
Clarification on size-morphism
Size-monomorphic operations (as opposed to size-polymorphic) expect operands of the same size. When applied to operands of different sizes they either raise exceptions or return an Error
variant as the result. All arithmetic operations are size-monomorphic and we provide interface that use either exceptions or Result.t
to indicate the outcome.
The comparison operation is size-polymorphic by default and takes the size of the bitvector into account. Bitvectors with equal values but different sizes are unequal. The precise order matches with the order of pairs, where the first constituent is the bitvector value, and the second is its size, for example, the following sequence is in an ascending order:
0x0:1, 0x0:32, 0x0:64, 0x1:1, 0x1:32, 0xD:4, 0xDEADBEEF:32
.
A size-monomorphic interfaced is exposed in a Mono
submodule. So if you want a monomorphic map, then just use Mono.Map
module. Note, Mono
submodule doesn't provide Table
, since we cannot guarantee that all keys in a hash-table have equal size. The order functions provided by the Mono module will raise an exception when applied to bitvectors with different sizes.
In the default and Mono
orders, if either of two values is signed (see Clarification on signs) then the values will be ordered as 2-complement signed integers.
Another alternative orders are Signed_value_order
, Unsigned_value_order
, and Literal_order
. They will be briefly described below.
Signed_value_order
is size-polymoprhic and it simply ignores the sizes of bitvectors and orders them by values, e.g., the following bitvectors are ordered in the Value.Signed
order, FF:8; 0:1; 0F:8; FF:32
, and 0:1
is equal to 0:32
. See Clarification on size-morphism for more details on the signedness of operations. Note, that the size of a word still affects the order since it defines the position of the most significant bit.
Unsigned_value_order
ignores the sign and the size of words and compares them by the unsigned order of their values. he following numbers are ordered with the Unsigned_value_order
order, 0:1, 1:32, 0F:8 FF:8
, and FF:32
is equal to FF:8
. Unsigned_value_order
is faster than then any previously described order and is useful when the size of the words should be ignored (or is known to be equal and therefore could be ignored).
Literal_order
is the fastest order that takes into account all constituents of bitvectors, like if we will treat a bitvector as triple of its value, size, and sign and order bitvectors using the lexicographical order.
Clarification on string representation
As a part of Identifiable
interface bitvector provides a pair of complement functions: to_string
and of_string
, that provides facilities to store bitvector as a human readable string, and to restore it from string. The format of the representation is the following (in EBNF):
repr = [sign], [base], digit, {digit}, ":", size, [kind]
sign = "+" | "-";
base = "0x" | "0b" | "0o";
size = dec, {dec};
digit = dec | oct | hex;
dec = ?decimal digit?;
oct = ?octal digit?;
hex = ?hexadecimal digit?;
kind = u | s
Examples: 0x5D:32s, 0b0101:16u, 5:64, +5:8, +0x5D:16
.
If base
is omitted base-10 is assumed. If the kind is omitted, then the usigned kind is assumed. The output format is always in a hex representation with a full prefix. .
word
is an abbreviation to Bitvector.t
Common Interfaces
A bitvector is a value, first of all, so it supports a common set of a value interface: it can be stored, compared, it can be a key in a dictionary, etc. Moreover, being a number it can be compared with zero and applied to a common set of integer operations.
include Regular.Std.Regular.S with type t := t
include Core_kernel.Bin_prot.Binable.S with type t := t
include Bin_prot.Binable.S_only_functions with type t := t
This function only needs implementation if t
exposed to be a polymorphic variant. Despite what the type reads, this does *not* produce a function after reading; instead it takes the constructor tag (int) before reading and reads the rest of the variant t
afterwards.
include Regular.Std.Printable.S with type t := t
val to_string : t -> string
to_string x
returns a human-readable representation of x
val str : unit -> t -> string
str () t
is formatted output function that matches "%a" conversion format specifier in functions, that prints to string, e.g., sprintf
, failwithf
, errorf
and, surprisingly all Lwt
printing function, including Lwt_io.printf
and logging (or any other function with type ('a,unit,string,...) formatN`. Example:
Or_error.errorf "type %a is not valid for %a"
Type.str ty Exp.str exp
val pps : unit -> t -> string
will print to a standard output_channel
, useful for using in printf
, fprintf
, etc.
prints a sequence of values of type t
this will include pp
function from Core
that has type t printer
, and can be used in Format.printf
family of functions
include Core_kernel.Pretty_printer.S with type t := t
include Core_kernel.Comparable.S_binable with type t := t
include Base.Comparable.S with type t := t
include Base.Comparisons.S with type t := t
val equal : t -> t -> bool
val compare : t -> t -> int
compare t1 t2
returns 0 if t1
is equal to t2
, a negative integer if t1
is less than t2
, and a positive integer if t1
is greater than t2
.
val ascending : t -> t -> int
ascending
is identical to compare
. descending x y = ascending y x
. These are intended to be mnemonic when used like List.sort ~compare:ascending
and List.sort
~cmp:descending
, since they cause the list to be sorted in ascending or descending order, respectively.
val descending : t -> t -> int
val between : t -> low:t -> high:t -> bool
between t ~low ~high
means low <= t <= high
val clamp_exn : t -> min:t -> max:t -> t
clamp_exn t ~min ~max
returns t'
, the closest value to t
such that between t' ~low:min ~high:max
is true.
Raises if not (min <= max)
.
include Regular.Std.Data.S with type t := t
type info = string * [ `Ver of string ] * string option
name,Ver v,desc
information attached to a particular reader or writer.
Data representation version. After any change in data representation the version should be increased.
Serializers that are derived from a data representation must have the same version as a version of the data structure, from which it is derived. This kind of serializers can only read and write data of the same version.
Other serializers can actually read and write data independent on its representation version. A serializer, that can't store data of current version simply shouldn't be added to a set of serializers.
It is assumed, that if a reader and a writer has the same name and version, then whatever was written by the writer should be readable by the reader. The round-trip equality is not required, thus it is acceptable if some information is lost.
It is also possible, that a reader and a writer that has the same name are compatible. In that case it is recommended to use semantic versioning.
val size_in_bytes : ?ver:string -> ?fmt:string -> t -> int
size_in_bytes ?ver ?fmt datum
returns the amount of bytes that is needed to represent datum
in the given format and version
of_bytes ?ver ?fmt bytes
deserializes a value from bytes.
to_bytes ?ver ?fmt datum
serializes a datum
to a sequence of bytes.
blit_to_bytes ?ver ?fmt buffer datum offset
copies a serialized representation of datum into a buffer
, starting from the offset
.
of_bigstring ?ver ?fmt buf
deserializes a datum from bigstring
of_bigstring ?ver ?fmt datum
serializes a datum to a sequence of bytes represented as bigstring
blit_to_bigstring ?ver ?fmt buffer datum offset
copies a serialized representation of datum into a buffer
, starting from offset
.
Input/Output functions for the given datum.
module Cache : sig ... end
add_reader ?desc ~ver name reader
registers a new reader
with a provided name
, version ver
and optional description desc
add_writer ?desc ~ver name writer
registers a new writer
with a provided name
, version ver
and optional description desc
val available_readers : unit -> info list
available_reader ()
lists available readers for the data type
val default_reader : unit -> info
default_reader
returns information about default reader
val set_default_reader : ?ver:string -> string -> unit
set_default_reader ?ver name
sets new default reader. If version is not specified then the latest available version is used. Raises an exception if a reader with a given name doesn't exist.
val with_reader : ?ver:string -> string -> (unit -> 'a) -> 'a
with_reader ?ver name operation
temporary sets a default reader to a reader with a specified name and version. The default reader is restored after operation
is finished.
val available_writers : unit -> info list
available_writer ()
lists available writers for the data type
val default_writer : unit -> info
default_writer
returns information about the default writer
val set_default_writer : ?ver:string -> string -> unit
set_default_writer ?ver name
sets new default writer. If version is not specified then the latest available version is used. Raises an exception if a writer with a given name doesn't exist.
val with_writer : ?ver:string -> string -> (unit -> 'a) -> 'a
with_writer ?ver name operation
temporary sets a default writer to a writer with a specified name and version. The default writer is restored after operation
is finished.
val default_printer : unit -> info option
default_writer
optionally returns an information about default printer
val set_default_printer : ?ver:string -> string -> unit
set_default_printer ?ver name
sets new default printer. If version is not specified then the latest available version is used. Raises an exception if a printer with a given name doesn't exist.
val with_printer : ?ver:string -> string -> (unit -> 'a) -> 'a
with_printer ?ver name operation
temporary sets a default printer to a printer with a specified name and version. The default printer is restored after operation
is finished.
Low level access to serializers
find_reader ?ver name
lookups a reader with a given name. If version is not specified, then a reader with maximum version is returned.
find_writer ?ver name
lookups a writer with a given name. If version is not specified, then a writer with maximum version is returned.
Bitvector implements a common set of operations that are expected from integral values.
include Integer.S with type t := t
include Integer.Base with type t := t
abs x
absolute value of x
lnot x
is a logical negation of x
(1-complement)
logand x y
is a conjunction of x
and y
logand x y
is a conjunction of x
and y
logor x y
is a disjunction of x
and y
logxor x y
is exclusive or between x
and y
lshift x y
shift x
by y
bits left
rshift x y
shift x
by y
bits to the right
val arshift : t -> t -> t
arshift x y
shift x
by y
bits to the right and fill with the sign bit.
A common set of infix operators
The comparable interface with size-monomorphic comparison.
Compare by value, ignore size, but take into account the sign.
Compare by value, ignore both the size and the sign.
The lexicographical order of (value,size,sign) triples.
type endian =
| LittleEndian
least significant byte comes first
| BigEndian
most significant byte comes first
Specifies the order of bytes in a word.
Constructors
val create : Bitvec.t -> int -> t
create v w
creates a word from bitvector v
of width w
.
code_addr t x
uses target's address size to create a word.
Same as create x (Theory.Target.code_addr_size t)
.
data_addr t x
uses target's code address size to create a word.
Same as create x (Theory.Target.data_addr_size t)
.
data_word t x
uses target's word size to create a word.
Same as create x (Theory.Target.bits t)
.
val of_string : string -> t
of_bool x
is a bitvector with length 1
and value b0
if x
is false and b1
otherwise.
val of_int : width:int -> int -> t
of_int ~width n
creates a bitvector of the specified bit-width
with the value equal to n
. If bits of the n
that doesn't fit into width
are ignored.
val of_int32 : ?width:int -> int32 -> t
of_int32 ?width n
creates a bitvector of the specified bit-width
with the value equal to n
. If bits of the n
that doesn't fit into width
are ignored. Parameter width
defaults to 32
.
val of_int64 : ?width:int -> int64 -> t
of_int32 ?width n
creates a bitvector of the specified bit-width
with the value equal to n
. If bits of the n
that doesn't fit into width
are ignored. Parameter width
defaults to 32
.
Some predefined constant constructors
Helpful shortcuts
one width
number one with a specified width
, is a shortcut for of_int 1 ~width
zero width
zero with a specified width
, is a shortcut for of_int 0 ~width
ones width
is a number with a specified width
, and all bits set to 1. It is a shortcut for lnot (zero width)
val of_binary : ?width:int -> endian -> string -> t
of_binary ?width endian num
creates a bitvector from a string interpreted as a sequence of bytes in a specified order.
The result is always positive and unsigned. The num
argument is not shared. width
defaults to the length of num
in bits, i.e. 8 * String.length num
.
Conversions to OCaml built in integer types
val to_bitvec : t -> Bitvec.t
to_bitvec x
returns a Bitvec represenation of x
to_int x
projects x
in to OCaml int
.
to_int32 x
projects x
in to int32
to_int64 x
projects x
in to int64
val to_int_exn : t -> int
to_int_exn x
projects x
in to OCaml int
.
val to_int32_exn : t -> int32
to_int32_exn x
projects x
in to int32
val to_int64_exn : t -> int64
to_int64_exn x
projects x
in to int64
val pp : Stdlib.Format.formatter -> t -> unit
printf "%a" pp x
prints x
into a formatter. This is a default printer, controlled by set_default_printer
. Multiple formats are available, see the available_writers
for the actual list of formats and a format description. Out of box it defaults to pp_hex_full
. Note, the printf
function from examples refers to the Format.printf
, thus it is assumed that the Format
module is open in the scope.
val pp_hex : Stdlib.Format.formatter -> t -> unit
printf "%a" pp_hex x
prints x
in the hexadecimal format omitting suffixes, and the prefix if it is not necessary. Example,
# printf "%a\n" pp_hex (Word.of_int32 0xDEADBEEFl);;
0xDEADBEEF
# printf "%a\n" pp_hex (Word.of_int32 0x1);;
1
val pp_dec : Stdlib.Format.formatter -> t -> unit
printf "%a" pp_dec x
prints x
in the decimal format omitting suffixes and prefixes. Example,
# printf "%a\n" pp_dec (Word.of_int32 0xDEADBEEFl);;
3735928559
# printf "%a\n" pp_dec (Word.of_int32 0x1);;
1
val pp_oct : Stdlib.Format.formatter -> t -> unit
printf "%a" pp_oct x
prints x
in the octal format omitting suffixes, and the prefix if it is not necessary. Example,
# printf "%a\n" pp_oct (Word.of_int32 0xDEADBEEFl);;
0o33653337357
# printf "%a\n" pp_oct (Word.of_int32 0x1);;
1
val pp_bin : Stdlib.Format.formatter -> t -> unit
printf "%a" pp_bin x
prints x
in the binary (0 and 1) format omitting suffixes, and the prefix if it is not necessary. Example,
# printf "%a\n" pp_bin (Word.of_int32 0xDEADBEEFl);;
0b11011110101011011011111011101111
# printf "%a\n" pp_bin (Word.of_int32 0x1);;
1
val pp_hex_full : Stdlib.Format.formatter -> t -> unit
printf "%a" pp_hex_full x
prints x
in the hexadecimal format with suffixes, and the prefix if it is necessary. Example,
# printf "%a\n" pp_hex_full (Word.of_int32 0xDEADBEEFl);;
0xDEADBEEF:32u
# printf "%a\n" pp_hex_full (Word.of_int32 0x1);;
1:32u
val pp_dec_full : Stdlib.Format.formatter -> t -> unit
printf "%a" pp_dec_full x
prints x
in the decimal format with suffixes and prefixes. Example,
# printf "%a\n" pp_dec_full (Word.of_int32 0xDEADBEEFl);;
3735928559:32u
# printf "%a\n" pp_dec_full (Word.of_int32 0x1);;
1:32u
val pp_oct_full : Stdlib.Format.formatter -> t -> unit
printf "%a" pp_oct_full x
prints x
in the octal format with suffixes, and the prefix if it is necessary. Example,
# printf "%a\n" pp_oct_full (Word.of_int32 0xDEADBEEFl);;
0o33653337357:32u
# printf "%a\n" pp_oct_full (Word.of_int32 0x1);;
1:32u
val pp_bin_full : Stdlib.Format.formatter -> t -> unit
printf "%a" pp_bin_full x
prints x
in the binary (0 and 1) format omitting suffixes, and the prefix if it is necessary. Example,
# printf "%a\n" pp_bin_full (Word.of_int32 0xDEADBEEFl);;
0b11011110101011011011111011101111:32u
# printf "%a\n" pp_bin_full (Word.of_int32 0x1);;
1:32u
val pp_generic :
?case:[ `upper | `lower ] ->
?prefix:[ `auto | `base | `none | `this of string ] ->
?suffix:[ `none | `full | `size ] ->
?format:[ `hex | `dec | `oct | `bin ] ->
Stdlib.Format.formatter ->
t ->
unit
pp_generic ?case ?prefix ?suffix ?format ppf x
- a printer to suit all tastes.
Note: this is a generic printer factory that should be used if none of the nine preinstantiated suits you.
val string_of_value : ?hex:bool -> t -> string
string_of_value ?hex x
returns a textual representation of the x
value, i.e., ignores size and signedness. If hex
is true
(default), then it is in the hexadecimal representation, otherwise the decimal representation is used. The returned value is not prefixed. No leading zeros are printed. If a value is signed and negative, then a leading negative sign is printed. Hexadecimal letter literals are printed in a lowercase format.
signed t
casts t to a signed type, so that any operations applied on t
will be signed.
unsigned t
casts t
to an unsigned type, so that any operations applied to it will interpret t
as an unsigned word.
is_zero bv
is true iff all bits are set to zero.
is_ones bv
is true if the least significant bit is equal to one
bitwidth bv
return a bit-width, i.e., the amount of bits
extract bv ~hi ~lo
extracts a subvector from bv
, starting from bit hi
and ending with lo
. Bits are enumerated from right to left (from least significant to most), starting from zero. hi
maybe greater than size
.
hi
defaults to width bv - 1
lo
defaults to 0
.
Example:
extract (of_int 17 ~width:8) ~hi:4 ~lo:3
will result in a two bit vector consisting of the forth and third bits, i.e., equal to a number 2
.
lo
and hi
should be non-negative numbers. lo
must be less then a width bv
and hi >= lo
.
extract_exn bv ~hi ~lo
is the same as extract
, but will raise an exception on error.
concat b1 b2
concatenates two bitvectors
succ n
returns next value after n
. It is not guaranteed that signed (succ n) > signed n
pred n
returns a value preceding n
.
val nsucc : t -> int -> t
nsucc m n
is Fn.apply_n_times ~n succ m
, but more efficient.
val npred : t -> int -> t
npred m n
is Fn.apply_n_times ~n pred addr
, but more efficient.
gcd x y
is the greatest common divisor of x
and y
in the integers. Note that this is not always the greatest common divisor in the bitvectors of fixed length. For example, in the 32-bit unsigned integers, 2 = 2 + 2^32 = 2(1 + 2^31)
. Thus, 1 + 2^31
is a divisor of 2
, even though gcd 2 2 = 2
. Two properties that still hold are: 1. Both x
and y
are multiples of gcd x y
, and 2. gcd x y <= min (abs x) (abs y)
lcm x y
is the least common multiple of x
and y
in the integers. Note that, like gcd x y
, this is not always the least common multiple of x
and y
in the fixed- length bitvectors. See the gcd
documentation for an example. The result of this function will always be some common multiple of the inputs, even in the fixed-width bitvectors.
gcdext x y
returns (g, s, t)
where g = gcd x y
and g = s*x + t*y
. See the documentation for gcd x y
for why this function is tricky to use.
val gcd_exn : t -> t -> t
gcd_exn x y
is the same as gcd
, but will raise an exception on error.
val lcm_exn : t -> t -> t
lcm_exn x y
is the same as lcm
, but will raise an exception on error.
val gcdext_exn : t -> t -> t * t * t
gcdext_exn x y
is the same as gcdext
, but will raise an exception on error.
Iteration over bitvector components
enum_bytes x order
returns a sequence of bytes of x
in a specified order
. Each byte is represented as a bitvector
itself.
enum_bytes x order
returns bytes of x
in a specified order
, with bytes represented by char
type
enum_bits x order
returns bits of x
in a specified order
. order
defines only the ordering of words in a bitvector, bits will always be in MSB first order. The length of the sequence is always a power of 8
.
Comparison with zero
Note, we're not including With_zero
interface, since it refers to the `Sign` module, that is available only in core_kernel >= 113.33.00.
validate_positive
validates that a value is positive.
validate_non_negative
validates that a value is non negative.
validate_negative
validates that a value is negative.
validate_non_positive
validates that a value is not positive.
val is_positive : t -> bool
is_positive x
is true if x
is greater than zero. Always true if x
is unsigned.
val is_non_negative : t -> bool
is_non_negative x
is true if x
is greater than or equal to zero. Tautology if x
is unsigned.
val is_negative : t -> bool
is_negative x
is true if x
is strictly less than zero. It is a contradiction if x
is not signed.
val is_non_positive : t -> bool
is_non_positive x
is true if x
is less than zero. It is a contradiction if x
is not signed.
Arithmetic that raises exceptions.
Arithmetic operations that doesn't check the widths.
Stable marshaling interface.
module Trie : sig ... end
Prefix trees for bitvectors.