package mopsa

  1. Overview
  2. Docs

doc/itvUtils/ItvUtils/Float/index.html

Module ItvUtils.FloatSource

Float - Floating-point arihmetics with rounding.

We rely on C code to provide functions with correct rounding (rounding direction and rounding precision).

Types

Sourcetype t = float
Sourcetype bit_float = {
  1. sign : bool;
    (*

    sign bit (true means negative)

    *)
  2. fraction : Z.t;
    (*

    fraction bits

    *)
  3. exponent : int;
    (*

    exponent (positive, with bias)

    *)
}

Bit-representation of a float value.

Global rounding direction

Sourceval set_round_near : unit -> unit
Sourceval set_round_up : unit -> unit
Sourceval set_round_down : unit -> unit
Sourceval set_round_zero : unit -> unit

Set the rounding mode globally. This affects the behaviors of all floating-point operations, including OCaml's native float operations, but excluding the operations in this module (and the float interval module) that specify a rounding direction.

Note that the operations with specified rounding directions may change the rounding direction globally in some unspecified way, and not reset it to its former value (this is done for efficiency).

Operations without rounding

Sourceval neg : t -> t
Sourceval abs : t -> t
Sourceval fmod : t -> t -> t

Remainder (fmod, not equivalent to IEEE 754's float remainder).

Sourceval infinite : int -> t

Constructs an infinity with the given sign. Zero maps to zero.

Predicates

Sourceval is_nan : t -> bool
Sourceval is_finite : t -> bool

Whether x is finite or not (infinite or NaN).

Sourceval is_infinite : t -> bool
Sourceval is_normal : t -> bool
Sourceval is_denormal : t -> bool
Sourceval sign : t -> int

Sign of x: -1 (negative), 0 (zero or NaN), or 1 (positive).

Sourceval sign_zero : t -> int

As sign, but zero is signed. Returns -1 (negative or -0), 0 (NaN), or 1 (positive of +0)

Sourceval equal : t -> t -> bool

Equality comparison. Identical to =, i.e., NaN ≠ NaN

Sourceval equal_nan : t -> t -> bool

As equal, but NaN equals NaN (and NaN ≠ non-NaN).

Sourceval leq : t -> t -> bool
Sourceval geq : t -> t -> bool
Sourceval lt : t -> t -> bool
Sourceval gt : t -> t -> bool
Sourceval eq : t -> t -> bool
Sourceval neq : t -> t -> bool

Comparison predicates. Returns false if one argument is NaN.

Sourceval min : t -> t -> t
Sourceval max : t -> t -> t

Minimum and maximum.

Sourceval is_zero : t -> bool
Sourceval is_nonzero : t -> bool
Sourceval is_positive : t -> bool
Sourceval is_negative : t -> bool
Sourceval is_positive_strict : t -> bool
Sourceval is_negative_strict : t -> bool

Sign predicates.

Printing

Sourcetype print_format = unit

Control the printing of a float (precision, rounding, etc.).

Sourceval dfl_fmt : print_format

Default format.

Sourceval to_string : print_format -> t -> string
Sourceval print : print_format -> Stdlib.out_channel -> t -> unit
Sourceval fprint : print_format -> Stdlib.Format.formatter -> t -> unit
Sourceval bprint : print_format -> Stdlib.Buffer.t -> t -> unit

Operations with specific rounding direction and precision

We provide the classic operations (and more) for single and double precision and all four rounding directions.

Sourcemodule Single : sig ... end

Single precision operations.

Sourcemodule Double : sig ... end

Double precision operations.

Operations with rounding mode as argument

Sourcetype prec = [
  1. | `SINGLE
    (*

    32-bit single precision

    *)
  2. | `DOUBLE
    (*

    64-bit double precision

    *)
]

Precision.

Sourcetype round = [
  1. | `NEAR
    (*

    To nearest

    *)
  2. | `UP
    (*

    Upwards

    *)
  3. | `DOWN
    (*

    Downwards

    *)
  4. | `ZERO
    (*

    Towards 0

    *)
]

Rounding direction.

Sourceval add : prec -> round -> t -> t -> t

Addition.

Sourceval sub : prec -> round -> t -> t -> t

Subtraction.

Sourceval mul : prec -> round -> t -> t -> t

Multiplication.

Sourceval mulz : prec -> round -> t -> t -> t

Multiplication, where 0 * infinity is 0, not Nan.

Sourceval div : prec -> round -> t -> t -> t

Division.

Sourceval divz : prec -> round -> t -> t -> t

Division, where 0 / 0 is 0, not Nan.

Sourceval rem : prec -> round -> t -> t -> t

Modulo.

Sourceval square : prec -> round -> t -> t

Square.

Sourceval sqrt : prec -> round -> t -> t

Square root.

Sourceval round_int : prec -> round -> t -> t

Rounds to integer (the result remains a float).

Sourceval of_int : prec -> round -> int -> t

Conversion from int.

Sourceval of_int64 : prec -> round -> int64 -> t

Conversion from int64.

Sourceval of_z : prec -> round -> Z.t -> t

Conversion from Z.t

Sourceval of_string : prec -> [ `DOWN | `UP ] -> string -> t

Conversion from string, with safe rounding.

Sourceval succ : prec -> t -> t

Number immediately after.

Sourceval pred : prec -> t -> t

Number immediately before.

Sourceval succ_zero : prec -> t -> t

Number immediately after. Does not cross zero.

Sourceval pred_zero : prec -> t -> t

Number immediately before. Does not cross zero.

Sourceval mantissa_bits : prec -> int
Sourceval exponent_bits : prec -> int
Sourceval exponent_bias : prec -> int
Sourceval min_exponent : prec -> int
Sourceval max_exponent : prec -> int
Sourceval nan_infinity_exponent : prec -> int
Sourceval min_denormal : prec -> t
Sourceval min_normal : prec -> t
Sourceval max_normal : prec -> t
Sourceval max_exact : prec -> t
Sourceval ulp : prec -> t

Useful constants.

Sourceval to_bits : prec -> float -> bit_float
Sourceval of_bits : prec -> bit_float -> float

Bit-level extraction.