Floating point number utilities.

This module defines a few useful constants, functions, predicates and comparisons on floating point numbers. The formatters output a lossless textual representation of floats.

Quick recall on OCaml's floating point representation.

**Warning.** This module existed before `Stdlib.Float` was introduced in OCaml 4.07.0. Since `Gg` 1.0.0, the module now includes `Stdlib.Float` and some values initially provided by `Gg` are now provided by `Stdlib.Float`, see the release notes of the package for a precise account of the changes.

`Stdlib.Float`

`include module type of Float`

`fma x y z` returns `x * y + z`, with a best effort for computing this expression with a single rounding, using either hardware instructions (providing full IEEE compliance) or a software emulation.

On 64-bit Cygwin, 64-bit mingw-w64 and MSVC 2017 and earlier, this function may be emulated owing to known bugs or limitations on these platforms. Note: since software emulation of the fma is costly, make sure that you are using hardware fma support if performance matters.
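As an illustration of what the single rounding buys, the sketch below picks operands whose exact product carries a term too small to survive an intermediate rounding. It assumes OCaml 4.08 or later and a `Float.fma` that does achieve a single rounding on the platform; the value names are only for illustration.

```ocaml
(* x = 1 + 2^-52, so the exact product x * x is 1 + 2^-51 + 2^-104. *)
let x = 1. +. Float.epsilon
let z = -. (1. +. 2. *. Float.epsilon)   (* -(1 + 2^-51), exactly representable *)

(* Two roundings: the product is rounded to 1 + 2^-51 before z is added. *)
let two_roundings = x *. x +. z

(* One rounding: the 2^-104 term is still there when z is added. *)
let one_rounding = Float.fma x x z
```

Here `two_roundings` is `0.` while `one_rounding` is `2^-104`.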

`rem a b` returns the remainder of `a` with respect to `b`. The returned value is `a -. n *. b`, where `n` is the quotient `a /. b` rounded towards zero to an integer.

`succ x` returns the floating-point number right after `x`, i.e., the smallest floating-point number greater than `x`. See also `next_after`.

`pred x` returns the floating-point number right before `x`, i.e., the greatest floating-point number smaller than `x`. See also `next_after`.

A special floating-point value denoting the result of an undefined operation such as `0.0 /. 0.0`. Stands for 'not a number'. Any floating-point operation with `nan` as argument returns `nan` as result. As for floating-point comparisons, `=`, `<`, `<=`, `>` and `>=` return `false` and `<>` returns `true` if one or both of their arguments is `nan`.

The difference between `1.0` and the smallest exactly representable floating-point number greater than `1.0`.
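The relation between this constant and `succ`/`pred` can be checked directly. The sketch below uses only `Stdlib.Float` (OCaml 4.08 or later is assumed); the value names are illustrative.

```ocaml
(* The gap just above 1.0 is exactly epsilon. *)
let gap_above_one = Float.succ 1.0 -. 1.0

(* Just below 1.0 the exponent is one lower, so the gap is half as large. *)
let gap_below_one = 1.0 -. Float.pred 1.0
```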

`is_finite x` is `true` if and only if `x` is finite, i.e., not infinite and not `nan`.

`is_infinite x` is `true` if and only if `x` is `infinity` or `neg_infinity`.

`is_nan x` is `true` if and only if `x` is not a number (see `nan`).

Truncate the given floating-point number to an integer. The result is unspecified if the argument is `nan` or falls outside the range of representable integers.

Convert the given string to a float. The string is read in decimal (by default) or in hexadecimal (marked by `0x` or `0X`). The format of decimal floating-point numbers is ` [-] dd.ddd (e|E) [+|-] dd `, where `d` stands for a decimal digit. The format of hexadecimal floating-point numbers is ` [-] 0(x|X) hh.hhh (p|P) [+|-] dd `, where `h` stands for a hexadecimal digit and `d` for a decimal digit. In both cases, at least one of the integer and fractional parts must be given; the exponent part is optional. The `_` (underscore) character can appear anywhere in the string and is ignored. Depending on the execution platform, other representations of floating-point numbers can be accepted, but should not be relied upon.
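A few inputs exercising the formats described above, using `Float.of_string` and `Float.of_string_opt` from the standard library. The value names are illustrative only.

```ocaml
let decimal = Float.of_string "1_000.5"      (* underscores are ignored *)
let with_exponent = Float.of_string "25e-2"  (* 25 * 10^-2 *)
let hexadecimal = Float.of_string "0x1.8p1"  (* 1.5 * 2^1 *)

(* of_string raises Failure on malformed input; of_string_opt returns None. *)
let malformed = Float.of_string_opt "not a float"
```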

`type fpclass = fpclass = `

The five classes of floating-point numbers, as determined by the `classify_float` function.

`val classify_float : float -> fpclass`

Return the class of the given floating-point number: normal, subnormal, zero, infinite, or not a number.

`expm1 x` computes `exp x -. 1.0`, giving numerically-accurate results even if `x` is close to `0.0`.

`log1p x` computes `log(1.0 +. x)` (natural logarithm), giving numerically-accurate results even if `x` is close to `0.0`.
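The accuracy claim for both functions can be seen with a tiny argument: the naive formulas lose it entirely because `1.0 +. 1e-20` rounds to `1.0`. A minimal sketch with the standard library, names being illustrative:

```ocaml
let tiny = 1e-20

(* Naive formulas: the tiny term is absorbed by 1.0 and the result is 0. *)
let naive_expm1 = exp tiny -. 1.0
let naive_log1p = log (1.0 +. tiny)

(* Dedicated functions keep a result close to tiny. *)
let accurate_expm1 = Float.expm1 tiny
let accurate_log1p = Float.log1p tiny
```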

Arc cosine. The argument must fall within the range `[-1.0, 1.0]`. Result is in radians and is between `0.0` and `pi`.

Arc sine. The argument must fall within the range `[-1.0, 1.0]`. Result is in radians and is between `-pi/2` and `pi/2`.

`atan2 y x` returns the arc tangent of `y /. x`. The signs of `x` and `y` are used to determine the quadrant of the result. Result is in radians and is between `-pi` and `pi`.

`hypot x y` returns `sqrt(x *. x +. y *. y)`, that is, the length of the hypotenuse of a right-angled triangle with sides of length `x` and `y`, or, equivalently, the distance of the point `(x,y)` to the origin. If one of `x` or `y` is infinite, returns `infinity` even if the other is `nan`.

Hyperbolic arc cosine. The argument must fall within the range `[1.0, inf]`. Result is in radians and is between `0.0` and `inf`.

Hyperbolic arc sine. The argument and result range over the entire real line. Result is in radians.

Hyperbolic arc tangent. The argument must fall within the range `[-1.0, 1.0]`. Result is in radians and ranges over the entire real line.

Error function. The argument ranges over the entire real line. The result is always within `[-1.0, 1.0]`.

Complementary error function (`erfc x = 1 - erf x`). The argument ranges over the entire real line. The result is always within `[0.0, 2.0]`.

`trunc x` rounds `x` to the nearest integer whose absolute value is less than or equal to the absolute value of `x`.

`round x` rounds `x` to the nearest integer with ties (fractional values of 0.5) rounded away from zero, regardless of the current rounding direction. If `x` is an integer, `+0.`, `-0.`, `nan`, or infinite, `x` itself is returned.

On 64-bit mingw-w64, this function may be emulated owing to a bug in the C runtime library (CRT) on this platform.

Round above to an integer value. `ceil f` returns the least integer value greater than or equal to `f`. The result is returned as a float.

Round below to an integer value. `floor f` returns the greatest integer value less than or equal to `f`. The result is returned as a float.
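The four rounding functions differ mostly on negative numbers and on ties. The sketch below applies them to the same inputs with `Stdlib.Float` (OCaml 4.08 or later assumed); the names are illustrative.

```ocaml
let x = -2.7

let truncated = Float.trunc x   (* towards zero *)
let rounded = Float.round x     (* to nearest *)
let ceiled = Float.ceil x       (* towards positive infinity *)
let floored = Float.floor x     (* towards negative infinity *)

(* Ties are rounded away from zero by round. *)
let tie_up = Float.round 2.5
let tie_down = Float.round (-2.5)
```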

`next_after x y` returns the next representable floating-point value following `x` in the direction of `y`. More precisely, if `y` is greater (resp. less) than `x`, it returns the smallest (resp. largest) representable number greater (resp. less) than `x`. If `x` equals `y`, the function returns `y`. If `x` or `y` is `nan`, a `nan` is returned. Note that `next_after max_float infinity = infinity` and that `next_after 0. infinity` is the smallest denormalized positive number. If `x` is the smallest denormalized positive number, `next_after x 0. = 0.`
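The boundary cases listed above can be spelled out as follows. The sketch builds the smallest denormalized positive number from its bit pattern with `Int64.float_of_bits`; the names are illustrative and OCaml 4.08 or later is assumed.

```ocaml
(* Bit pattern 1: exponent field 0, significand 1. *)
let smallest_subnormal = Int64.float_of_bits 1L

let from_zero = Float.next_after 0. Float.infinity
let from_max = Float.next_after Float.max_float Float.infinity
let back_to_zero = Float.next_after smallest_subnormal 0.

(* Going towards a greater value is the same as succ. *)
let like_succ = Float.next_after 1. 2.
```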

`copy_sign x y` returns a float whose absolute value is that of `x` and whose sign is that of `y`. If `x` is `nan`, returns `nan`. If `y` is `nan`, returns either `x` or `-. x`, but it is not specified which.

`sign_bit x` is `true` if and only if the sign bit of `x` is set. For example `sign_bit 1.` and `sign_bit 0.` are `false` while `sign_bit (-1.)` and `sign_bit (-0.)` are `true`.
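Since `-0.` and `0.` compare equal, the sign bit is the way to tell them apart. A short sketch with `Stdlib.Float` (OCaml 4.08 or later assumed; names are illustrative):

```ocaml
(* The two zeros are equal for (=) ... *)
let zeros_equal = (-0.) = 0.

(* ... but sign_bit distinguishes them. *)
let negative_zero_signed = Float.sign_bit (-0.)
let positive_zero_signed = Float.sign_bit 0.

(* copy_sign also picks up the sign of a negative zero. *)
let copied = Float.copy_sign 3. (-0.)
```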

`frexp f` returns the pair of the significand and the exponent of `f`. When `f` is zero, the significand `x` and the exponent `n` of `f` are equal to zero. When `f` is non-zero, they are defined by `f = x *. 2 ** n` and `0.5 <= x < 1.0`.
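For instance, 8 is 0.5 times 2 to the power 4. The sketch below decomposes it and rebuilds it with `Float.ldexp`, the standard library's inverse operation; the names are illustrative.

```ocaml
let significand, exponent = Float.frexp 8.

(* ldexp x n computes x *. 2 ** n, so it undoes frexp. *)
let rebuilt = Float.ldexp significand exponent

let zero_case = Float.frexp 0.
```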

`compare x y` returns `0` if `x` is equal to `y`, a negative integer if `x` is less than `y`, and a positive integer if `x` is greater than `y`. `compare` treats `nan` as equal to itself and less than any other float value. This treatment of `nan` ensures that `compare` defines a total ordering relation.
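Because the ordering is total, `compare` is safe to hand to sorting functions even when NaNs are present, unlike the comparison operators. A small sketch using `List.sort` from the standard library (names are illustrative):

```ocaml
(* The operators give up on nan ... *)
let nan_equals_itself = Float.nan = Float.nan

(* ... while compare places it before every other value. *)
let nan_compares_equal = Float.compare Float.nan Float.nan = 0
let sorted = List.sort Float.compare [1.; Float.nan; (-1.); Float.infinity]
```

With this input the NaN ends up at the head of `sorted`, followed by `-1.`, `1.` and `infinity`.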

`min x y` returns the minimum of `x` and `y`. It returns `nan` when `x` or `y` is `nan`. Moreover `min (-0.) (+0.) = -0.`

`max x y` returns the maximum of `x` and `y`. It returns `nan` when `x` or `y` is `nan`. Moreover `max (-0.) (+0.) = +0.`

`min_max x y` is `(min x y, max x y)`, just more efficient.

`min_num x y` returns the minimum of `x` and `y` treating `nan` as missing values. If both `x` and `y` are `nan`, `nan` is returned. Moreover `min_num (-0.) (+0.) = -0.`

`max_num x y` returns the maximum of `x` and `y` treating `nan` as missing values. If both `x` and `y` are `nan`, `nan` is returned. Moreover `max_num (-0.) (+0.) = +0.`

`min_max_num x y` is `(min_num x y, max_num x y)`, just more efficient. Note that in particular `min_max_num x nan = (x, x)` and `min_max_num nan y = (y, y)`.
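The difference between the NaN-propagating and the NaN-ignoring variants shows up as soon as one argument is a NaN. A sketch with `Stdlib.Float` (OCaml 4.08 or later assumed; names are illustrative):

```ocaml
(* min propagates the nan, min_num treats it as a missing value. *)
let propagated = Float.min 1. Float.nan
let ignored = Float.min_num 1. Float.nan

let both = Float.min_max_num Float.nan 2.

(* The zeros are ordered by sign: -0. is the minimum. *)
let min_zero_is_negative = Float.sign_bit (Float.min (-0.) 0.)
```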

`val hash : t -> int`

The hash function for floating-point numbers.

`module Array : sig ... end`

Float arrays with packed representation.

`module ArrayLabels : sig ... end`

Float arrays with packed representation (labeled functions).

## Constants

The constant e.

`2 *. pi`, two times pi.

The greatest positive floating point number with a fractional part (the `float` before 2^{52}). Any number outside [`-max_frac_float`;`max_frac_float`] is an integer.

The greatest positive floating point number (2^{53}) such that any *integer* in the range [`-max_int_arith`;`max_int_arith`] is represented exactly. Integer arithmetic can be performed exactly in this interval.

## Functions

**Note.** If applicable, a function taking NaNs returns a NaN unless otherwise specified.

`random min len ()` is a random float in the interval [`min`;`min+len`] (`min` defaults to 0.). Uses the standard library's default `Random` state for the generation.

**Warning.** The float generated by a given state may change in future versions of the library.

`val srandom : Random.State.t -> ?min:float -> len:float -> unit -> float`

`srandom state min len ()` is like `random` but uses `state` for the generation.

**Warning.** The float generated by a given `state` may change in future versions of the library.

`step edge x` is `0.` if `x < edge` and `1.` otherwise. The result is undefined on NaNs.

`smooth_step e0 e1 x` is `0.` if `x <= e0`, `1.` if `x >= e1` and cubic Hermite interpolation between 0. and 1. otherwise. The result is undefined on NaNs.
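The two functions can be sketched as below. This is not taken from the library's source: the polynomial `3t^2 - 2t^3` is the usual cubic Hermite interpolant between 0 and 1 and is assumed here, and the arguments are positional as in the descriptions above.

```ocaml
let step edge x = if x < edge then 0. else 1.

let smooth_step e0 e1 x =
  if x <= e0 then 0. else
  if x >= e1 then 1. else
  (* Normalize x to t in ]0;1[, then apply 3t^2 - 2t^3. *)
  let t = (x -. e0) /. (e1 -. e0) in
  t *. t *. (3. -. 2. *. t)
```

The interpolant has zero slope at both edges, which is what removes the discontinuity of `step`.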

`clamp min max x` is `min` if `x < min`, `max` if `x > max` and `x` otherwise. The result is undefined on NaNs and if `min > max`.

`remap x0 x1 y0 y1 v` applies to `v` the affine transform that maps `x0` to `y0` and `x1` to `y1`. If the transform is undefined (`x0 = x1` and `y0 <> y1`) the function returns `y0` for any `v`.
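A minimal sketch of `clamp` and `remap` as described above, written with positional arguments. The order of operations in `remap` is an assumption made for this example, not the library's actual code.

```ocaml
let clamp min max x =
  if x < min then min else
  if x > max then max else x

let remap x0 x1 y0 y1 v =
  (* Degenerate source interval: fall back to y0 as documented. *)
  if x0 = x1 then y0 else
  y0 +. (v -. x0) /. (x1 -. x0) *. (y1 -. y0)
```

For example `remap 0. 10. 0. 100.` maps a value in [0;10] to a percentage, and composing it with `clamp 0. 100.` keeps out-of-range inputs inside the target interval.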

`int_of_round x` is `truncate (round x)`. The result is undefined on NaNs and infinities.

`round_dfrac d x` rounds `x` to the `d`th *decimal* fractional digit. Ties are rounded towards positive infinity. If `x` is an infinity, returns `x`. The result is only defined for `0 <= d <= 16`.
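One way to obtain this behaviour is to scale by a power of ten, add one half and take the floor. The sketch below follows that approach; it is an assumption about a reasonable implementation rather than the library's code, and `pow10` is a hypothetical helper.

```ocaml
(* 10^d by repeated multiplication; exact for the supported 0 <= d <= 16. *)
let pow10 d =
  let rec go acc n = if n = 0 then acc else go (acc *. 10.) (n - 1) in
  go 1. d

let round_dfrac d x =
  let m = pow10 d in
  (* floor (y + 0.5) rounds ties towards positive infinity. *)
  Float.floor (x *. m +. 0.5) /. m
```

Note how the tie `-2.5` goes up to `-2.` with `d = 0`, whereas `Float.round` would give `-3.`.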

`round_dsig d x` rounds the normalized *decimal* significand of `x` to the `d`th decimal fractional digit. Ties are rounded towards positive infinity. The result is NaN on infinities. The result is only defined for `0 <= d <= 16`.

**Warning.** The current implementation overflows on large `x` and `d`.

`round_zero eps x` is `0.` if `abs_float x < eps` and `x` otherwise. The result is undefined if `eps` is NaN.

`chop eps x` is `round x` if `abs_float (x -. round x) < eps` and `x` otherwise. The result is undefined if `eps` is NaN.
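Both definitions translate almost literally to code. The sketch below assumes a labelled `~eps` argument, as in the `compare_tol ~eps x y` description; that labelling is an assumption of this example.

```ocaml
let round_zero ~eps x = if Float.abs x < eps then 0. else x

let chop ~eps x =
  let r = Float.round x in
  if Float.abs (x -. r) < eps then r else x
```

These are typically used to clean up values that are an integer, or zero, up to accumulated rounding noise.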

`nan_with_payload payload` is a NaN whose 51 lower significand bits are defined by the 51 lower (or less, as `int` allows) bits of `payload`.

`nan_payload x` is the 51 lower significand bits (or less, as `int` allows) of the NaN `x`.

Raises `Invalid_argument` if `x` is not a NaN.
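The two functions can be sketched on top of `Int64.bits_of_float` and `Int64.float_of_bits`. This is an illustration of the bit manipulation involved, assuming a quiet NaN is produced; the constants' names are hypothetical and the library's own code may differ.

```ocaml
(* The 51 lower significand bits. *)
let payload_mask = 0x0007_FFFF_FFFF_FFFFL

(* Exponent field all ones and bit 51 set: a quiet NaN with empty payload. *)
let quiet_nan_bits = 0x7FF8_0000_0000_0000L

let nan_with_payload payload =
  let p = Int64.logand (Int64.of_int payload) payload_mask in
  Int64.float_of_bits (Int64.logor quiet_nan_bits p)

let nan_payload x =
  if not (Float.is_nan x) then invalid_arg "nan_payload: not a NaN" else
  Int64.to_int (Int64.logand (Int64.bits_of_float x) payload_mask)
```

A payload stored with `nan_with_payload 42` is read back by `nan_payload` as `42`.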

## Predicates and comparisons

`is_zero eps x` is `true` if `abs_float x < eps` and `false` otherwise. The result is undefined if `eps` is NaN.

`equal_tol eps x y` is `true` iff |`x - y`| <= `eps` * max (1, |`x`|, |`y`|). On special values the function behaves like `compare x y = 0`. The condition turns into an absolute tolerance test for small magnitudes and a relative tolerance test for large magnitudes.
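The test can be sketched as follows. The handling of special values (NaNs and infinities) is this example's reading of "behaves like `compare x y = 0`", and the labelled `~eps` argument is assumed; neither is taken from the library's source.

```ocaml
let equal_tol ~eps x y =
  (* Equal values, including nan with nan and equal infinities. *)
  if Float.compare x y = 0 then true else
  let scale = Float.max 1. (Float.max (Float.abs x) (Float.abs y)) in
  (* A nan or infinite scale means exactly one special value: not equal. *)
  if not (Float.is_finite scale) then false else
  Float.abs (x -. y) <= eps *. scale
```

With `eps = 1e-9`, `0.` and `1e-10` are equal (absolute test, scale 1) and so are `1e12` and `1e12 +. 1.` (relative test, scale about `1e12`).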

`compare_tol ~eps x y` is `0` iff `equal_tol ~eps x y` is `true` and `Stdlib.compare x y` otherwise.

## Formatters

`val pp : Format.formatter -> float -> unit`

`pp ppf x` formats a lossless textual representation of `x` on `ppf` using `"%h"`. This is the case since 1.0.0; before that, `pp` was the slower `legacy_pp`, whose output differs on the representation of NaNs, infinities and zeros.
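"Lossless" here means that parsing the text gives back the same float. The sketch below checks this round trip with the standard library's `"%h"` conversion; `to_hex`, `round_trips` and `pp_hex` are hypothetical names for this example, not part of the library.

```ocaml
let to_hex x = Printf.sprintf "%h" x

(* Reading the hexadecimal text back yields the same float. *)
let round_trips x = Float.of_string (to_hex x) = x

(* The same conversion as a Format pretty-printer. *)
let pp_hex ppf x = Format.fprintf ppf "%h" x
```

A decimal conversion such as `"%f"` would not pass this check for a value like `0.1`.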

## Deprecated

Deprecated, use `max_num`.

Deprecated, use `min_num`.

Deprecated, use `is_infinite`.

Deprecated, use `is_integer`.

`val legacy_pp : Format.formatter -> float -> unit`

Deprecated, use `pp`.

`legacy_pp ppf x` prints a lossless textual representation of `x` on `ppf`.

- Normals are represented by `"[-]0x1.<f>p<e>"` where `<f>` is the significand bits in hexadecimal and `<e>` the unbiased exponent in decimal.
- Subnormals are represented by `"[-]0x0.<f>p-1022"` where `<f>` is the significand bits in hexadecimal.
- NaNs are represented by `"[-]nan(0x<p>)"` where `<p>` is the payload in hexadecimal.
- Infinities and zeroes are represented by `"[-]inf"` and `"[-]0."`.

This format should be compatible with recent implementations of strtod and hence with `float_of_string` (but negative NaNs seem to be problematic to get back).

## Quick recall on OCaml's `float`s

An OCaml `float` is an IEEE-754 64-bit double precision binary floating point number. The 64 bits are laid out as follows:

| field | width | bit positions |
|---|---|---|
| sign s | 1 bit | 63 |
| exponent e | 11 bits | 62 to 52 |
| significand t | 52 bits | 51 to 0 |

The value represented depends on s, e and t:

| sign | exponent | significand | value represented | meaning |
|---|---|---|---|---|
| s | 0 | 0 | (-1)^s * 0 | zero |
| s | 0 | t <> 0 | (-1)^s * 0.t * 2^-1022 | subnormal |
| s | 0 < e < 2047 | t | (-1)^s * 1.t * 2^(e - 1023) | normal |
| s | 2047 | 0 | (-1)^s * infinity | infinity |
| s | 2047 | t <> 0 | NaN | not a number |

There are two zeros, a positive and a negative one, but both are deemed equal by `=` and `Stdlib.compare`. A NaN is never equal (`=`) to *itself* or to another NaN; however, `Stdlib.compare` asserts any NaN to be equal to itself and to any other NaN.

The bit layout of a `float` can be converted to an `int64` and back using `Int64.bits_of_float` and `Int64.float_of_bits`.
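As an illustration, the three fields s, e and t can be extracted from the `int64` with shifts and masks. The function name `fields` is hypothetical and only serves this example.

```ocaml
(* Returns (s, e, t): sign bit, biased exponent, 52-bit significand field. *)
let fields x =
  let bits = Int64.bits_of_float x in
  let s = Int64.to_int (Int64.shift_right_logical bits 63) in
  let e =
    Int64.to_int (Int64.logand (Int64.shift_right_logical bits 52) 0x7FFL)
  in
  let t = Int64.logand bits 0x000F_FFFF_FFFF_FFFFL in
  (s, e, t)
```

For `1.0` this gives `(0, 1023, 0L)`: a positive normal number with unbiased exponent 0 and an all-zero significand field.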

Bit 51 of a NaN is used to distinguish between quiet (bit set) and signaling NaNs (bit cleared); the remaining 51 lower bits of the significand are the NaN's *payload*, which can be used to store diagnostic information. These features don't seem to be used in OCaml.

The significand of a floating point number is made of 53 binary digits (don't forget the implicit digit); this corresponds to log_{10}(2^{53}) ~ 16 *decimal* digits.

Only `float` values in the interval ]-2^{52};2^{52}[ may have a fractional part. `Float.max_frac_float` is the greatest positive `float` with a fractional part.

Any integer value in the interval [-2^{53};2^{53}] can be represented exactly by a `float` value. *Integer* arithmetic performed in this interval is exact. `Float.max_int_arith` is 2^{53}.
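Both limits can be observed with the standard library alone. The sketch below rebuilds the two constants locally from the powers of two given above (the local names mirror the library's but are defined here for illustration); OCaml 4.08 or later is assumed.

```ocaml
(* The float right before 2^52: the last one with a fractional part. *)
let max_frac_float = Float.pred 4503599627370496.

(* 2^53: the last float up to which every integer is representable. *)
let max_int_arith = 9007199254740992.

let has_fraction = not (Float.is_integer max_frac_float)
let next_is_integer = Float.is_integer (Float.succ max_frac_float)

(* 2^53 + 1 is not representable: the sum rounds back to 2^53. *)
let plus_one_is_lost = max_int_arith +. 1. = max_int_arith
let minus_one_is_exact = max_int_arith -. 1. = 9007199254740991.
```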