Floating point number utilities.
This module defines a few useful constants, functions, predicates and comparisons on floating point numbers. The formatters output a lossless textual representation of floats.
Quick recall on OCaml's floating point representation.
Warning. This module existed before Stdlib.Float
was introduced in OCaml 4.07.0. Since Gg
1.0.0, the module now includes Stdlib.Float
and some values initially provided by Gg
are now provided by Stdlib.Float
, see the release notes of the package for a precise account of the changes.
Stdlib.Float
include module type of Float
OCaml's floating-point numbers follow the IEEE 754 standard, using double precision (64 bits) numbers. Floating-point operations never raise an exception on overflow, underflow, division by zero, etc. Instead, special IEEE numbers are returned as appropriate, such as infinity
for 1.0 /. 0.0
, neg_infinity
for -1.0 /. 0.0
, and nan
('not a number') for 0.0 /. 0.0
. These special numbers then propagate through floating-point computations as expected: for instance, 1.0 /. infinity
is 0.0
, and any arithmetic operation with nan
as argument returns nan
as result.
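As a small illustration of the paragraph above, the following sketch uses only the standard library to produce each special value and check that it propagates. The names pos_inf, neg_inf and undefined are chosen for this example only.

```ocaml
(* Special IEEE 754 values come out of ordinary arithmetic; nothing raises. *)
let pos_inf = 1.0 /. 0.0
let neg_inf = -1.0 /. 0.0
let undefined = 0.0 /. 0.0

let () =
  assert (pos_inf = infinity);
  assert (neg_inf = neg_infinity);
  (* nan is never equal to itself with (=), so test it with Float.is_nan. *)
  assert (Float.is_nan undefined);
  (* Special values propagate through later computations. *)
  assert (1.0 /. infinity = 0.0);
  assert (Float.is_nan (undefined +. 1.0))
```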
fma x y z
returns x * y + z
, with a best effort for computing this expression with a single rounding, using either hardware instructions (providing full IEEE compliance) or a software emulation. Note: since software emulation of the fma is costly, make sure that you are using hardware fma support if performance matters.
rem a b
returns the remainder of a
with respect to b
. The returned value is a -. n *. b
, where n
is the quotient a /. b
rounded towards zero to an integer.
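A short sketch of the rule above: because the quotient is rounded towards zero, the remainder takes the sign of a. The names r_pos and r_neg are illustrative only.

```ocaml
(* 7.5 /. 2.0 = 3.75, rounded towards zero gives n = 3. *)
let r_pos = Float.rem 7.5 2.0       (* 7.5 -. 3. *. 2.0 *)

(* -7.5 /. 2.0 = -3.75, rounded towards zero gives n = -3. *)
let r_neg = Float.rem (-7.5) 2.0    (* -7.5 -. (-3.) *. 2.0 *)

let () =
  assert (r_pos = 1.5);
  assert (r_neg = -1.5)
```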
succ x
returns the floating point number right after x
i.e., the smallest floating-point number greater than x
. See also next_after
.
pred x
returns the floating-point number right before x
i.e., the greatest floating-point number smaller than x
. See also next_after
.
A special floating-point value denoting the result of an undefined operation such as 0.0 /. 0.0
. Stands for 'not a number'. Any floating-point operation with nan
as argument returns nan
as result. As for floating-point comparisons, =
, <
, <=
, >
and >=
return false
and <>
returns true
if one or both of their arguments is nan
.
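The comparison rules above can be checked directly. This sketch uses only the standard library; x is an illustrative name.

```ocaml
let x = Float.nan

let () =
  (* Every ordering or equality test involving nan is false ... *)
  assert (not (x = x));
  assert (not (x < 1.0));
  assert (not (x >= 1.0));
  (* ... except (<>), which is true. *)
  assert (x <> x);
  (* Use is_nan, not (=), to detect a NaN. *)
  assert (Float.is_nan x)
```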
The difference between 1.0
and the smallest exactly representable floating-point number greater than 1.0
.
is_finite x
is true
iff x
is finite i.e., not infinite and not nan
.
is_infinite x
is true
iff x
is infinity
or neg_infinity
.
is_nan x
is true
iff x
is not a number (see nan
).
Truncate the given floating-point number to an integer. The result is unspecified if the argument is nan
or falls outside the range of representable integers.
Convert the given string to a float. The string is read in decimal (by default) or in hexadecimal (marked by 0x
or 0X
). The format of decimal floating-point numbers is [-] dd.ddd (e|E) [+|-] dd
, where d
stands for a decimal digit. The format of hexadecimal floating-point numbers is [-] 0(x|X) hh.hhh (p|P) [+|-] dd
, where h
stands for a hexadecimal digit and d
for a decimal digit. In both cases, at least one of the integer and fractional parts must be given; the exponent part is optional. The _
(underscore) character can appear anywhere in the string and is ignored. Depending on the execution platforms, other representations of floating-point numbers can be accepted, but should not be relied upon. Raise Failure "float_of_string"
if the given string is not a valid representation of a float.
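The accepted formats can be exercised as follows. This is a minimal sketch using Float.of_string and Float.of_string_opt from the standard library; the input strings are examples chosen here.

```ocaml
let a = Float.of_string "1_000.5"    (* underscores are ignored *)
let b = Float.of_string "0x1.8p1"    (* hexadecimal: 1.5 * 2^1 *)
let c = Float.of_string "1e-3"       (* decimal with exponent *)

(* The _opt variant returns None instead of raising Failure. *)
let d = Float.of_string_opt "not a float"

let () =
  assert (a = 1000.5);
  assert (b = 3.0);
  assert (c = 0.001);
  assert (d = None)
```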
type fpclass = fpclass =
The five classes of floating-point numbers, as determined by the classify_float
function.
val classify_float : float -> fpclass
Return the class of the given floating-point number: normal, subnormal, zero, infinite, or not a number.
expm1 x
computes exp x -. 1.0
, giving numerically-accurate results even if x
is close to 0.0
.
log1p x
computes log(1.0 +. x)
(natural logarithm), giving numerically-accurate results even if x
is close to 0.0
.
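To see why these two functions matter, compare them with the naive expressions for a very small argument. The sketch below uses only the standard library; tiny and the other names are illustrative.

```ocaml
let tiny = 1e-18

(* 1.0 +. tiny rounds to exactly 1.0, so the naive logarithm is 0.0. *)
let naive_log = log (1.0 +. tiny)
let accurate_log = Float.log1p tiny

(* exp tiny is so close to 1.0 that the naive difference loses all the
   significant digits (typically it is 0.0). *)
let naive_exp = exp tiny -. 1.0
let accurate_exp = Float.expm1 tiny

let () =
  assert (naive_log = 0.0);
  assert (accurate_log > 0.0);
  assert (accurate_exp > 0.0)
```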
Arc cosine. The argument must fall within the range [-1.0, 1.0]
. Result is in radians and is between 0.0
and pi
.
Arc sine. The argument must fall within the range [-1.0, 1.0]
. Result is in radians and is between -pi/2
and pi/2
.
atan2 y x
returns the arc tangent of y /. x
. The signs of x
and y
are used to determine the quadrant of the result. Result is in radians and is between -pi
and pi
.
hypot x y
returns sqrt(x *. x +. y *. y)
, that is, the length of the hypotenuse of a right-angled triangle with sides of length x
and y
, or, equivalently, the distance of the point (x,y)
to origin. If one of x
or y
is infinite, returns infinity
even if the other is nan
.
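One practical reason to prefer hypot over the explicit formula is intermediate overflow. The following sketch, using only the standard library, shows the difference for large sides; big, naive and robust are names chosen for this example.

```ocaml
let big = 1e200

(* big *. big overflows to infinity, so the explicit formula is useless. *)
let naive = sqrt (big *. big +. big *. big)

(* hypot avoids the intermediate overflow and returns a finite length. *)
let robust = Float.hypot big big

let () =
  assert (naive = infinity);
  assert (Float.is_finite robust);
  (* An infinite side wins even against nan. *)
  assert (Float.hypot infinity nan = infinity)
```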
trunc x
rounds x
to the nearest integer whose absolute value is less than or equal to the absolute value of x
.
round x
rounds x
to the nearest integer with ties (fractional values of 0.5) rounded away from zero, regardless of the current rounding direction. If x
is an integer, +0.
, -0.
, nan
, or infinite, x
itself is returned.
Round above to an integer value. ceil f
returns the least integer value greater than or equal to f
. The result is returned as a float.
Round below to an integer value. floor f
returns the greatest integer value less than or equal to f
. The result is returned as a float.
next_after x y
returns the next representable floating-point value following x
in the direction of y
. More precisely, if y
is greater (resp. less) than x
, it returns the smallest (resp. largest) representable number greater (resp. less) than x
. If x
equals y
, the function returns y
. If x
or y
is nan
, a nan
is returned. Note that next_after max_float infinity = infinity
and that next_after 0. infinity
is the smallest denormalized positive number. If x
is the smallest denormalized positive number, next_after x 0. = 0.
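The boundary cases listed above can be verified with the standard library alone. In this sketch the names up, down and smallest are illustrative.

```ocaml
let up = Float.next_after 1.0 2.0            (* one step above 1.0 *)
let down = Float.next_after 1.0 0.0          (* one step below 1.0 *)
let smallest = Float.next_after 0. infinity  (* smallest positive subnormal *)

let () =
  assert (up = Float.succ 1.0);
  assert (down = Float.pred 1.0);
  (* The smallest subnormal has only the lowest significand bit set. *)
  assert (smallest = Int64.float_of_bits 1L);
  assert (Float.next_after smallest 0. = 0.);
  assert (Float.next_after Float.max_float infinity = infinity);
  (* Equal arguments return the second one; a nan argument gives nan. *)
  assert (Float.next_after 1.0 1.0 = 1.0);
  assert (Float.is_nan (Float.next_after nan 1.0))
```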
copy_sign x y
returns a float whose absolute value is that of x
and whose sign is that of y
. If x
is nan
, returns nan
. If y
is nan
, returns either x
or -. x
, but it is not specified which.
sign_bit x
is true
iff the sign bit of x
is set. For example sign_bit 1.
and sign_bit 0.
are false
while sign_bit (-1.)
and sign_bit (-0.)
are true
.
frexp f
returns the pair of the significand and the exponent of f
. When f
is zero, the significand x
and the exponent n
of f
are equal to zero. When f
is non-zero, they are defined by f = x *. 2 ** n
and 0.5 <= x < 1.0
.
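As a quick check of the definition, the sketch below decomposes a value with frexp and rebuilds it with ldexp, both from the standard library. The bound names are illustrative.

```ocaml
(* 8.0 = 0.5 *. 2 ** 4, with 0.5 <= 0.5 < 1.0 *)
let x, n = Float.frexp 8.0

(* ldexp x n computes x *. 2 ** n, so it is the inverse operation. *)
let back = Float.ldexp x n

(* Zero is the special case: both components are zero. *)
let zx, zn = Float.frexp 0.0

let () =
  assert (x = 0.5 && n = 4);
  assert (back = 8.0);
  assert (zx = 0.0 && zn = 0)
```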
compare x y
returns 0
if x
is equal to y
, a negative integer if x
is less than y
, and a positive integer if x
is greater than y
. compare
treats nan
as equal to itself and less than any other float value. This treatment of nan
ensures that compare
defines a total ordering relation.
min x y
returns the minimum of x
and y
. It returns nan
when x
or y
is nan
. Moreover min (-0.) (+0.) = -0.
max x y
returns the maximum of x
and y
. It returns nan
when x
or y
is nan
. Moreover max (-0.) (+0.) = +0.
min_max x y
is (min x y, max x y)
, just more efficient.
min_num x y
returns the minimum of x
and y
treating nan
as missing values. If both x
and y
are nan
, nan
is returned. Moreover min_num (-0.) (+0.) = -0.
max_num x y
returns the maximum of x
and y
treating nan
as missing values. If both x
and y
are nan
, nan
is returned. Moreover max_num (-0.) (+0.) = +0.
min_max_num x y
is (min_num x y, max_num x y)
, just more efficient. Note that in particular min_max_num x nan = (x, x)
and min_max_num nan y = (y, y)
.
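The difference between the two families shows up only when a nan is involved. A minimal sketch using the standard library functions of the same names (the bound names are illustrative):

```ocaml
let propagated = Float.min 1.0 nan       (* nan: min propagates NaNs *)
let ignored = Float.min_num 1.0 nan      (* 1.0: nan is a missing value *)
let lo, hi = Float.min_max_num nan 2.0   (* both components are 2.0 *)
let neg_zero = Float.min (-0.) 0.        (* the negative zero *)

let () =
  assert (Float.is_nan propagated);
  assert (ignored = 1.0);
  assert (lo = 2.0 && hi = 2.0);
  (* (=) cannot tell the zeros apart, so inspect the sign bit. *)
  assert (Float.sign_bit neg_zero);
  assert (Float.is_nan (Float.min_num nan nan))
```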
val hash : t -> int
The hash function for floating-point numbers.
module Array : sig ... end
module ArrayLabels : sig ... end
The constant e.
2 *. pi
, two times pi.
The greatest positive floating point number with a fractional part (the float
before 2^52). Any number outside [-max_frac_float;max_frac_float
] is an integer.
The greatest positive floating point number (2^53) such that any integer in the range [-max_int_arith;max_int_arith
] is represented exactly. Integer arithmetic can be performed exactly in this interval.
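Both bounds can be rebuilt from the standard library to see what they mean. In this sketch max_frac_float and max_int_arith are local stand-ins computed from the descriptions above, not the module's own definitions.

```ocaml
(* Local stand-ins, assumed to match the documented values. *)
let max_frac_float = Float.pred (Float.ldexp 1.0 52)   (* float before 2^52 *)
let max_int_arith = Float.ldexp 1.0 53                 (* 2^53 *)

let () =
  (* Just below 2^52 the spacing between floats is 0.5 ... *)
  assert (max_frac_float = Float.ldexp 1.0 52 -. 0.5);
  assert (not (Float.is_integer max_frac_float));
  (* ... and from 2^52 on every float is an integer. *)
  assert (Float.is_integer (Float.succ max_frac_float));
  (* Inside [-2^53;2^53] integer arithmetic is exact ... *)
  assert (max_int_arith -. 1.0 <> max_int_arith);
  (* ... but 2^53 + 1 is not representable and rounds back to 2^53. *)
  assert (max_int_arith +. 1.0 = max_int_arith)
```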
Note. If applicable, a function taking NaNs returns a NaN unless otherwise specified.
random min len ()
is a random float in the interval [min;min+len
] (min
defaults to 0.). Uses the standard library's default Random
state for the generation.
Warning. The float generated by a given state may change in future versions of the library.
val srandom : Random.State.t -> ?min:float -> len:float -> unit -> float
srandom state min len ()
is like random
but uses state
for the generation.
Warning. The float generated by a given state
may change in future versions of the library.
step edge x
is 0.
if x < edge
and 1.
otherwise. The result is undefined on NaNs.
smooth_step e0 e1 x
is 0.
if x <= e0
, 1.
if x >= e1
and cubic Hermite interpolation between 0. and 1. otherwise. The result is undefined on NaNs.
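For reference, here is one way the two functions could be written. This is a sketch derived from the descriptions above, not the library's source; it assumes the usual Hermite polynomial 3t^2 - 2t^3 on the normalized parameter t.

```ocaml
(* Hypothetical re-implementations for illustration only. *)
let step edge x = if x < edge then 0. else 1.

let smooth_step e0 e1 x =
  if x <= e0 then 0.
  else if x >= e1 then 1.
  else
    (* Normalize x to t in ]0;1[, then apply 3t^2 - 2t^3. *)
    let t = (x -. e0) /. (e1 -. e0) in
    t *. t *. (3. -. 2. *. t)

let () =
  assert (step 1. 0.5 = 0.);
  assert (step 1. 1. = 1.);
  assert (smooth_step 0. 1. 0.5 = 0.5)
```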
clamp min max x
is min
if x < min
, max
if x > max
and x
otherwise. The result is undefined on NaNs and if min >
max
.
remap x0 x1 y0 y1 v
applies to v
the affine transform that maps x0
to y0
and x1
to y1
. If the transform is undefined (x0 = x1
and y0 <> y1
) the function returns y0
for any v
.
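The two descriptions above translate almost directly into code. The following is a sketch only: the labelled arguments are an assumption made for readability, and the bodies are written from the prose, not taken from the library.

```ocaml
(* Hypothetical re-implementations for illustration only. *)
let clamp ~min ~max x =
  if x < min then min else if x > max then max else x

let remap ~x0 ~x1 ~y0 ~y1 v =
  (* Degenerate source interval: return y0 for any v. *)
  if x0 = x1 then y0
  else y0 +. ((v -. x0) /. (x1 -. x0)) *. (y1 -. y0)

let () =
  assert (clamp ~min:0. ~max:1. 1.5 = 1.);
  assert (clamp ~min:0. ~max:1. 0.25 = 0.25);
  (* Map [0;10] onto [0;100]. *)
  assert (remap ~x0:0. ~x1:10. ~y0:0. ~y1:100. 2.5 = 25.)
```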
int_of_round x
is truncate (round x)
. The result is undefined on NaNs and infinities.
round_dfrac d x
rounds x
to the d
th decimal fractional digit. Ties are rounded towards positive infinity. If x
is an infinity, returns x
. The result is only defined for 0 <= d <=
16
.
round_dsig d x
rounds the normalized decimal significand of x
to the d
th decimal fractional digit. Ties are rounded towards positive infinity. The result is NaN on infinities. The result is only defined for 0 <= d <= 16
.
Warning. The current implementation overflows on large x
and d
.
round_zero eps x
is 0.
if abs_float x < eps
and x
otherwise. The result is undefined if eps
is NaN.
chop eps x
is round x
if abs_float (x -. round x) < eps
and x
otherwise. The result is undefined if eps
is NaN.
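Both functions are thin wrappers around a comparison with eps. A sketch written from the definitions above, assuming a labelled eps argument and the standard library's Float.round (the library's own code may differ):

```ocaml
(* Hypothetical re-implementations for illustration only. *)
let round_zero ~eps x = if Float.abs x < eps then 0. else x

let chop ~eps x =
  let r = Float.round x in
  if Float.abs (x -. r) < eps then r else x

let () =
  (* Noise below the threshold is flushed to zero. *)
  assert (round_zero ~eps:1e-9 1e-12 = 0.);
  (* A value within eps of an integer snaps to it. *)
  assert (chop ~eps:1e-9 2.0000000001 = 2.);
  (* Anything further away is left untouched. *)
  assert (chop ~eps:1e-9 2.1 = 2.1)
```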
nan_with_payload payload
is a NaN whose 51 lower significand bits are defined by the 51 lower (or less, as int
allows) bits of payload
.
nan_payload x
is the 51 lower significand bits (or less, as int
allows) of the NaN x
.
Raises Invalid_argument
if x
is not a NaN.
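The payload functions can be understood in terms of the bit layout of a float. The sketch below is an illustrative reconstruction with Int64 operations from the standard library. It assumes a 63-bit int (so all 51 bits fit) and assumes the resulting NaN is a quiet one; neither the constants nor the function bodies are taken from the library.

```ocaml
(* Hypothetical re-implementations for illustration only. *)
let payload_mask = 0x7_FFFF_FFFF_FFFFL       (* the 51 lower significand bits *)
let quiet_nan_bits = 0x7FF8_0000_0000_0000L  (* exponent all ones, bit 51 set *)

let nan_with_payload payload =
  let p = Int64.logand (Int64.of_int payload) payload_mask in
  Int64.float_of_bits (Int64.logor quiet_nan_bits p)

let nan_payload x =
  if not (Float.is_nan x) then invalid_arg "nan_payload: not a NaN"
  else Int64.to_int (Int64.logand (Int64.bits_of_float x) payload_mask)

let () =
  let x = nan_with_payload 42 in
  assert (Float.is_nan x);
  assert (nan_payload x = 42)
```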
is_zero eps x
is true
if abs_float x < eps
and false
otherwise. The result is undefined if eps
is NaN.
equal_tol eps x y
is true
iff |x - y| <= eps * max (1, |x|, |y|). On special values the function behaves like compare x y = 0
. The condition turns into an absolute tolerance test for small magnitudes and a relative tolerance test for large magnitudes.
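To make the mixed absolute and relative behaviour concrete, here is a sketch of the test written from the formula above. It assumes that "special values" means infinities and NaNs; it is not the library's implementation.

```ocaml
(* Hypothetical re-implementation for illustration only. *)
let equal_tol ~eps x y =
  if Float.is_finite x && Float.is_finite y then
    let scale = Float.max 1. (Float.max (Float.abs x) (Float.abs y)) in
    Float.abs (x -. y) <= eps *. scale
  else Float.compare x y = 0

let () =
  (* Small magnitudes: scale is 1, so eps is an absolute tolerance. *)
  assert (not (equal_tol ~eps:1e-9 0. 1e-6));
  (* Large magnitudes: scale is about 1e12, so eps is relative. *)
  assert (equal_tol ~eps:1e-9 1e12 (1e12 +. 1.));
  (* Special values fall back to compare. *)
  assert (equal_tol ~eps:1e-9 nan nan);
  assert (not (equal_tol ~eps:1e-9 infinity neg_infinity))
```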
compare_tol ~eps x y
is 0
iff equal_tol ~eps x y
is true
and Stdlib.compare x y
otherwise.
val pp : Format.formatter -> float -> unit
pp ppf x
formats a lossless textual representation of x
on ppf
using "%h"
. Since 1.0.0; before that version this was the slower legacy_pp
whose output differs on the representation of nan, infinities, or zeros.
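What "lossless" means here can be shown with the standard library alone. In this sketch pp_hex is a local stand-in defined with the "%h" conversion, not the module's pp; the round trip through float_of_string recovers the exact value.

```ocaml
(* Local stand-in formatter using the hexadecimal "%h" conversion. *)
let pp_hex ppf x = Format.fprintf ppf "%h" x

let text = Format.asprintf "%a" pp_hex 0.1
let back = float_of_string text

let () =
  (* Hexadecimal notation encodes every significand bit exactly. *)
  assert (back = 0.1);
  assert (Int64.bits_of_float back = Int64.bits_of_float 0.1)
```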
Deprecated use max_num
.
Deprecated use min_num
.
Deprecated use is_infinite
.
Deprecated use is_integer
.
val legacy_pp : Format.formatter -> float -> unit
Deprecated use pp
.
legacy_pp ppf x
prints a lossless textual representation of x
on ppf
.
Normal numbers are represented by "[-]0x1.<f>p<e>" where <f> is the significand bits in hexadecimal and <e> the unbiased exponent in decimal.
Subnormal numbers are represented by "[-]0x0.<f>p-1022" where <f> is the significand bits in hexadecimal.
NaNs are represented by "[-]nan(0x<p>)" where <p> is the payload in hexadecimal.
Infinities and zeros are represented by "[-]inf" and "[-]0.".
This format should be compatible with recent implementations of strtod and hence with float_of_string
(but negative NaNs seem to be problematic to get back).
Quick recall on OCaml's floats
An OCaml float
is an IEEE-754 64 bit double precision binary floating point number. The 64 bits are laid out as follows:

+----------------+-----------------------+-------------------------+
| sign s (1 bit) | exponent e (11 bits)  | significand t (52 bits) |
+----------------+-----------------------+-------------------------+
               63|62                   52|51                      0|
The value represented depends on s, e and t:

sign  exponent      significand  value represented          meaning
--------------------------------------------------------------------------
s     0             0            -1^s * 0                   zero
s     0             t <> 0       -1^s * 0.t * 2^-1022       subnormal
s     0 < e < 2047  t            -1^s * 1.t * 2^(e - 1023)  normal
s     2047          0            -1^s * infinity            infinity
s     2047          t <> 0       NaN                        not a number
There are two zeros, a positive and a negative one, but both are deemed equal by =
and Stdlib.compare
. A NaN is never equal (=) to itself or to another NaN; however, Stdlib.compare
asserts any NaN to be equal to itself and to any other NaN.
The bit layout of a float
can be converted to an int64
and back using Int64.bits_of_float
and Int64.float_of_bits
.
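Using that conversion, the three fields of the layout can be extracted with shifts and masks. A minimal sketch with the standard library's Int64 module; the function name fields is hypothetical.

```ocaml
(* Split a float into (sign s, biased exponent e, significand t). *)
let fields x =
  let bits = Int64.bits_of_float x in
  let s = Int64.to_int (Int64.shift_right_logical bits 63) in
  let e =
    Int64.to_int (Int64.logand (Int64.shift_right_logical bits 52) 0x7FFL)
  in
  let t = Int64.logand bits 0xF_FFFF_FFFF_FFFFL in
  (s, e, t)

let () =
  (* 1.0 = 1.0 * 2^(1023 - 1023) *)
  assert (fields 1.0 = (0, 1023, 0L));
  (* 1.5 = 1.1 (binary) * 2^0: only bit 51 of t is set. *)
  assert (fields 1.5 = (0, 1023, 0x8_0000_0000_0000L));
  (* -2.0 = -1.0 * 2^(1024 - 1023) *)
  assert (fields (-2.0) = (1, 1024, 0L));
  (* Infinity: exponent all ones, zero significand. *)
  assert (fields infinity = (0, 2047, 0L))
```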
The bit 51 of a NaN is used to distinguish between quiet (bit set) and signaling NaNs (bit cleared); the remaining 51 lower bits of the significand are the NaN's payload which can be used to store diagnostic information. These features don't seem to be used in OCaml.
The significand of a floating point number is made of 53 binary digits (don't forget the implicit digit), this corresponds to log10(2^53) ~ 16 decimal digits.
Only float
values in the interval ]-2^52;2^52[ may have a fractional part. Float.max_frac_float
is the greatest positive float
with a fractional part.
Any integer value in the interval [-2^53;2^53] can be represented exactly by a float
value. Integer arithmetic performed in this interval is exact. Float.max_int_arith
is 2^53.