delta is the compression factor: the maximum fraction of mass that can be owned by one centroid (bigger, up to 1.0, means more compression). ~delta:Discrete switches off TDigest behavior and treats the distribution as discrete, with no merging and exact values reported.
val sexp_of_delta : delta -> Sexplib0.Sexp.t
val delta_of_sexp : Sexplib0.Sexp.t -> delta
k is a size threshold that triggers recompression as the TDigest grows during input. ~k:Manual disables automatic recompression.
val sexp_of_k : k -> Sexplib0.Sexp.t
val k_of_sexp : Sexplib0.Sexp.t -> k
cx (default: 1.1) specifies how often to update cached cumulative totals used for quantile estimation during ingest. This is a tradeoff between performance and accuracy. ~cx:Always recomputes cumulatives on every new datapoint, but performance drops by 15-25x or more, depending on the size of the dataset.
val sexp_of_cx : cx -> Sexplib0.Sexp.t
val cx_of_sexp : Sexplib0.Sexp.t -> cx
include Sexplib0.Sexpable.S with type t := t
val t_of_sexp : Sexplib0.Sexp.t -> t
val sexp_of_t : t -> Sexplib0.Sexp.t
count: sum of all n
size: size of the internal B-Tree. Calling Tdigest.compress will usually reduce this size.
cumulates_count: number of cumulate operations over the life of this Tdigest instance.
compress_count: number of compression operations over the life of this Tdigest instance.
auto_compress_count: number of compression operations over the life of this Tdigest instance that were not triggered by a manual call to Tdigest.compress.
Tdigest.create ?delta ?k ?cx ()
Allocate an empty Tdigest instance.
delta (default: 0.01) is the compression factor: the maximum fraction of mass that can be owned by one centroid (bigger, up to 1.0, means more compression). ~delta:Discrete switches off TDigest behavior and treats the distribution as discrete, with no merging and exact values reported.
k (default: 25) is a size threshold that triggers recompression as the TDigest grows during input. ~k:Manual disables automatic recompression.
cx (default: 1.1) specifies how often to update cached cumulative totals used for quantile estimation during ingest. This is a tradeoff between performance and accuracy. ~cx:Always recomputes cumulatives on every new datapoint, but performance drops by 15-25x or more, depending on the size of the dataset.
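A minimal sketch of the three optional parameters, using only the constructors named above (Discrete, Manual, Always). It assumes the tdigest library is installed and linked, and that these constructors are reachable as Tdigest.Discrete, Tdigest.Manual and Tdigest.Always.

```ocaml
(* Assumption: the constructors live at the top level of the Tdigest module. *)

(* Defaults for delta, k and cx, as listed above. *)
let td_default = Tdigest.create ()

(* Discrete mode: no merging, exact values reported. *)
let td_discrete = Tdigest.create ~delta:Tdigest.Discrete ()

(* No automatic recompression; Tdigest.compress must be called by hand. *)
let td_manual = Tdigest.create ~k:Tdigest.Manual ()

(* Recompute cumulative totals on every datapoint: accurate but slow ingest. *)
let td_always = Tdigest.create ~cx:Tdigest.Always ()

let () =
  (* Every freshly created digest is empty, whatever its settings. *)
  List.iter
    (fun td -> assert (Tdigest.is_empty td))
    [ td_default; td_discrete; td_manual; td_always ]
```

The options can also be combined in a single call, for example Tdigest.create ~delta:Tdigest.Discrete ~k:Tdigest.Manual ().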
val is_empty : t -> bool
Tdigest.is_empty td returns true when the T-Digest does not contain any values.
Tdigest.info td returns a record with these fields:
count: sum of all n
size: size of the internal B-Tree. Calling Tdigest.compress will usually reduce this size.
cumulates_count: number of cumulate operations over the life of this Tdigest instance.
compress_count: number of compression operations over the life of this Tdigest instance.
auto_compress_count: number of compression operations over the life of this Tdigest instance that were not triggered by a manual call to Tdigest.compress.
Tdigest.add ?n ~data td
Incorporate a value (data) with count n (default: 1), returning the result as a new Tdigest.
Tdigest.add_list ?n ll td
Incorporate a list of values, each with count n (default: 1), returning the result as a new Tdigest.
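As an illustration of the two ingestion functions, the sketch below builds one digest with a weighted Tdigest.add and another by adding the same value three times. It assumes the tdigest library is linked, that data is a float, that n is an int, and that the digest is the last argument so calls can be chained with |>.

```ocaml
(* Assumptions: ~data is a float, ~n is an int, the digest comes last. *)

(* One call carrying a count of 3. *)
let weighted = Tdigest.create () |> Tdigest.add ~n:3 ~data:11.0

(* The same value added three times through add_list (n defaults to 1). *)
let repeated = Tdigest.create () |> Tdigest.add_list [ 11.0; 11.0; 11.0 ]

let () =
  (* Both digests have absorbed the same total count. *)
  let c1 = (Tdigest.info weighted).Tdigest.count in
  let c2 = (Tdigest.info repeated).Tdigest.count in
  assert (c1 = c2);
  assert (not (Tdigest.is_empty weighted))
```

Because each call returns a new Tdigest, the original value passed in is left untouched and can still be used.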
Tdigest.merge ?delta ?k ?cx tdigests
Efficiently combine multiple Tdigests into a new one.
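A short sketch of merging, assuming the tdigest library is linked and that tdigests is an OCaml list of digests. The comparison against a digest built from all the values at once is only there to show that no counts are lost.

```ocaml
(* Assumption: Tdigest.merge takes a list of digests. *)
let a = Tdigest.create () |> Tdigest.add_list [ 1.0; 2.0; 3.0 ]
let b = Tdigest.create () |> Tdigest.add_list [ 4.0; 5.0 ]

(* Combine the two partial digests into a new one. *)
let merged = Tdigest.merge [ a; b ]

(* Reference digest fed with every value directly. *)
let all_at_once = Tdigest.create () |> Tdigest.add_list [ 1.0; 2.0; 3.0; 4.0; 5.0 ]

let () =
  assert
    ((Tdigest.info merged).Tdigest.count
     = (Tdigest.info all_at_once).Tdigest.count)
```

This pattern fits workloads where digests are built independently, for example one per shard, and combined afterwards.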
Tdigest.p_rank td q
For a value q, estimate the fraction (0..1) of values <= q.
Returns a new Tdigest to reuse intermediate computations.
Same as Tdigest.p_rank but for a list of values.
Returns a new Tdigest to reuse intermediate computations.
Tdigest.percentile td p
For a fraction p (0..1), estimate the smallest value q such that at least a fraction p of the values are <= q.
For discrete distributions, this selects q using the Nearest Rank Method: https://en.wikipedia.org/wiki/Percentile#The_Nearest_Rank_method
For continuous distributions, interpolates data values between count-weighted bracketing means.
Returns a new Tdigest to reuse intermediate computations.
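To make the Nearest Rank Method concrete, here is a standalone sketch over a plain list of floats. It uses only the OCaml standard library and is not the library's implementation; the function name nearest_rank and the clamping of the rank to 1..n (so that p = 0 selects the minimum) are choices made for this illustration.

```ocaml
(* Nearest Rank Method on an explicit list of values.
   Hypothetical helper for illustration; not part of Tdigest. *)
let nearest_rank (values : float list) (p : float) : float option =
  match List.sort compare values with
  | [] -> None
  | sorted ->
    let n = List.length sorted in
    (* Ordinal rank is ceil(p * n), clamped to the range 1..n. *)
    let rank = int_of_float (ceil (p *. float_of_int n)) in
    let rank = max 1 (min n rank) in
    (* Ranks are 1-based, List.nth is 0-based. *)
    Some (List.nth sorted (rank - 1))

let () =
  let data = [ 13.0; 10.0; 12.0; 11.0 ] in
  (* With 4 values, p = 0.5 gives rank 2, the second smallest value. *)
  assert (nearest_rank data 0.5 = Some 11.0);
  assert (nearest_rank data 1.0 = Some 13.0);
  assert (nearest_rank [] 0.5 = None)
```

The result is always one of the input values, which is why this method suits discrete distributions, whereas the continuous case interpolates between neighbouring means.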
Same as Tdigest.percentile but for a list of values.
Returns a new Tdigest to reuse intermediate computations.
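The query functions can be chained as sketched below. The return shapes are assumptions not stated on this page: each query is taken to return a pair of the new Tdigest and a float option, with None expected only when the digest is empty.

```ocaml
(* Assumption: Tdigest.percentile and Tdigest.p_rank both return
   (new_digest, float option). *)
let td = Tdigest.create () |> Tdigest.add_list [ 10.0; 11.0; 12.0; 13.0 ]

(* Keep the returned digest so cached intermediate work is reused. *)
let td, median = Tdigest.percentile td 0.5
let _td, rank_of_11 = Tdigest.p_rank td 11.0

let show = function
  | Some x -> string_of_float x
  | None -> "none"

let () =
  Printf.printf "median ~ %s, p_rank of 11.0 ~ %s\n" (show median) (show rank_of_11)
```

Rebinding td after every query is the idiomatic way to follow the "Returns a new Tdigest" notes above without keeping stale values around.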
Tdigest.compress ?delta td
Manual recompression. Not guaranteed to reduce size further if too few values have been added since the last compression.
delta (default: initial value passed to Tdigest.create) is the compression level to use for this operation only. It does not alter the delta used by the Tdigest going forward.
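A sketch of manual recompression, assuming the tdigest library is linked and that Tdigest.compress returns the compressed digest. Automatic recompression is disabled with ~k:Tdigest.Manual so that the only compression is the explicit one.

```ocaml
(* 1000 distinct values: 0.0, 1.0, ..., 999.0 *)
let values = List.init 1000 float_of_int

(* Ingest without automatic recompression. *)
let td = Tdigest.create ~k:Tdigest.Manual () |> Tdigest.add_list values
let before = Tdigest.info td

(* Assumption: compress returns the new, compressed digest. *)
let td = Tdigest.compress td
let after = Tdigest.info td

let () =
  (* Compression merges centroids but must not lose any counts. *)
  assert (after.Tdigest.count = before.Tdigest.count);
  (* The size usually shrinks, though this is not guaranteed. *)
  Printf.printf "size shrank or stayed equal: %b\n"
    (after.Tdigest.size <= before.Tdigest.size)
```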
Tdigest.to_string td
Serialize the internal state into a binary string that can be stored or concatenated with other such binary strings.
Use Tdigest.of_string to create a new Tdigest instance from it.
Returns a new Tdigest to reuse intermediate computations.
Tdigest.of_string ?delta ?k ?cx str
See Tdigest.create for the meaning of the optional parameters.
Allocate a new Tdigest from a string or concatenation of strings originally created by Tdigest.to_string.
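A round-trip sketch for serialization, assuming the tdigest library is linked and that Tdigest.to_string returns a pair of the new Tdigest and the binary string, as the "Returns a new Tdigest" note suggests.

```ocaml
let original = Tdigest.create () |> Tdigest.add_list [ 1.0; 2.0; 3.0 ]

(* Assumption: to_string returns (new_digest, serialized_string). *)
let original, serialized = Tdigest.to_string original

(* Rebuild a digest from the stored bytes. *)
let restored = Tdigest.of_string serialized

(* Concatenated serializations can be loaded in one call. *)
let doubled = Tdigest.of_string (serialized ^ serialized)

let () =
  (* The restored digest carries the same total count as the original. *)
  assert
    ((Tdigest.info restored).Tdigest.count
     = (Tdigest.info original).Tdigest.count);
  assert (not (Tdigest.is_empty doubled))
```

Since the string is binary, it should be stored and transported as raw bytes rather than as text.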
module Private : sig ... end
For internal use