Tdigest.create ?delta ?k ?cx ()
Allocate an empty Tdigest instance.
delta (default: 0.01) is the compression factor: the maximum fraction of mass that can be owned by one centroid. Larger values, up to 1.0, mean more compression. ~delta:Discrete switches off TDigest behavior and treats the distribution as discrete, with no merging and exact values reported.
k (default: 25) is a size threshold that triggers recompression as the TDigest grows during input. ~k:Manual disables automatic recompression.
cx (default: 1.1) specifies how often to update the cached cumulative totals used for quantile estimation during ingest. This is a tradeoff between performance and accuracy. ~cx:Always recomputes the cumulative totals on every new datapoint, but performance drops by 15-25x or more, depending on the size of the dataset.
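As a rough illustration of the options above, the sketch below builds one digest per setting. It assumes the opam package tdigest is installed and linked, and that Discrete, Manual and Always are constructors exposed directly by the Tdigest module. The qualified constructor paths are an assumption; the text above only shows the unqualified forms.

```ocaml
(* Assumption: the tdigest library is linked, and the constructors
   Discrete, Manual and Always live at the top level of Tdigest. *)

(* All defaults: delta 0.01, k 25, cx 1.1, as documented above. *)
let default_td = Tdigest.create ()

(* Discrete mode: no merging, exact values reported. *)
let discrete_td = Tdigest.create ~delta:Tdigest.Discrete ()

(* No automatic recompression; the caller decides when to compress. *)
let manual_td = Tdigest.create ~k:Tdigest.Manual ()

(* Recompute cumulative totals on every datapoint: accurate but slow. *)
let always_td = Tdigest.create ~cx:Tdigest.Always ()

let () =
  (* Every freshly created digest should be empty. *)
  List.iter
    (fun td -> assert (Tdigest.is_empty td))
    [ default_td; discrete_td; manual_td; always_td ]
```

The three optional arguments are independent, so they can also be combined in a single call.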
val is_empty : t -> bool
Tdigest.is_empty td returns true when the Tdigest does not contain any values.
Tdigest.info td returns a record with these fields:
count: sum of all n, that is, the total of the counts supplied when values were added.
size: size of the internal B-Tree. Calling Tdigest.compress will usually reduce this size.
cumulates_count: number of cumulate operations over the life of this Tdigest instance.
compress_count: number of compression operations over the life of this Tdigest instance.
auto_compress_count: number of compression operations over the life of this Tdigest instance that were not triggered by a manual call to Tdigest.compress.
Tdigest.add ?n ~data td
Incorporate a value (data) having count n (default: 1) into a new Tdigest.
Tdigest.add_list ?n ll td
Incorporate a list of values, each having count n (default: 1), into a new Tdigest.
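A minimal ingestion sketch follows, using only the functions documented so far. It assumes the tdigest library is linked, that values are floats, and that n is an int; none of these types are stated in the text above. Because add and add_list return a new Tdigest, each result is bound to a new name rather than mutating the previous one.

```ocaml
(* Assumptions: tdigest is linked; data values are floats; n is an int. *)
let empty = Tdigest.create ()

(* One value with the default count of 1. *)
let td1 = Tdigest.add ~data:10.0 empty

(* One value that stands for three observations. *)
let td2 = Tdigest.add ~n:3 ~data:12.5 td1

(* Several values at once, each with the default count of 1. *)
let td3 = Tdigest.add_list [ 11.0; 13.0; 14.0 ] td2

let () =
  assert (Tdigest.is_empty empty);
  assert (not (Tdigest.is_empty td3))
```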
Tdigest.merge ?delta ?k ?cx tdigests
Efficiently combine multiple Tdigests into a new one.
Tdigest.p_rank td q
For a value q, estimate the percentage, expressed as a fraction (0..1), of values <= q.
Also returns a new Tdigest so that intermediate computations can be reused.
Same as Tdigest.p_rank but for a list of values.
Also returns a new Tdigest so that intermediate computations can be reused.
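To make the quantity being estimated concrete, here is an exact reference computation on a plain list. It is not the library's algorithm, only a baseline that an estimate could be compared against. The name exact_p_rank and the choice to return None for empty input are hypothetical.

```ocaml
(* Exact reference (hypothetical helper, not part of Tdigest):
   the fraction (0..1) of values that are <= q, or None for empty input. *)
let exact_p_rank (values : float list) (q : float) : float option =
  match values with
  | [] -> None
  | _ ->
    let total = List.length values in
    let at_or_below = List.length (List.filter (fun v -> v <= q) values) in
    Some (float_of_int at_or_below /. float_of_int total)

let () =
  (* Two of the four values are <= 11.0. *)
  assert (exact_p_rank [ 10.0; 11.0; 12.0; 13.0 ] 11.0 = Some 0.5)
```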
Tdigest.percentile td p
For a percentage p, expressed as a fraction (0..1), estimate the smallest value q such that at least that fraction of the values are <= q.
For discrete distributions, this selects q using the Nearest Rank Method: https://en.wikipedia.org/wiki/Percentile#The_Nearest_Rank_method
For continuous distributions, it interpolates data values between count-weighted bracketing means.
Also returns a new Tdigest so that intermediate computations can be reused.
Same as Tdigest.percentile but for a list of values.
Also returns a new Tdigest so that intermediate computations can be reused.
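The two strategies described above can be sketched on plain data. The first function applies the Nearest Rank Method to an exact list. The second shows one plausible reading of "interpolates between count-weighted bracketing means", namely linear interpolation between two centroid means by cumulative count. Both helpers, their names and their parameters are hypothetical and are not taken from the library.

```ocaml
(* Nearest Rank Method on exact data (hypothetical helper):
   the smallest value q such that at least a fraction p of values are <= q. *)
let nearest_rank (values : float list) (p : float) : float option =
  if p < 0.0 || p > 1.0 then invalid_arg "nearest_rank: p must be in 0..1";
  match List.sort compare values with
  | [] -> None
  | sorted ->
    let n = List.length sorted in
    (* Ordinal rank is ceil(p * n); clamp to 1 so that p = 0 picks the minimum. *)
    let rank = max 1 (int_of_float (ceil (p *. float_of_int n))) in
    Some (List.nth sorted (rank - 1))

(* Linear interpolation between two bracketing centroids (assumed formula).
   cum_lo and cum_hi are the cumulative counts attached to each mean;
   target is the desired cumulative count, p times the total count. *)
let interpolate ~mean_lo ~cum_lo ~mean_hi ~cum_hi ~target =
  if cum_hi = cum_lo then mean_lo
  else
    mean_lo +. ((target -. cum_lo) *. (mean_hi -. mean_lo) /. (cum_hi -. cum_lo))

let () =
  (* With five values, p = 0.5 gives rank ceil(2.5) = 3, the third value. *)
  assert (nearest_rank [ 15.0; 20.0; 35.0; 40.0; 50.0 ] 0.5 = Some 35.0);
  (* A target halfway between the two cumulative counts lands halfway
     between the two means. *)
  assert (
    interpolate ~mean_lo:10.0 ~cum_lo:1.0 ~mean_hi:20.0 ~cum_hi:3.0 ~target:2.0
    = 15.0)
```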
Tdigest.compress ?delta td
Manual recompression. Not guaranteed to reduce size further if too few values have been added since the last compression.
delta (default: initial value passed to Tdigest.create) is the compression level to use for this operation only. It does not alter the delta used by the Tdigest going forward.
Tdigest.to_string td
Serialize the internal state into a binary string that can be stored or concatenated with other such binary strings.
Use Tdigest.of_string to create a new Tdigest instance from it.
Also returns a new Tdigest so that intermediate computations can be reused.
Tdigest.of_string ?delta ?k ?cx str
Allocate a new Tdigest from a string or concatenation of strings originally created by Tdigest.to_string.
See Tdigest.create for the meaning of the optional parameters.
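A serialization round trip might look like the sketch below. It assumes the tdigest library is linked and that Tdigest.to_string returns a pair of the new Tdigest and the serialized string, which is inferred from the note about returning a new Tdigest and is not a confirmed signature. The helper name round_trip is hypothetical.

```ocaml
(* Assumption: Tdigest.to_string returns (new_digest, serialized_string). *)
let round_trip (td : Tdigest.t) : Tdigest.t =
  let _td, blob = Tdigest.to_string td in
  (* Optional parameters are omitted, so the Tdigest.create defaults apply. *)
  Tdigest.of_string blob

let () =
  let td = Tdigest.add_list [ 1.0; 2.0; 3.0 ] (Tdigest.create ()) in
  let restored = round_trip td in
  assert (not (Tdigest.is_empty restored))
```

Since of_string also accepts a concatenation of such strings, the same pattern should extend to joining the serialized forms of several digests before decoding.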
module Private : sig ... end
For internal use