package sklearn

type tag = [
  1. | `KDTree
]
type t = [ `KDTree | `Object ] Obj.t
val of_pyobject : Py.Object.t -> t
val to_pyobject : [> tag ] Obj.t -> Py.Object.t
val create : ?leaf_size:Py.Object.t -> ?metric:[ `S of string | `DistanceMetric_object of Py.Object.t ] -> ?kwargs:(string * Py.Object.t) list -> x:[> `ArrayLike ] Np.Obj.t -> unit -> t

KDTree(X, leaf_size=40, metric='minkowski', **kwargs)

KDTree for fast generalized N-point problems

Parameters
----------
X : array-like of shape (n_samples, n_features)
    n_samples is the number of points in the data set, and n_features is the dimension of the parameter space. Note: if X is a C-contiguous array of doubles then data will not be copied. Otherwise, an internal copy will be made.

leaf_size : positive int, default=40
    Number of points at which to switch to brute-force. Changing leaf_size will not affect the results of a query, but can significantly impact the speed of a query and the memory required to store the constructed tree. The amount of memory needed to store the tree scales as approximately n_samples / leaf_size. For a specified ``leaf_size``, a leaf node is guaranteed to satisfy ``leaf_size <= n_points <= 2 * leaf_size``, except in the case that ``n_samples < leaf_size``.

metric : str or DistanceMetric object
    The distance metric to use for the tree. Default='minkowski' with p=2 (that is, a euclidean metric). See the documentation of the DistanceMetric class for a list of available metrics. kd_tree.valid_metrics gives a list of the metrics which are valid for KDTree.

Additional keywords are passed to the distance metric class. Note: Callable functions in the metric parameter are NOT supported for KDTree and Ball Tree. Function call overhead will result in very poor performance.

Attributes
----------
data : memory view
    The training data

Examples
--------
Query for k-nearest neighbors

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = KDTree(X, leaf_size=2)              # doctest: +SKIP
>>> dist, ind = tree.query(X[:1], k=3)         # doctest: +SKIP
>>> print(ind)  # indices of 3 closest neighbors
[0 3 1]
>>> print(dist)  # distances to 3 closest neighbors
[0.         0.19662693 0.29473397]

Pickle and Unpickle a tree. Note that the state of the tree is saved in the pickle operation: the tree need not be rebuilt upon unpickling.

>>> import numpy as np
>>> import pickle
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = KDTree(X, leaf_size=2)              # doctest: +SKIP
>>> s = pickle.dumps(tree)                     # doctest: +SKIP
>>> tree_copy = pickle.loads(s)                # doctest: +SKIP
>>> dist, ind = tree_copy.query(X[:1], k=3)    # doctest: +SKIP
>>> print(ind)  # indices of 3 closest neighbors
[0 3 1]
>>> print(dist)  # distances to 3 closest neighbors
[0.         0.19662693 0.29473397]

Query for neighbors within a given radius

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = KDTree(X, leaf_size=2)              # doctest: +SKIP
>>> print(tree.query_radius(X[:1], r=0.3, count_only=True))
3
>>> ind = tree.query_radius(X[:1], r=0.3)      # doctest: +SKIP
>>> print(ind)  # indices of neighbors within distance 0.3
[3 0 1]

Compute a gaussian kernel density estimate:

>>> import numpy as np
>>> rng = np.random.RandomState(42)
>>> X = rng.random_sample((100, 3))
>>> tree = KDTree(X)                           # doctest: +SKIP
>>> tree.kernel_density(X[:3], h=0.1, kernel='gaussian')
array([6.94114649, 7.83281226, 7.2071716 ])

Compute a two-point auto-correlation function

>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((30, 3))
>>> r = np.linspace(0, 1, 5)
>>> tree = KDTree(X)                           # doctest: +SKIP
>>> tree.two_point_correlation(X, r)
array([ 30,  62, 278, 580, 820])
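Two of the claims in the parameter descriptions above can be checked directly. The sketch below is illustrative Python against the underlying sklearn.neighbors.KDTree class (not the OCaml bindings): leaf_size does not change query results, and extra keyword arguments are forwarded to the distance metric.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
X = rng.random_sample((50, 3))

# leaf_size affects speed and memory, not answers: two trees, same queries.
d_small, i_small = KDTree(X, leaf_size=2).query(X[:5], k=3)
d_large, i_large = KDTree(X, leaf_size=40).query(X[:5], k=3)
assert np.array_equal(i_small, i_large)
assert np.allclose(d_small, d_large)

# Extra keyword arguments are passed to the distance metric:
# p=1 turns 'minkowski' into Manhattan distance.
manhattan = KDTree(X, metric='minkowski', p=1)
dist, ind = manhattan.query(X[:1], k=1)
assert dist[0, 0] == 0.0  # nearest neighbor of a training point is itself
```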

val get_arrays : [> tag ] Obj.t -> Py.Object.t

get_arrays(self)

Get data and node arrays.

Returns
-------
arrays: tuple of array
    Arrays for storing tree data, index, node data and node bounds.
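A minimal Python sketch of the tuple shape, against the underlying sklearn.neighbors.KDTree class; the first element is (a view of) the training data itself.

```python
import numpy as np
from sklearn.neighbors import KDTree

X = np.random.RandomState(0).random_sample((10, 3))
tree = KDTree(X, leaf_size=2)

# get_arrays() exposes the tree internals as a 4-tuple:
# training data, point indices, per-node data, and node bounds.
data, idx, node_data, node_bounds = tree.get_arrays()
assert np.allclose(np.asarray(data), X)
```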

val get_n_calls : [> tag ] Obj.t -> int

get_n_calls(self)

Get number of calls.

Returns
-------
n_calls: int
    number of distance computation calls

val get_tree_stats : [> tag ] Obj.t -> Py.Object.t

get_tree_stats(self)

Get tree status.

Returns
-------
tree_stats: tuple of int
    (number of trims, number of leaves, number of splits)
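As an illustrative sketch (Python, against the underlying sklearn.neighbors.KDTree class), the tuple can be unpacked directly; the counters accumulate over queries performed so far.

```python
import numpy as np
from sklearn.neighbors import KDTree

X = np.random.RandomState(0).random_sample((10, 3))
tree = KDTree(X, leaf_size=2)
tree.query(X[:1], k=3)

# Traversal counters accumulated by the query above.
trims, leaves, splits = tree.get_tree_stats()
assert leaves >= 1  # at least one leaf node was visited
```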

val kernel_density : ?kernel:string -> ?atol:Py.Object.t -> ?rtol:Py.Object.t -> ?breadth_first:bool -> ?return_log:bool -> x:[> `ArrayLike ] Np.Obj.t -> h:float -> [> tag ] Obj.t -> Py.Object.t

kernel_density(self, X, h, kernel='gaussian', atol=0, rtol=1E-8, breadth_first=True, return_log=False)

Compute the kernel density estimate at points X with the given kernel, using the distance metric specified at tree creation.

Parameters
----------
X : array-like of shape (n_samples, n_features)
    An array of points to query. Last dimension should match dimension of training data.
h : float
    the bandwidth of the kernel
kernel : str, default='gaussian'
    specify the kernel to use. Options are

      • 'gaussian'
      • 'tophat'
      • 'epanechnikov'
      • 'exponential'
      • 'linear'
      • 'cosine'

atol, rtol : float, default=0, 1e-8
    Specify the desired absolute and relative tolerance of the result. If the true result is K_true, then the returned result K_ret satisfies ``abs(K_true - K_ret) < atol + rtol * K_ret``. The defaults are atol=0 and rtol=1e-8.
breadth_first : bool, default=True
    If True (the default), use a breadth-first search; if False, use a depth-first search. Breadth-first is generally faster for compact kernels and/or high tolerances.
return_log : bool, default=False
    Return the logarithm of the result. This can be more accurate than returning the result itself for narrow kernels.

Returns
-------
density : ndarray of shape X.shape[:-1]
    The array of (log)-density evaluations
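The relationship between the plain and return_log variants can be sketched in Python against the underlying sklearn.neighbors.KDTree class: the log variant returns log(density), and the output shape follows X.shape[:-1].

```python
import numpy as np
from sklearn.neighbors import KDTree

X = np.random.RandomState(42).random_sample((100, 3))
tree = KDTree(X)

dens = tree.kernel_density(X[:3], h=0.1, kernel='gaussian')
log_dens = tree.kernel_density(X[:3], h=0.1, kernel='gaussian',
                               return_log=True)

assert dens.shape == (3,)                  # X[:3].shape[:-1]
assert np.allclose(np.log(dens), log_dens) # return_log gives log(density)
```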

val reset_n_calls : [> tag ] Obj.t -> Py.Object.t

reset_n_calls(self)

Reset number of calls to 0.
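A short Python sketch of the counter lifecycle, against the underlying sklearn.neighbors.KDTree class (assuming, as the docs above state, that queries accumulate distance-computation calls):

```python
import numpy as np
from sklearn.neighbors import KDTree

X = np.random.RandomState(0).random_sample((10, 3))
tree = KDTree(X, leaf_size=2)

tree.query(X[:1], k=3)
assert tree.get_n_calls() > 0   # the query performed distance computations

tree.reset_n_calls()
assert tree.get_n_calls() == 0  # counter cleared
```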

val data : t -> Py.Object.t

Attribute data: get value or raise Not_found if None.

val data_opt : t -> Py.Object.t option

Attribute data: get value as an option.

val to_string : t -> string

Print the object to a human-readable representation.

val show : t -> string

Print the object to a human-readable representation.

val pp : Stdlib.Format.formatter -> t -> unit

Pretty-print the object to a formatter.