package sklearn

type t
val of_pyobject : Py.Object.t -> t
val to_pyobject : t -> Py.Object.t
val create : ?n_clusters:[ `Int of int | `None ] -> ?affinity:[ `String of string | `Callable of Py.Object.t ] -> ?memory:[ `String of string | `JoblibMemory of Py.Object.t ] -> ?connectivity:[ `Ndarray of Ndarray.t | `Callable of Py.Object.t ] -> ?compute_full_tree:[ `Auto | `Bool of bool ] -> ?linkage:[ `Ward | `Complete | `Average | `Single ] -> ?distance_threshold:float -> unit -> t

Agglomerative Clustering

Recursively merges the pair of clusters that minimally increases a given linkage distance.
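The greedy merge loop can be illustrated with a minimal, self-contained OCaml sketch. It assumes 2-D points given as `(float * float)` pairs, Euclidean distance, and single linkage. It is a naive O(n^3) illustration, not the algorithm scikit-learn uses internally. The helper names (`euclidean`, `single_linkage`, `agglomerate`) are hypothetical.

```ocaml
(* Euclidean distance between two 2-D points. *)
let euclidean (ax, ay) (bx, by) =
  Float.sqrt (((ax -. bx) ** 2.) +. ((ay -. by) ** 2.))

(* Single linkage: smallest distance between any member of c1 and any
   member of c2 (clusters are lists of point indices). *)
let single_linkage points c1 c2 =
  List.fold_left
    (fun acc i ->
      List.fold_left
        (fun acc j -> Float.min acc (euclidean points.(i) points.(j)))
        acc c2)
    infinity c1

(* Greedy agglomeration: start from singletons and repeatedly merge the
   closest pair of clusters until k clusters remain. Returns one label
   per point, in 0 .. k-1. *)
let agglomerate (points : (float * float) array) k =
  if k < 1 then invalid_arg "agglomerate: k must be at least 1";
  let n = Array.length points in
  let clusters = ref (List.init n (fun i -> [ i ])) in
  while List.length !clusters > k do
    let arr = Array.of_list !clusters in
    let m = Array.length arr in
    let best = ref (infinity, 0, 1) in
    (* Scan every pair of current clusters for the smallest linkage. *)
    for a = 0 to m - 1 do
      for b = a + 1 to m - 1 do
        let d = single_linkage points arr.(a) arr.(b) in
        let best_d, _, _ = !best in
        if d < best_d then best := (d, a, b)
      done
    done;
    let _, a, b = !best in
    (* Replace clusters a and b with their union. *)
    let rest = List.filteri (fun idx _ -> idx <> a && idx <> b) !clusters in
    clusters := (arr.(a) @ arr.(b)) :: rest
  done;
  let labels = Array.make n 0 in
  List.iteri
    (fun label members -> List.iter (fun i -> labels.(i) <- label) members)
    !clusters;
  labels
```

On the six points from the Examples section with `k = 2`, this sketch groups the first three points together and the last three together, the same partition shown there, though label numbering is not guaranteed to match in general.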

Read more in the :ref:`User Guide <hierarchical_clustering>`.

Parameters
----------
n_clusters : int or None, default=2
    The number of clusters to find. It must be ``None`` if ``distance_threshold`` is not ``None``.

affinity : str or callable, default='euclidean'
    Metric used to compute the linkage. Can be "euclidean", "l1", "l2", "manhattan", "cosine", or "precomputed". If linkage is "ward", only "euclidean" is accepted. If "precomputed", a distance matrix (instead of a similarity matrix) is needed as input for the fit method.
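The named metrics can be sketched in plain OCaml on float arrays. Here "l2" corresponds to `Euclidean` and "l1" to `Manhattan`, and cosine distance is taken as one minus cosine similarity. The `metric` type and the `distance` and `pairwise` functions are hypothetical illustrations, not part of the bindings; `pairwise` builds the kind of square distance matrix that "precomputed" expects.

```ocaml
(* Hypothetical metric selector mirroring the affinity strings. *)
type metric = Euclidean | Manhattan | Cosine

let distance metric (a : float array) (b : float array) =
  let n = Array.length a in
  if Array.length b <> n then invalid_arg "distance: length mismatch";
  (* Sum f i over indices 0 .. n-1. *)
  let sum f =
    let acc = ref 0. in
    for i = 0 to n - 1 do
      acc := !acc +. f i
    done;
    !acc
  in
  match metric with
  | Euclidean -> Float.sqrt (sum (fun i -> (a.(i) -. b.(i)) ** 2.))
  | Manhattan -> sum (fun i -> Float.abs (a.(i) -. b.(i)))
  | Cosine ->
      (* 1 - cosine similarity; assumes neither vector is all zeros. *)
      let dot = sum (fun i -> a.(i) *. b.(i)) in
      let na = Float.sqrt (sum (fun i -> a.(i) *. a.(i))) in
      let nb = Float.sqrt (sum (fun i -> b.(i) *. b.(i))) in
      1. -. (dot /. (na *. nb))

(* Square matrix of pairwise distances between the rows of xs. *)
let pairwise metric (xs : float array array) =
  Array.map (fun a -> Array.map (fun b -> distance metric a b) xs) xs
```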

memory : str or object with the joblib.Memory interface, default=None
    Used to cache the output of the computation of the tree. By default, no caching is done. If a string is given, it is the path to the caching directory.

connectivity : array-like or callable, default=None
    Connectivity matrix. Defines for each sample the neighboring samples following a given structure of the data. This can be a connectivity matrix itself or a callable that transforms the data into a connectivity matrix, such as one derived from kneighbors_graph. Default is None, i.e., the hierarchical clustering algorithm is unstructured.
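A k-nearest-neighbour connectivity matrix of the kind mentioned above can be sketched as a boolean adjacency matrix, loosely in the spirit of `kneighbors_graph` with connectivity mode: entry `(i, j)` is true when sample `j` is among the `k` nearest neighbours of sample `i`, excluding `i` itself. Euclidean distance is assumed, and the `knn_connectivity` and `symmetrize` helpers are hypothetical. `symmetrize` turns the directed k-NN relation into an undirected graph, which is an assumption of this sketch.

```ocaml
(* Boolean k-NN adjacency matrix over the rows of xs. *)
let knn_connectivity k (xs : float array array) =
  let n = Array.length xs in
  let dist a b =
    let s = ref 0. in
    Array.iteri (fun i ai -> s := !s +. ((ai -. b.(i)) ** 2.)) a;
    Float.sqrt !s
  in
  Array.init n (fun i ->
      (* Sort the other samples by their distance to sample i. *)
      let others = List.filter (fun j -> j <> i) (List.init n Fun.id) in
      let sorted =
        List.stable_sort
          (fun j1 j2 ->
            Float.compare (dist xs.(i) xs.(j1)) (dist xs.(i) xs.(j2)))
          others
      in
      (* Mark the k closest as neighbours. *)
      let row = Array.make n false in
      List.iteri (fun rank j -> if rank < k then row.(j) <- true) sorted;
      row)

(* Keep an edge if it exists in either direction. *)
let symmetrize (c : bool array array) =
  Array.mapi (fun i row -> Array.mapi (fun j e -> e || c.(j).(i)) row) c
```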

compute_full_tree : 'auto' or bool, default='auto'
    Stop the construction of the tree early at ``n_clusters``. This is useful to decrease computation time if the number of clusters is not small compared to the number of samples. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree. It must be ``True`` if ``distance_threshold`` is not ``None``. By default, `compute_full_tree` is "auto", which is equivalent to `True` when `distance_threshold` is not `None` or when `n_clusters` is less than the maximum of 100 and `0.02 * n_samples`. Otherwise, "auto" is equivalent to `False`.
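The "auto" rule described above can be written out as a small decision function. This is a sketch that encodes only the rule stated in this parameter's description. It does not model the case without a connectivity matrix, where the option is described as not useful. The `full_tree` type and `resolve_full_tree` are hypothetical names.

```ocaml
(* Hypothetical encoding of the compute_full_tree parameter. *)
type full_tree = Auto | Bool of bool

(* Resolve "auto": true if a distance threshold is set, otherwise true
   exactly when n_clusters < max(100, 0.02 * n_samples). *)
let resolve_full_tree ~compute_full_tree ~(n_clusters : int option)
    ~(distance_threshold : float option) ~n_samples =
  match compute_full_tree with
  | Bool b -> b
  | Auto -> (
      match (distance_threshold, n_clusters) with
      | Some _, _ -> true
      | None, Some k ->
          float_of_int k < Float.max 100. (0.02 *. float_of_int n_samples)
      | None, None ->
          invalid_arg "n_clusters or distance_threshold must be set")
```

For example, with 1000 samples the cut-off is `max(100, 20) = 100`, so `n_clusters = 150` resolves to `false`; with 10000 samples the cut-off becomes 200 and the same value resolves to `true`.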

linkage : {"ward", "complete", "average", "single"}, default="ward"
    Which linkage criterion to use. The linkage criterion determines which distance to use between sets of observations. The algorithm will merge the pair of clusters that minimizes this criterion.

  • ward minimizes the variance of the clusters being merged.
  • average uses the average of the distances of each observation of the two sets.
  • complete or maximum linkage uses the maximum distance between all observations of the two sets.
  • single uses the minimum of the distances between all observations of the two sets.
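The four criteria can be compared side by side in a small OCaml sketch over non-empty clusters of float-array points. Single, complete, and average use pairwise Euclidean distances. For ward, the sketch uses the textbook expression for the increase in within-cluster sum of squares caused by a merge, `|A||B| / (|A| + |B|) * ||mean A - mean B||^2`. The distances scikit-learn reports for ward may be scaled differently, so treat this as an illustration of the criterion, not of its exact output. All names are hypothetical.

```ocaml
type linkage = Ward | Complete | Average | Single

(* Squared Euclidean distance. *)
let sq_dist a b =
  let s = ref 0. in
  Array.iteri (fun i ai -> s := !s +. ((ai -. b.(i)) ** 2.)) a;
  !s

(* Component-wise mean of a non-empty list of points. *)
let mean (c : float array list) =
  let m = Array.make (Array.length (List.hd c)) 0. in
  List.iter (fun p -> Array.iteri (fun i x -> m.(i) <- m.(i) +. x) p) c;
  let n = float_of_int (List.length c) in
  Array.map (fun x -> x /. n) m

let linkage_distance linkage (a : float array list) (b : float array list) =
  (* All pairwise Euclidean distances between members of a and b. *)
  let pairs =
    List.concat_map (fun p -> List.map (fun q -> Float.sqrt (sq_dist p q)) b) a
  in
  match linkage with
  | Single -> List.fold_left Float.min infinity pairs
  | Complete -> List.fold_left Float.max neg_infinity pairs
  | Average ->
      List.fold_left ( +. ) 0. pairs /. float_of_int (List.length pairs)
  | Ward ->
      (* Increase in within-cluster sum of squares if a and b merge. *)
      let na = float_of_int (List.length a)
      and nb = float_of_int (List.length b) in
      na *. nb /. (na +. nb) *. sq_dist (mean a) (mean b)
```

For `A = {0, 2}` and `B = {5}` on a line, the pairwise distances are 5 and 3, so single gives 3, complete 5, average 4, and ward gives `2/3 * (1 - 5)^2 = 32/3`, which equals the sum of squares of the merged set (114/9) minus that of `A` (2).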

distance_threshold : float, default=None
    The linkage distance threshold above which clusters will not be merged. If not ``None``, ``n_clusters`` must be ``None`` and ``compute_full_tree`` must be ``True``.

.. versionadded:: 0.21
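The constraints on `distance_threshold`, and how it determines the number of clusters, can be sketched as follows. The validation mirrors the rules stated in the parameter descriptions. The counting rule assumes the merge distances of the full tree are non-decreasing (true for the four linkages above) and that a merge at or above the threshold is not performed, so every skipped merge leaves one extra cluster. The boundary handling (`>=`) is an assumption of this sketch. `check_options` and `clusters_below_threshold` are hypothetical names.

```ocaml
(* Validate the mutually exclusive options described above. *)
let check_options ~(n_clusters : int option)
    ~(distance_threshold : float option) ~(compute_full_tree : bool) =
  match (n_clusters, distance_threshold) with
  | Some _, Some _ ->
      Error "n_clusters must be None when distance_threshold is set"
  | None, None ->
      Error "exactly one of n_clusters and distance_threshold must be set"
  | None, Some _ when not compute_full_tree ->
      Error "compute_full_tree must be true when distance_threshold is set"
  | _ -> Ok ()

(* merge_distances: the n_samples - 1 merge heights of the full tree.
   Each merge that is skipped (distance >= threshold) leaves one more
   cluster than the single root of the full tree. *)
let clusters_below_threshold merge_distances threshold =
  1 + List.length (List.filter (fun d -> d >= threshold) merge_distances)
```

With six samples and merge heights `[1.; 2.; 3.; 10.; 12.]`, a threshold of 5 skips the last two merges and leaves 3 clusters.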

Attributes
----------
n_clusters_ : int
    The number of clusters found by the algorithm. If ``distance_threshold=None``, it will be equal to the given ``n_clusters``.

labels_ : ndarray of shape (n_samples,)
    Cluster labels for each point.

n_leaves_ : int
    Number of leaves in the hierarchical tree.

n_connected_components_ : int
    The estimated number of connected components in the graph.
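The quantity this attribute reports can be illustrated by counting the connected components of a connectivity graph given as a boolean adjacency matrix. The sketch below uses union-find with path compression and treats every edge as undirected. It is an illustration of the concept, not the estimation routine the library uses; `connected_components` is a hypothetical name.

```ocaml
let connected_components (adj : bool array array) =
  let n = Array.length adj in
  let parent = Array.init n Fun.id in
  (* Find the root of i, compressing the path along the way. *)
  let rec find i =
    if parent.(i) = i then i
    else begin
      let root = find parent.(i) in
      parent.(i) <- root;
      root
    end
  in
  let union i j =
    let ri = find i and rj = find j in
    if ri <> rj then parent.(ri) <- rj
  in
  (* Join the endpoints of every edge, ignoring direction. *)
  Array.iteri
    (fun i row -> Array.iteri (fun j edge -> if edge then union i j) row)
    adj;
  (* Each remaining root is one component. *)
  let count = ref 0 in
  for i = 0 to n - 1 do
    if find i = i then incr count
  done;
  !count
```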

children_ : array-like of shape (n_samples-1, 2)
    The children of each non-leaf node. Values less than `n_samples` correspond to leaves of the tree, which are the original samples. A node `i` greater than or equal to `n_samples` is a non-leaf node and has children `children_[i - n_samples]`. Alternatively, at the i-th iteration, `children_[i][0]` and `children_[i][1]` are merged to form node `n_samples + i`.
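The node-numbering scheme described for `children_` can be decoded with a short recursive walk. This sketch models `children_` as an OCaml array of int pairs, which is an assumption: the binding exposes it as an `Ndarray.t`. `leaves_of` lists the original samples under any node. `cut_tree` applies only the first `n_samples - k` merges and labels each sample by the surviving root above it, assuming the rows are in merge order as described. Both functions are hypothetical and their label numbering need not match the library's.

```ocaml
(* Original samples under [node]: ids below n_samples are leaves; node
   n_samples + i was formed from children.(i). *)
let rec leaves_of ~n_samples (children : (int * int) array) node =
  if node < n_samples then [ node ]
  else
    let left, right = children.(node - n_samples) in
    leaves_of ~n_samples children left @ leaves_of ~n_samples children right

(* Cut the tree into k clusters by applying the first n_samples - k merges. *)
let cut_tree ~n_samples (children : (int * int) array) k =
  let n_merges = n_samples - k in
  (* Nodes consumed by an applied merge are no longer roots. *)
  let consumed = Hashtbl.create 16 in
  for i = 0 to n_merges - 1 do
    let l, r = children.(i) in
    Hashtbl.replace consumed l ();
    Hashtbl.replace consumed r ()
  done;
  let labels = Array.make n_samples (-1) in
  let next = ref 0 in
  (* Every unconsumed node among the first n_samples + n_merges is a root. *)
  for node = 0 to n_samples + n_merges - 1 do
    if not (Hashtbl.mem consumed node) then begin
      List.iter (fun s -> labels.(s) <- !next) (leaves_of ~n_samples children node);
      incr next
    end
  done;
  labels
```

For four samples with `children = [|(0, 1); (2, 3); (4, 5)|]`, node 4 covers samples 0 and 1, node 5 covers 2 and 3, node 6 covers all four, and cutting at `k = 2` yields labels `[|0; 0; 1; 1|]`.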

Examples
--------
>>> from sklearn.cluster import AgglomerativeClustering
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [4, 2], [4, 4], [4, 0]])
>>> clustering = AgglomerativeClustering().fit(X)
>>> clustering
AgglomerativeClustering()
>>> clustering.labels_
array([1, 1, 1, 0, 0, 0])

val fit : ?y:Py.Object.t -> x:Ndarray.t -> t -> t

Fit the hierarchical clustering from features or a distance matrix.

Parameters
----------
X : array-like, shape (n_samples, n_features) or (n_samples, n_samples)
    Training instances to cluster, or distances between instances if ``affinity='precomputed'``.

y : Ignored
    Not used, present here for API consistency by convention.

Returns
-------
self

val fit_predict : ?y:Py.Object.t -> x:Ndarray.t -> t -> Ndarray.t

Fit the hierarchical clustering from features or a distance matrix, and return cluster labels.

Parameters
----------
X : array-like, shape (n_samples, n_features) or (n_samples, n_samples)
    Training instances to cluster, or distances between instances if ``affinity='precomputed'``.

y : Ignored
    Not used, present here for API consistency by convention.

Returns
-------
labels : ndarray, shape (n_samples,)
    Cluster labels.

val get_params : ?deep:bool -> t -> Py.Object.t

Get parameters for this estimator.

Parameters
----------
deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
-------
params : mapping of string to any
    Parameter names mapped to their values.

val set_params : ?params:(string * Py.Object.t) list -> t -> t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.
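Routing a nested parameter name can be sketched as splitting the key at its first `__` into the component name and the remainder, which is then passed on to that component. Splitting at the first occurrence lets deeper nesting (`a__b__c`) be resolved one level at a time. `split_nested` is a hypothetical helper; the actual routing happens on the Python side.

```ocaml
(* Split "<component>__<parameter>" at the first "__". Keys without "__"
   address the estimator itself and return (None, key). *)
let split_nested key =
  let n = String.length key in
  let rec find i =
    if i + 1 >= n then None
    else if key.[i] = '_' && key.[i + 1] = '_' then Some i
    else find (i + 1)
  in
  match find 0 with
  | None -> (None, key)
  | Some i -> (Some (String.sub key 0 i), String.sub key (i + 2) (n - i - 2))
```

For instance, `split_nested "clustering__n_clusters"` gives `(Some "clustering", "n_clusters")`, while a plain key such as `"linkage"` gives `(None, "linkage")`.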

Parameters
----------
**params : dict
    Estimator parameters.

Returns
-------
self : object
    Estimator instance.

val n_clusters_ : t -> int

Attribute n_clusters_: see constructor for documentation

val labels_ : t -> Ndarray.t

Attribute labels_: see constructor for documentation

val n_leaves_ : t -> int

Attribute n_leaves_: see constructor for documentation

val n_connected_components_ : t -> int

Attribute n_connected_components_: see constructor for documentation

val children_ : t -> Ndarray.t

Attribute children_: see constructor for documentation

val to_string : t -> string

Return a human-readable string representation of the object.

val show : t -> string

Return a human-readable string representation of the object.

val pp : Format.formatter -> t -> unit

Pretty-print the object to a formatter.
