package sklearn

type tag = [ `AgglomerativeClustering ]
type t = [ `AgglomerativeClustering | `BaseEstimator | `ClusterMixin | `Object ] Obj.t
val of_pyobject : Py.Object.t -> t
val to_pyobject : [> tag ] Obj.t -> Py.Object.t
val as_estimator : t -> [ `BaseEstimator ] Obj.t
val as_cluster : t -> [ `ClusterMixin ] Obj.t
val create : ?n_clusters:[ `I of int | `None ] -> ?affinity:[ `S of string | `Callable of Py.Object.t ] -> ?memory: [ `S of string | `Object_with_the_joblib_Memory_interface of Py.Object.t ] -> ?connectivity:[ `Callable of Py.Object.t | `Arr of [> `ArrayLike ] Np.Obj.t ] -> ?compute_full_tree:[ `Auto | `Bool of bool ] -> ?linkage:[ `Ward | `Complete | `Average | `Single ] -> ?distance_threshold:float -> unit -> t

Agglomerative Clustering

Recursively merges the pair of clusters that minimally increases a given linkage distance.

Read more in the :ref:`User Guide <hierarchical_clustering>`.
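The merge loop described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: it assumes single linkage and Euclidean distance, recomputes every pairwise linkage at each step, and the names `single_linkage` and `naive_agglomerative` are hypothetical.

```python
from itertools import combinations
from math import dist


def single_linkage(a, b, points):
    """Smallest pairwise distance between two clusters of point indices."""
    return min(dist(points[i], points[j]) for i in a for j in b)


def naive_agglomerative(points, n_clusters=2):
    """Repeatedly merge the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]  # every sample starts alone
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(
                clusters[pair[0]], clusters[pair[1]], points
            ),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


X = [[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]]
groups = naive_agglomerative(X, n_clusters=2)
```

On this toy input the two groups are the samples with first coordinate 1 and those with first coordinate 4.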

Parameters ---------- n_clusters : int or None, default=2 The number of clusters to find. It must be ``None`` if ``distance_threshold`` is not ``None``.

affinity : str or callable, default='euclidean' Metric used to compute the linkage. Can be 'euclidean', 'l1', 'l2', 'manhattan', 'cosine', or 'precomputed'. If linkage is 'ward', only 'euclidean' is accepted. If 'precomputed', a distance matrix (instead of a similarity matrix) is needed as input for the fit method.

memory : str or object with the joblib.Memory interface, default=None Used to cache the output of the computation of the tree. By default, no caching is done. If a string is given, it is the path to the caching directory.

connectivity : array-like or callable, default=None Connectivity matrix. Defines for each sample the neighboring samples following a given structure of the data. This can be a connectivity matrix itself or a callable that transforms the data into a connectivity matrix, such as one derived from kneighbors_graph. Default is None, i.e., the hierarchical clustering algorithm is unstructured.

compute_full_tree : 'auto' or bool, default='auto' Stop the construction of the tree early at n_clusters. This is useful to decrease computation time if the number of clusters is not small compared to the number of samples. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree. It must be ``True`` if ``distance_threshold`` is not ``None``. By default `compute_full_tree` is 'auto', which is equivalent to `True` when `distance_threshold` is not `None` or when `n_clusters` is less than the maximum of 100 and `0.02 * n_samples`. Otherwise, 'auto' is equivalent to `False`.
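The 'auto' rule stated above can be written out as a small Python function. This is a sketch of the documented behaviour only, not the library's code, and `resolve_compute_full_tree` is a hypothetical name.

```python
def resolve_compute_full_tree(compute_full_tree, n_clusters, n_samples,
                              distance_threshold=None):
    """Return the effective boolean for compute_full_tree as documented."""
    if compute_full_tree != "auto":
        return bool(compute_full_tree)  # explicit True/False is kept as given
    if distance_threshold is not None:
        return True  # a distance threshold always needs the full tree
    # Otherwise build the full tree only when n_clusters is small enough.
    return n_clusters < max(100, 0.02 * n_samples)


small = resolve_compute_full_tree("auto", n_clusters=2, n_samples=1000)
large = resolve_compute_full_tree("auto", n_clusters=150, n_samples=1000)
```

With 1000 samples the cutoff is `max(100, 20) = 100`, so 2 clusters resolves to `True` and 150 clusters resolves to `False`.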

linkage : 'ward', 'complete', 'average', 'single', default='ward' Which linkage criterion to use. The linkage criterion determines which distance to use between sets of observations. The algorithm will merge the pair of clusters that minimizes this criterion.

  • ward minimizes the variance of the clusters being merged.
  • average uses the average of the distances of each observation of the two sets.
  • complete or maximum linkage uses the maximum distance between all observations of the two sets.
  • single uses the minimum of the distances between all observations of the two sets.

.. versionadded:: 0.20 Added the 'single' option
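As an illustration of the three distance-based criteria listed above, the following Python sketch computes them for two small sets of points. It assumes Euclidean distance, the function names are hypothetical, and 'ward' is left out because it depends on cluster variance rather than on pairwise distances alone.

```python
from math import dist


def pairwise(a, b):
    """All Euclidean distances between a point of a and a point of b."""
    return [dist(p, q) for p in a for q in b]


def single(a, b):
    return min(pairwise(a, b))  # closest pair across the two sets


def complete(a, b):
    return max(pairwise(a, b))  # farthest pair across the two sets


def average(a, b):
    d = pairwise(a, b)
    return sum(d) / len(d)  # mean over all cross-set pairs


left = [[0, 0], [1, 0]]
right = [[4, 0], [6, 0]]
results = (single(left, right), complete(left, right), average(left, right))
```

Here the cross-set distances are 4, 6, 3 and 5, so the three criteria give 3, 6 and 4.5 respectively.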

distance_threshold : float, default=None The linkage distance threshold above which clusters will not be merged. If not ``None``, ``n_clusters`` must be ``None`` and ``compute_full_tree`` must be ``True``.

.. versionadded:: 0.21
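The constraints between ``distance_threshold``, ``n_clusters`` and ``compute_full_tree`` can be summarised as a pre-flight check. The sketch below encodes only what is stated in the parameter descriptions above; `check_stopping_rule` is a hypothetical helper, and the library's own validation and error messages may differ.

```python
def check_stopping_rule(n_clusters=2, distance_threshold=None,
                        compute_full_tree="auto"):
    """Raise ValueError if the documented constraints are violated."""
    if distance_threshold is None:
        return  # no threshold given, so there is nothing to cross-check
    if n_clusters is not None:
        raise ValueError(
            "n_clusters must be None when distance_threshold is set"
        )
    # 'auto' is accepted: it is documented as equivalent to True in this case.
    if compute_full_tree != "auto" and not compute_full_tree:
        raise ValueError(
            "compute_full_tree must be True when distance_threshold is set"
        )


check_stopping_rule(n_clusters=2)  # default configuration passes
check_stopping_rule(n_clusters=None, distance_threshold=1.5)  # also passes
```

When using the OCaml binding, this corresponds to passing ``~n_clusters:`None`` together with ``~distance_threshold``.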

Attributes ---------- n_clusters_ : int The number of clusters found by the algorithm. If ``distance_threshold=None``, it will be equal to the given ``n_clusters``.

labels_ : ndarray of shape (n_samples,) Cluster labels for each point.

n_leaves_ : int Number of leaves in the hierarchical tree.

n_connected_components_ : int The estimated number of connected components in the graph.

.. versionadded:: 0.21 ``n_connected_components_`` was added to replace ``n_components_``.

children_ : array-like of shape (n_samples-1, 2) The children of each non-leaf node. Values less than `n_samples` correspond to leaves of the tree which are the original samples. A node `i` greater than or equal to `n_samples` is a non-leaf node and has children `children_[i - n_samples]`. Alternatively, at the i-th iteration, `children[i][0]` and `children[i][1]` are merged to form node `n_samples + i`.
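The node numbering used by ``children_`` can be made concrete with a short Python sketch that expands each node into the original samples it contains. The `children` array below is made up for illustration and `node_members` is a hypothetical helper, not part of the library.

```python
def node_members(children, n_samples):
    """Map every node id to the sorted list of original samples under it."""
    members = {i: [i] for i in range(n_samples)}  # leaves are the samples
    for i, (left, right) in enumerate(children):
        # Row i describes the merge that creates node n_samples + i.
        members[n_samples + i] = sorted(members[left] + members[right])
    return members


# Hypothetical merge history for 4 samples: (0, 1), then (2, 3), then both.
children = [[0, 1], [2, 3], [4, 5]]
members = node_members(children, n_samples=4)
```

Node 4 holds samples 0 and 1, node 5 holds samples 2 and 3, and node 6, the root, holds all four.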

Examples --------
>>> from sklearn.cluster import AgglomerativeClustering
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [4, 2], [4, 4], [4, 0]])
>>> clustering = AgglomerativeClustering().fit(X)
>>> clustering
AgglomerativeClustering()
>>> clustering.labels_
array([1, 1, 1, 0, 0, 0])

val fit : ?y:Py.Object.t -> x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> t

Fit the hierarchical clustering from features, or distance matrix.

Parameters ---------- X : array-like, shape (n_samples, n_features) or (n_samples, n_samples) Training instances to cluster, or distances between instances if ``affinity='precomputed'``.

y : Ignored Not used, present here for API consistency by convention.

Returns ------- self
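When ``affinity='precomputed'`` is used, ``X`` is expected to be a square matrix of distances between instances. A minimal Python sketch of building such a matrix, assuming Euclidean distance and using a hypothetical helper `distance_matrix`, could look like this:

```python
from math import dist


def distance_matrix(points):
    """Square matrix of Euclidean distances, shape (n_samples, n_samples)."""
    return [[dist(p, q) for q in points] for p in points]


X = [[0, 0], [3, 4], [6, 8]]
D = distance_matrix(X)
```

The result is symmetric with a zero diagonal, and larger entries mean the instances are further apart, which is what distinguishes a distance matrix from a similarity matrix.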

val fit_predict : ?y:Py.Object.t -> x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> [> `ArrayLike ] Np.Obj.t

Fit the hierarchical clustering from features or distance matrix, and return cluster labels.

Parameters ---------- X : array-like, shape (n_samples, n_features) or (n_samples, n_samples) Training instances to cluster, or distances between instances if ``affinity='precomputed'``.

y : Ignored Not used, present here for API consistency by convention.

Returns ------- labels : ndarray, shape (n_samples,) Cluster labels.

val get_params : ?deep:bool -> [> tag ] Obj.t -> Dict.t

Get parameters for this estimator.

Parameters ---------- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns ------- params : mapping of string to any Parameter names mapped to their values.

val set_params : ?params:(string * Py.Object.t) list -> [> tag ] Obj.t -> t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

Parameters ---------- **params : dict Estimator parameters.

Returns ------- self : object Estimator instance.
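The ``<component>__<parameter>`` naming convention can be illustrated by splitting a key at its first double underscore. This Python sketch only demonstrates the convention; `split_param_key` is a hypothetical helper and the key names in the usage lines are examples.

```python
def split_param_key(key):
    """Split 'component__parameter' into (component, parameter).

    A key without a double underscore addresses the estimator itself,
    so the second element is None.
    """
    name, delimiter, sub_key = key.partition("__")
    return (name, sub_key) if delimiter else (name, None)


own = split_param_key("linkage")
nested = split_param_key("cluster__n_clusters")
deep = split_param_key("pipe__cluster__n_clusters")
```

Only the first separator is consumed, so a deeply nested key such as the last one is passed down one level at a time.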

val n_clusters_ : t -> int

Attribute n_clusters_: get value or raise Not_found if None.

val n_clusters_opt : t -> int option

Attribute n_clusters_: get value as an option.

val labels_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute labels_: get value or raise Not_found if None.

val labels_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute labels_: get value as an option.

val n_leaves_ : t -> int

Attribute n_leaves_: get value or raise Not_found if None.

val n_leaves_opt : t -> int option

Attribute n_leaves_: get value as an option.

val n_connected_components_ : t -> int

Attribute n_connected_components_: get value or raise Not_found if None.

val n_connected_components_opt : t -> int option

Attribute n_connected_components_: get value as an option.

val children_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute children_: get value or raise Not_found if None.

val children_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute children_: get value as an option.

val to_string : t -> string

Return a human-readable representation of the object as a string.

val show : t -> string

Return a human-readable representation of the object as a string.

val pp : Stdlib.Format.formatter -> t -> unit

Pretty-print the object to a formatter.