package sklearn

type tag = [ `FeatureAgglomeration ]
type t = [ `BaseEstimator | `ClusterMixin | `FeatureAgglomeration | `Object | `TransformerMixin ] Obj.t
val of_pyobject : Py.Object.t -> t
val to_pyobject : [> tag ] Obj.t -> Py.Object.t
val as_estimator : t -> [ `BaseEstimator ] Obj.t
val as_transformer : t -> [ `TransformerMixin ] Obj.t
val as_cluster : t -> [ `ClusterMixin ] Obj.t
val create : ?n_clusters:int -> ?affinity:[ `S of string | `Callable of Py.Object.t ] -> ?memory: [ `S of string | `Object_with_the_joblib_Memory_interface of Py.Object.t ] -> ?connectivity:[ `Callable of Py.Object.t | `Arr of [> `ArrayLike ] Np.Obj.t ] -> ?compute_full_tree:[ `Auto | `Bool of bool ] -> ?linkage:[ `Ward | `Complete | `Average | `Single ] -> ?pooling_func:Py.Object.t -> ?distance_threshold:float -> unit -> t

Agglomerate features.

Similar to AgglomerativeClustering, but recursively merges features instead of samples.

Read more in the :ref:`User Guide <hierarchical_clustering>`.

Parameters
----------
n_clusters : int, default=2
    The number of clusters to find. It must be ``None`` if ``distance_threshold`` is not ``None``.

affinity : str or callable, default='euclidean'
    Metric used to compute the linkage. Can be 'euclidean', 'l1', 'l2', 'manhattan', 'cosine', or 'precomputed'. If linkage is 'ward', only 'euclidean' is accepted.

memory : str or object with the joblib.Memory interface, default=None
    Used to cache the output of the computation of the tree. By default, no caching is done. If a string is given, it is the path to the caching directory.

connectivity : array-like or callable, default=None
    Connectivity matrix. Defines for each feature the neighboring features following a given structure of the data. This can be a connectivity matrix itself or a callable that transforms the data into a connectivity matrix, such as one derived from kneighbors_graph. The default is None, i.e., the hierarchical clustering algorithm is unstructured.

compute_full_tree : 'auto' or bool, optional, default='auto'
    Stop the construction of the tree early at n_clusters. This is useful to decrease computation time if the number of clusters is not small compared to the number of features. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree. It must be ``True`` if ``distance_threshold`` is not ``None``. By default `compute_full_tree` is 'auto', which is equivalent to `True` when `distance_threshold` is not `None` or when `n_clusters` is less than the larger of 100 and `0.02 * n_samples`. Otherwise, 'auto' is equivalent to `False`.
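The 'auto' rule described for `compute_full_tree` can be written out as a small function. This is only a sketch of the rule as stated in the text, not the library's code; the function name `resolve_compute_full_tree` is hypothetical.

```python
def resolve_compute_full_tree(compute_full_tree, n_clusters, distance_threshold, n_samples):
    """Resolve 'auto' to a bool, following the documented rule (illustrative only)."""
    # An explicit bool is used as given.
    if compute_full_tree != "auto":
        return bool(compute_full_tree)
    # A distance threshold always requires the full tree.
    if distance_threshold is not None:
        return True
    # Otherwise build the full tree only when n_clusters is small.
    return n_clusters < max(100, 0.02 * n_samples)


print(resolve_compute_full_tree("auto", 2, None, 1000))      # small n_clusters
print(resolve_compute_full_tree("auto", 150, None, 1000))    # 150 is not below max(100, 20)
print(resolve_compute_full_tree("auto", None, 0.5, 1000))    # threshold given
```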

linkage : 'ward', 'complete', 'average', 'single', default='ward'
    Which linkage criterion to use. The linkage criterion determines which distance to use between sets of features. The algorithm will merge the pairs of clusters that minimize this criterion.

  • ward minimizes the variance of the clusters being merged.
  • average uses the average of the distances of each feature of the two sets.
  • complete or maximum linkage uses the maximum distance between all features of the two sets.
  • single uses the minimum of the distances between all features of the two sets.
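To make the difference between the criteria concrete, the sketch below computes the 'single', 'complete', and 'average' distances between two small sets of points using Euclidean distance. In feature agglomeration each point would be one feature, represented by its values across the samples. The helper `linkage_distance` is hypothetical, and 'ward' is left out because it is a variance criterion rather than a simple reduction over pairwise distances.

```python
import math
from itertools import product


def linkage_distance(set_a, set_b, linkage):
    """Distance between two sets of points under a given linkage (illustrative only)."""
    # All pairwise Euclidean distances between members of the two sets.
    distances = [math.dist(a, b) for a, b in product(set_a, set_b)]
    if linkage == "single":
        return min(distances)
    if linkage == "complete":
        return max(distances)
    if linkage == "average":
        return sum(distances) / len(distances)
    raise ValueError(f"unsupported linkage in this sketch: {linkage!r}")


set_a = [(0.0, 0.0), (0.0, 1.0)]
set_b = [(3.0, 0.0), (4.0, 0.0)]
for name in ("single", "average", "complete"):
    print(name, linkage_distance(set_a, set_b, name))
```

For any two sets, the single distance is never larger than the average, and the average is never larger than the complete distance.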

pooling_func : callable, default=np.mean
    This combines the values of agglomerated features into a single value, and should accept an array of shape [M, N] and the keyword argument `axis=1`, and reduce it to an array of size [M].

distance_threshold : float, default=None
    The linkage distance threshold above which clusters will not be merged. If not ``None``, ``n_clusters`` must be ``None`` and ``compute_full_tree`` must be ``True``.

    .. versionadded:: 0.21
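As a rough illustration of how `distance_threshold` determines the number of clusters, the sketch below counts how many merges of a full tree would still be performed below a given threshold. It assumes the merge distances never decrease along the tree, and it treats a merge whose distance equals the threshold as not performed; that boundary choice is an assumption, not something the text above specifies. The function name is hypothetical.

```python
def clusters_after_threshold(merge_distances, distance_threshold):
    """Number of clusters left when merges at or beyond the threshold are skipped."""
    # A full binary merge tree over n leaves contains n - 1 merges.
    n_leaves = len(merge_distances) + 1
    # Each merge that is carried out reduces the cluster count by one.
    merges_done = sum(1 for d in merge_distances if d < distance_threshold)
    return n_leaves - merges_done


merge_distances = [0.5, 1.0, 2.5, 4.0]  # made-up distances for five leaves
print(clusters_after_threshold(merge_distances, 2.0))
```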

Attributes
----------
n_clusters_ : int
    The number of clusters found by the algorithm. If ``distance_threshold=None``, it will be equal to the given ``n_clusters``.

labels_ : array-like of shape (n_features,)
    Cluster labels for each feature.

n_leaves_ : int
    Number of leaves in the hierarchical tree.

n_connected_components_ : int
    The estimated number of connected components in the graph.

    .. versionadded:: 0.21
       ``n_connected_components_`` was added to replace ``n_components_``.

children_ : array-like of shape (n_nodes-1, 2)
    The children of each non-leaf node. Values less than `n_features` correspond to leaves of the tree, which are the original features. A node `i` greater than or equal to `n_features` is a non-leaf node and has children `children_[i - n_features]`. Alternatively, at the i-th iteration, `children_[i][0]` and `children_[i][1]` are merged to form node `n_features + i`.
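The indexing convention for `children_` is easier to follow with a small tree walk. The sketch below uses a hand-written `children` list for four features, not output from a fitted estimator, and a hypothetical helper `leaves_under` that collects the original features below any node.

```python
def leaves_under(node, children, n_features):
    """Return the leaf (feature) indices below `node` in the merge tree."""
    # Node ids below n_features are leaves, i.e. original features.
    if node < n_features:
        return [node]
    # Non-leaf node i was formed by merging the pair children[i - n_features].
    left, right = children[node - n_features]
    return leaves_under(left, children, n_features) + leaves_under(right, children, n_features)


n_features = 4
# Iteration 0 merges features 0 and 1 into node 4,
# iteration 1 merges features 2 and 3 into node 5,
# iteration 2 merges nodes 4 and 5 into node 6 (the root).
children = [(0, 1), (2, 3), (4, 5)]

print(leaves_under(4, children, n_features))
print(leaves_under(6, children, n_features))
```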

distances_ : array-like of shape (n_nodes-1,)
    Distances between nodes in the corresponding place in `children_`. Only computed if distance_threshold is not None.

Examples
--------
>>> import numpy as np
>>> from sklearn import datasets, cluster
>>> digits = datasets.load_digits()
>>> images = digits.images
>>> X = np.reshape(images, (len(images), -1))
>>> agglo = cluster.FeatureAgglomeration(n_clusters=32)
>>> agglo.fit(X)
FeatureAgglomeration(n_clusters=32)
>>> X_reduced = agglo.transform(X)
>>> X_reduced.shape
(1797, 32)

val fit : ?y:Py.Object.t -> ?params:(string * Py.Object.t) list -> x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> t

Fit the hierarchical clustering on the data.

Parameters
----------
X : array-like of shape (n_samples, n_features)
    The data.

y : Ignored

Returns
-------
self

val fit_transform : ?y:[> `ArrayLike ] Np.Obj.t -> ?fit_params:(string * Py.Object.t) list -> x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> [> `ArrayLike ] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
----------
X : array-like, sparse matrix, dataframe of shape (n_samples, n_features)

y : ndarray of shape (n_samples,), default=None
    Target values.

**fit_params : dict
    Additional fit parameters.

Returns
-------
X_new : ndarray of shape (n_samples, n_features_new)
    Transformed array.

val get_params : ?deep:bool -> [> tag ] Obj.t -> Dict.t

Get parameters for this estimator.

Parameters
----------
deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
-------
params : mapping of string to any
    Parameter names mapped to their values.

val inverse_transform : xred:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> [> `ArrayLike ] Np.Obj.t

Invert the transformation. Return a vector of size n_features with the values of Xred assigned to each group of features.

Parameters
----------
Xred : array-like of shape (n_samples, n_clusters) or (n_clusters,)
    The values to be assigned to each cluster of features.

Returns
-------
X : array of shape (n_samples, n_features) or (n_features,)
    An array in which each feature takes the value that Xred holds for that feature's cluster.
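The assignment performed by the inverse transformation can be sketched for a single sample in plain Python. This is an illustration of the idea rather than the library's code: `inverse_pool` is a hypothetical helper, and `labels` stands in for the fitted `labels_` attribute, assumed to hold cluster indices 0 to n_clusters - 1.

```python
def inverse_pool(x_reduced_row, labels):
    """Expand one reduced sample back to n_features values."""
    # Feature j receives the pooled value of the cluster it belongs to.
    return [x_reduced_row[label] for label in labels]


labels = [0, 0, 1, 2, 1]          # cluster index of each of the 5 features
x_reduced = [10.0, 20.0, 30.0]    # one value per cluster
print(inverse_pool(x_reduced, labels))
```

Features that share a cluster end up with identical values, so the original data is recovered only up to the within-cluster variation.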

val set_params : ?params:(string * Py.Object.t) list -> [> tag ] Obj.t -> t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

Parameters
----------
**params : dict
    Estimator parameters.

Returns
-------
self : object
    Estimator instance.
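The ``<component>__<parameter>`` naming can be illustrated by splitting parameter names on the first double underscore. The sketch below is not the estimator's own logic; `split_nested_params` and the component name `agglo` are hypothetical.

```python
def split_nested_params(params):
    """Separate an estimator's own parameters from those addressed to components."""
    own, nested = {}, {}
    for key, value in params.items():
        # Only the first "__" separates the component from its parameter.
        name, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"n_clusters": 8, "agglo__linkage": "average", "agglo__n_clusters": 4}
)
print(own)
print(nested)
```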

val transform : x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> [> `ArrayLike ] Np.Obj.t

Transform a new matrix using the built clustering.

Parameters
----------
X : array-like of shape (n_samples, n_features) or (n_samples,)
    An M by N array of M observations in N dimensions or a length M array of M one-dimensional observations.

Returns
-------
Y : array of shape (n_samples, n_clusters) or (n_clusters,)
    The pooled values for each feature cluster.
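Pooling can be sketched for a single sample as follows. The helper `pool_features` is hypothetical, `labels` stands in for the fitted `labels_` attribute (assumed to use every index from 0 to n_clusters - 1), and `statistics.mean` plays the role of the default `pooling_func`. The real `pooling_func` operates on arrays with `axis=1`, which this list-based version simplifies away.

```python
from statistics import mean


def pool_features(row, labels, pooling_func=mean):
    """Reduce one sample from n_features values to one value per cluster."""
    n_clusters = max(labels) + 1
    return [
        # Gather the values of all features in cluster k, then pool them.
        pooling_func([value for value, label in zip(row, labels) if label == k])
        for k in range(n_clusters)
    ]


labels = [0, 0, 1, 2, 1]               # cluster index of each of the 5 features
row = [1.0, 3.0, 10.0, 7.0, 20.0]      # one sample with 5 feature values
print(pool_features(row, labels))
```

Passing a different reduction, such as `max`, changes how the values of agglomerated features are combined without changing the cluster assignment.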

val n_clusters_ : t -> int

Attribute n_clusters_: get value or raise Not_found if None.

val n_clusters_opt : t -> int option

Attribute n_clusters_: get value as an option.

val labels_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute labels_: get value or raise Not_found if None.

val labels_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute labels_: get value as an option.

val n_leaves_ : t -> int

Attribute n_leaves_: get value or raise Not_found if None.

val n_leaves_opt : t -> int option

Attribute n_leaves_: get value as an option.

val n_connected_components_ : t -> int

Attribute n_connected_components_: get value or raise Not_found if None.

val n_connected_components_opt : t -> int option

Attribute n_connected_components_: get value as an option.

val children_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute children_: get value or raise Not_found if None.

val children_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute children_: get value as an option.

val distances_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute distances_: get value or raise Not_found if None.

val distances_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute distances_: get value as an option.

val to_string : t -> string

Print the object to a human-readable representation.

val show : t -> string

Print the object to a human-readable representation.

val pp : Stdlib.Format.formatter -> t -> unit

Pretty-print the object to a formatter.