package sklearn

  1. Overview
  2. Docs
Legend:
Library
Module
Module type
Parameter
Class
Class type
type tag = [
  1. | `SpectralClustering
]
type t = [ `BaseEstimator | `ClusterMixin | `Object | `SpectralClustering ] Obj.t
val of_pyobject : Py.Object.t -> t
val to_pyobject : [> tag ] Obj.t -> Py.Object.t
val as_estimator : t -> [ `BaseEstimator ] Obj.t
val as_cluster : t -> [ `ClusterMixin ] Obj.t
val create : ?n_clusters:int -> ?eigen_solver:[ `Arpack | `PyObject of Py.Object.t | `Lobpcg ] -> ?n_components:int -> ?random_state:int -> ?n_init:int -> ?gamma:float -> ?affinity:[ `S of string | `Callable of Py.Object.t ] -> ?n_neighbors:int -> ?eigen_tol:float -> ?assign_labels:[ `Kmeans | `Discretize ] -> ?degree:float -> ?coef0:float -> ?kernel_params:Dict.t -> ?n_jobs:int -> unit -> t

Apply clustering to a projection of the normalized Laplacian.

In practice Spectral Clustering is very useful when the structure of the individual clusters is highly non-convex or more generally when a measure of the center and spread of the cluster is not a suitable description of the complete cluster. For instance when clusters are nested circles on the 2D plane.

If affinity is the adjacency matrix of a graph, this method can be used to find normalized graph cuts.

When calling ``fit``, an affinity matrix is constructed using either kernel function such the Gaussian (aka RBF) kernel of the euclidean distanced ``d(X, X)``::

np.exp(-gamma * d(X,X) ** 2)

or a k-nearest neighbors connectivity matrix.

Alternatively, using ``precomputed``, a user-provided affinity matrix can be used.

Read more in the :ref:`User Guide <spectral_clustering>`.

Parameters ---------- n_clusters : integer, optional The dimension of the projection subspace.

eigen_solver : None, 'arpack', 'lobpcg', or 'amg' The eigenvalue decomposition strategy to use. AMG requires pyamg to be installed. It can be faster on very large, sparse problems, but may also lead to instabilities.

n_components : integer, optional, default=n_clusters Number of eigen vectors to use for the spectral embedding

random_state : int, RandomState instance, default=None A pseudo random number generator used for the initialization of the lobpcg eigen vectors decomposition when ``eigen_solver='amg'`` and by the K-Means initialization. Use an int to make the randomness deterministic. See :term:`Glossary <random_state>`.

n_init : int, optional, default: 10 Number of time the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia.

gamma : float, default=1.0 Kernel coefficient for rbf, poly, sigmoid, laplacian and chi2 kernels. Ignored for ``affinity='nearest_neighbors'``.

affinity : string or callable, default 'rbf' How to construct the affinity matrix.

  • 'nearest_neighbors' : construct the affinity matrix by computing a graph of nearest neighbors.
  • 'rbf' : construct the affinity matrix using a radial basis function (RBF) kernel.
  • 'precomputed' : interpret ``X`` as a precomputed affinity matrix.
  • 'precomputed_nearest_neighbors' : interpret ``X`` as a sparse graph of precomputed nearest neighbors, and constructs the affinity matrix by selecting the ``n_neighbors`` nearest neighbors.
  • one of the kernels supported by :func:`~sklearn.metrics.pairwise_kernels`.

Only kernels that produce similarity scores (non-negative values that increase with similarity) should be used. This property is not checked by the clustering algorithm.

n_neighbors : integer Number of neighbors to use when constructing the affinity matrix using the nearest neighbors method. Ignored for ``affinity='rbf'``.

eigen_tol : float, optional, default: 0.0 Stopping criterion for eigendecomposition of the Laplacian matrix when ``eigen_solver='arpack'``.

assign_labels : 'kmeans', 'discretize', default: 'kmeans' The strategy to use to assign labels in the embedding space. There are two ways to assign labels after the laplacian embedding. k-means can be applied and is a popular choice. But it can also be sensitive to initialization. Discretization is another approach which is less sensitive to random initialization.

degree : float, default=3 Degree of the polynomial kernel. Ignored by other kernels.

coef0 : float, default=1 Zero coefficient for polynomial and sigmoid kernels. Ignored by other kernels.

kernel_params : dictionary of string to any, optional Parameters (keyword arguments) and values for kernel passed as callable object. Ignored by other kernels.

n_jobs : int or None, optional (default=None) The number of parallel jobs to run. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. See :term:`Glossary <n_jobs>` for more details.

Attributes ---------- affinity_matrix_ : array-like, shape (n_samples, n_samples) Affinity matrix used for clustering. Available only if after calling ``fit``.

labels_ : array, shape (n_samples,) Labels of each point

Examples -------- >>> from sklearn.cluster import SpectralClustering >>> import numpy as np >>> X = np.array([1, 1], [2, 1], [1, 0], ... [4, 7], [3, 5], [3, 6]) >>> clustering = SpectralClustering(n_clusters=2, ... assign_labels='discretize', ... random_state=0).fit(X) >>> clustering.labels_ array(1, 1, 1, 0, 0, 0) >>> clustering SpectralClustering(assign_labels='discretize', n_clusters=2, random_state=0)

Notes ----- If you have an affinity matrix, such as a distance matrix, for which 0 means identical elements, and high values means very dissimilar elements, it can be transformed in a similarity matrix that is well suited for the algorithm by applying the Gaussian (RBF, heat) kernel::

np.exp(- dist_matrix ** 2 / (2. * delta ** 2))

Where ``delta`` is a free parameter representing the width of the Gaussian kernel.

Another alternative is to take a symmetric version of the k nearest neighbors connectivity matrix of the points.

If the pyamg package is installed, it is used: this greatly speeds up computation.

References ----------

  • Normalized cuts and image segmentation, 2000 Jianbo Shi, Jitendra Malik http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.160.2324
  • A Tutorial on Spectral Clustering, 2007 Ulrike von Luxburg http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.165.9323
  • Multiclass spectral clustering, 2003 Stella X. Yu, Jianbo Shi https://www1.icsi.berkeley.edu/~stellayu/publication/doc/2003kwayICCV.pdf
val fit : ?y:Py.Object.t -> x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> t

Perform spectral clustering from features, or affinity matrix.

Parameters ---------- X : array-like or sparse matrix, shape (n_samples, n_features), or array-like, shape (n_samples, n_samples) Training instances to cluster, or similarities / affinities between instances if ``affinity='precomputed'``. If a sparse matrix is provided in a format other than ``csr_matrix``, ``csc_matrix``, or ``coo_matrix``, it will be converted into a sparse ``csr_matrix``.

y : Ignored Not used, present here for API consistency by convention.

Returns ------- self

val fit_predict : ?y:Py.Object.t -> x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> [> `ArrayLike ] Np.Obj.t

Perform spectral clustering from features, or affinity matrix, and return cluster labels.

Parameters ---------- X : array-like or sparse matrix, shape (n_samples, n_features), or array-like, shape (n_samples, n_samples) Training instances to cluster, or similarities / affinities between instances if ``affinity='precomputed'``. If a sparse matrix is provided in a format other than ``csr_matrix``, ``csc_matrix``, or ``coo_matrix``, it will be converted into a sparse ``csr_matrix``.

y : Ignored Not used, present here for API consistency by convention.

Returns ------- labels : ndarray, shape (n_samples,) Cluster labels.

val get_params : ?deep:bool -> [> tag ] Obj.t -> Dict.t

Get parameters for this estimator.

Parameters ---------- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns ------- params : mapping of string to any Parameter names mapped to their values.

val set_params : ?params:(string * Py.Object.t) list -> [> tag ] Obj.t -> t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

Parameters ---------- **params : dict Estimator parameters.

Returns ------- self : object Estimator instance.

val affinity_matrix_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute affinity_matrix_: get value or raise Not_found if None.

val affinity_matrix_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute affinity_matrix_: get value as an option.

val labels_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute labels_: get value or raise Not_found if None.

val labels_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute labels_: get value as an option.

val to_string : t -> string

Print the object to a human-readable representation.

val show : t -> string

Print the object to a human-readable representation.

val pp : Stdlib.Format.formatter -> t -> unit

Pretty-print the object to a formatter.