package sklearn


type t

val of_pyobject : Py.Object.t -> t

val to_pyobject : t -> Py.Object.t

val create : 
  ?final_estimator:Py.Object.t ->
  ?cv:
    [ `Int of int | `CrossValGenerator of Py.Object.t | `Ndarray of Ndarray.t ] ->
  ?stack_method:[ `Auto | `Predict_proba | `Decision_function | `Predict ] ->
  ?n_jobs:int ->
  ?passthrough:bool ->
  ?verbose:Py.Object.t ->
  estimators:Py.Object.t ->
  unit ->
  t

Stack of estimators with a final classifier.

Stacked generalization consists of stacking the outputs of the individual estimators and using a classifier to compute the final prediction. Stacking makes it possible to exploit the strength of each individual estimator by using their outputs as the input of a final estimator.

Note that `estimators_` are fitted on the full `X` while `final_estimator_` is trained using cross-validated predictions of the base estimators using `cross_val_predict`.
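The split between the two training regimes can be illustrated with a minimal sketch. It is not the library's implementation: `MeanThreshold` and `out_of_fold_predictions` are hypothetical names, the data is made up, and folds are assigned by interleaving indices, which is simpler than the splitters scikit-learn uses. The point is only that each out-of-fold prediction comes from a model that never saw that sample, while the model kept for later use is refitted on all of `X`.

```python
from statistics import mean


class MeanThreshold:
    """Hypothetical toy base estimator: predicts 1 above the training mean."""

    def fit(self, X, y):
        self.threshold_ = mean(X)
        return self

    def predict(self, X):
        return [1 if x > self.threshold_ else 0 for x in X]


def out_of_fold_predictions(make_estimator, X, y, n_folds):
    """Predict each sample with a model fitted on the other folds only."""
    n = len(X)
    preds = [None] * n
    for fold in range(n_folds):
        # Interleaved fold assignment (an assumption made for brevity).
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        est = make_estimator().fit(
            [X[i] for i in train_idx], [y[i] for i in train_idx]
        )
        for i, p in zip(test_idx, est.predict([X[i] for i in test_idx])):
            preds[i] = p
    return preds


X = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# Training input for the final estimator: cross-validated predictions.
meta_features = out_of_fold_predictions(MeanThreshold, X, y, n_folds=4)

# Base estimator kept for prediction time: fitted on the full X.
full_model = MeanThreshold().fit(X, y)
```

Training the final estimator on out-of-fold predictions keeps it from learning on outputs that the base estimators produced for samples they had already memorized.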

.. versionadded:: 0.22

Read more in the :ref:`User Guide <stacking>`.

Parameters
----------
estimators : list of (str, estimator)
    Base estimators which will be stacked together. Each element of the list is defined as a tuple of string (i.e. name) and an estimator instance. An estimator can be set to 'drop' using `set_params`.

final_estimator : estimator, default=None
    A classifier which will be used to combine the base estimators. The default classifier is a `LogisticRegression`.

cv : int, cross-validation generator or an iterable, default=None
    Determines the cross-validation splitting strategy used in `cross_val_predict` to train `final_estimator`. Possible inputs for cv are:

    * None, to use the default 5-fold cross validation,
    * integer, to specify the number of folds in a (Stratified) KFold,
    * An object to be used as a cross-validation generator,
    * An iterable yielding train, test splits.

    For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, `StratifiedKFold` is used. In all other cases, `KFold` is used.

    Refer to the :ref:`User Guide <cross_validation>` for the various cross-validation strategies that can be used here.

    .. note::
       A larger number of splits provides no benefit if the number of training samples is large enough; it only increases the training time. ``cv`` is not used for model evaluation but for prediction.
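The selection rule for `cv` described above can be written down as a small decision function. This is only a sketch of the documented rule, not library code: `describe_cv` and its `is_classifier` and `target_type` arguments are hypothetical names, and it returns a label rather than a real splitter object.

```python
def describe_cv(cv=None, is_classifier=True, target_type="multiclass"):
    """Return (splitter name, number of folds) following the documented rule."""
    if cv is None:
        cv = 5  # documented default: 5-fold cross validation
    if isinstance(cv, int):
        # Stratify only for classifiers with a binary or multiclass target.
        stratify = is_classifier and target_type in ("binary", "multiclass")
        return ("StratifiedKFold" if stratify else "KFold", cv)
    # A generator object or an iterable of (train, test) splits is used as given.
    return ("user-supplied", None)


default_choice = describe_cv()
regression_choice = describe_cv(cv=3, is_classifier=False, target_type="continuous")
custom_choice = describe_cv(cv=[([0, 1], [2]), ([0, 2], [1])])
```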

stack_method : 'auto', 'predict_proba', 'decision_function', 'predict', default='auto'
    Methods called for each base estimator. It can be:

    * if 'auto', it will try to invoke, for each estimator, `'predict_proba'`, `'decision_function'` or `'predict'` in that order.
    * otherwise, one of `'predict_proba'`, `'decision_function'` or `'predict'`. If the method is not implemented by the estimator, it will raise an error.
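As an illustration of the lookup order described for `stack_method`, the following sketch resolves the method name for a given estimator. The helper `resolve_stack_method` and the three toy classes are hypothetical, and the exception type is an assumption; the real library may report the error differently.

```python
def resolve_stack_method(estimator, stack_method="auto"):
    """Return the name of the method that would be called on `estimator`."""
    if stack_method == "auto":
        candidates = ("predict_proba", "decision_function", "predict")
    else:
        candidates = (stack_method,)
    for name in candidates:
        if callable(getattr(estimator, name, None)):
            return name
    # Assumed error type for this sketch.
    raise ValueError(
        f"{type(estimator).__name__} does not implement {stack_method!r}"
    )


class WithProba:
    def predict_proba(self, X): return [[0.5, 0.5] for _ in X]
    def predict(self, X): return [0 for _ in X]


class MarginOnly:
    def decision_function(self, X): return [0.0 for _ in X]
    def predict(self, X): return [0 for _ in X]


class PredictOnly:
    def predict(self, X): return [0 for _ in X]


resolved = [resolve_stack_method(e) for e in (WithProba(), MarginOnly(), PredictOnly())]
```

Requesting a specific method that an estimator lacks, for example `resolve_stack_method(PredictOnly(), "predict_proba")`, raises instead of silently falling back.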

n_jobs : int, default=None
    The number of jobs to run in parallel for the `fit` of all `estimators`. `None` means 1 unless in a `joblib.parallel_backend` context. -1 means using all processors. See Glossary for more details.

passthrough : bool, default=False
    When False, only the predictions of estimators will be used as training data for `final_estimator`. When True, the `final_estimator` is trained on the predictions as well as the original training data.
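The effect of `passthrough` on the input of the final estimator can be sketched as a column-wise concatenation. The helper `build_meta_features` is hypothetical and the numbers are made up; placing the original features after the predictions is an assumption of this sketch.

```python
def build_meta_features(base_predictions, X, passthrough=False):
    """Concatenate per-estimator predictions row by row.

    base_predictions: one list per base estimator, each holding one row
    (a list of floats) per sample.
    """
    rows = []
    for i, x_row in enumerate(X):
        # Predictions of every base estimator for sample i, side by side.
        row = [value for preds in base_predictions for value in preds[i]]
        if passthrough:
            # Append the original features of sample i.
            row = row + list(x_row)
        rows.append(row)
    return rows


X = [[5.0, 6.0], [7.0, 8.0]]
base_predictions = [
    [[0.1], [0.7]],  # first base estimator, one column per sample
    [[0.2], [0.6]],  # second base estimator
]

predictions_only = build_meta_features(base_predictions, X)
with_original = build_meta_features(base_predictions, X, passthrough=True)
```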

Attributes
----------
estimators_ : list of estimators
    The elements of the estimators parameter, having been fitted on the training data. If an estimator has been set to `'drop'`, it will not appear in `estimators_`.

named_estimators_ : Bunch
    Attribute to access any fitted sub-estimators by name.

final_estimator_ : estimator
    The classifier which predicts given the output of `estimators_`.

stack_method_ : list of str
    The method used by each base estimator.

Notes
-----
When `predict_proba` is used by each estimator (i.e. most of the time for `stack_method='auto'` or specifically for `stack_method='predict_proba'`), the first column predicted by each estimator will be dropped in the case of a binary classification problem. Indeed, the two columns would be perfectly collinear.
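The reason for dropping a column in the binary case is that the two class probabilities sum to one, so either column determines the other. A minimal sketch, with a hypothetical helper name and made-up probabilities:

```python
def drop_redundant_column(proba):
    """Drop the first column of a two-column probability table."""
    if proba and len(proba[0]) == 2:
        return [row[1:] for row in proba]
    return proba  # three or more classes: keep every column


binary_proba = [[0.9, 0.1], [0.3, 0.7], [0.5, 0.5]]
multiclass_proba = [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]]

# In the binary case the first column is exactly 1 minus the second one.
is_collinear = all(abs(p0 - (1.0 - p1)) < 1e-12 for p0, p1 in binary_proba)

reduced_binary = drop_redundant_column(binary_proba)
unchanged_multiclass = drop_redundant_column(multiclass_proba)
```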

References
----------
.. [1] Wolpert, David H. "Stacked generalization." Neural networks 5.2 (1992): 241-259.

Examples
--------
>>> from sklearn.datasets import load_iris
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.svm import LinearSVC
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.ensemble import StackingClassifier
>>> X, y = load_iris(return_X_y=True)
>>> estimators = [
...     ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
...     ('svr', make_pipeline(StandardScaler(),
...                           LinearSVC(random_state=42)))
... ]
>>> clf = StackingClassifier(
...     estimators=estimators, final_estimator=LogisticRegression()
... )
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, stratify=y, random_state=42
... )
>>> clf.fit(X_train, y_train).score(X_test, y_test)
0.9...

val decision_function : 
  x:[ `Ndarray of Ndarray.t | `SparseMatrix of Csr_matrix.t ] ->
  t ->
  Ndarray.t

Predict decision function for samples in X using `final_estimator_.decision_function`.

Parameters
----------
X : array-like, sparse matrix of shape (n_samples, n_features)
    Training vectors, where n_samples is the number of samples and n_features is the number of features.

Returns
-------
decisions : ndarray of shape (n_samples,), (n_samples, n_classes), or (n_samples, n_classes * (n_classes-1) / 2)
    The decision function computed by the final estimator.

val fit : 
  ?sample_weight:[ `Ndarray of Ndarray.t | `None ] ->
  x:[ `Ndarray of Ndarray.t | `SparseMatrix of Csr_matrix.t ] ->
  y:Ndarray.t ->
  t ->
  t

Fit the estimators.

Parameters
----------
X : array-like, sparse matrix of shape (n_samples, n_features)
    Training vectors, where `n_samples` is the number of samples and `n_features` is the number of features.

y : array-like of shape (n_samples,)
    Target values.

sample_weight : array-like of shape (n_samples,) or None
    Sample weights. If None, then samples are equally weighted. Note that this is supported only if all underlying estimators support sample weights.

Returns
-------
self : object

val fit_transform : 
  ?y:Ndarray.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:Ndarray.t ->
  t ->
  Ndarray.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
----------
X : numpy array of shape [n_samples, n_features]
    Training set.

y : numpy array of shape [n_samples]
    Target values.

**fit_params : dict
    Additional fit parameters.

Returns
-------
X_new : numpy array of shape [n_samples, n_features_new]
    Transformed array.

val get_params : ?deep:bool -> t -> Py.Object.t

Get the parameters of an estimator from the ensemble.

Parameters
----------
deep : bool
    Setting it to True gets the various classifiers and the parameters of the classifiers as well.

val predict : 
  ?predict_params:(string * Py.Object.t) list ->
  x:[ `Ndarray of Ndarray.t | `SparseMatrix of Csr_matrix.t ] ->
  t ->
  Ndarray.t

Predict target for X.

Parameters
----------
X : array-like, sparse matrix of shape (n_samples, n_features)
    Training vectors, where n_samples is the number of samples and n_features is the number of features.

**predict_params : dict of str -> obj
    Parameters to the `predict` called by the `final_estimator`. Note that this may be used to return uncertainties from some estimators with `return_std` or `return_cov`. Be aware that it will only account for uncertainty in the final estimator.

Returns
-------
y_pred : ndarray of shape (n_samples,) or (n_samples, n_output)
    Predicted targets.

val predict_proba : 
  x:[ `Ndarray of Ndarray.t | `SparseMatrix of Csr_matrix.t ] ->
  t ->
  Ndarray.t

Predict class probabilities for X using `final_estimator_.predict_proba`.

Parameters
----------
X : array-like, sparse matrix of shape (n_samples, n_features)
    Training vectors, where n_samples is the number of samples and n_features is the number of features.

Returns
-------
probabilities : ndarray of shape (n_samples, n_classes) or list of ndarray of shape (n_output,)
    The class probabilities of the input samples.

val score : 
  ?sample_weight:Ndarray.t ->
  x:Ndarray.t ->
  y:Ndarray.t ->
  t ->
  float

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters
----------
X : array-like of shape (n_samples, n_features)
    Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True labels for X.

sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.

Returns
-------
score : float
    Mean accuracy of `self.predict(X)` with respect to `y`.
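To make the remark about multi-label scoring concrete, the sketch below contrasts subset accuracy, where a sample counts only if its whole label set matches, with a per-label average. Both helper names are hypothetical, sample weights are ignored, and the labels are made up.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label set is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Fraction of individual labels predicted correctly (for comparison)."""
    pairs = [(a, b) for t, p in zip(y_true, y_pred) for a, b in zip(t, p)]
    return sum(a == b for a, b in pairs) / len(pairs)


y_true = [[1, 0], [0, 1], [1, 1]]
y_pred = [[1, 0], [0, 0], [1, 1]]  # one label wrong in the second sample

strict = subset_accuracy(y_true, y_pred)
lenient = per_label_accuracy(y_true, y_pred)
```

A single wrong label discards the whole second sample under subset accuracy, which is why the strict value is lower than the per-label one.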

val set_params : ?params:(string * Py.Object.t) list -> t -> t

Set the parameters of an estimator from the ensemble.

Valid parameter keys can be listed with `get_params()`.

Parameters
----------
**params : keyword arguments
    Specific parameters using e.g. `set_params(parameter_name=new_value)`. In addition to setting the parameters of the stacking estimator, the individual estimators of the stacking estimator can also be set, or can be removed by setting them to 'drop'.

val transform : 
  x:[ `Ndarray of Ndarray.t | `SparseMatrix of Csr_matrix.t ] ->
  t ->
  Ndarray.t

Return class labels or probabilities for X for each estimator.

Parameters
----------
X : array-like, sparse matrix of shape (n_samples, n_features)
    Training vectors, where `n_samples` is the number of samples and `n_features` is the number of features.

Returns
-------
y_preds : ndarray of shape (n_samples, n_estimators) or (n_samples, n_classes * n_estimators)
    Prediction outputs for each estimator.

val estimators_ : t -> Py.Object.t

Attribute estimators_: see constructor for documentation

val named_estimators_ : t -> Py.Object.t

Attribute named_estimators_: see constructor for documentation

val final_estimator_ : t -> Py.Object.t

Attribute final_estimator_: see constructor for documentation

val stack_method_ : t -> string list

Attribute stack_method_: see constructor for documentation

val to_string : t -> string

Print the object to a human-readable representation.

val show : t -> string

Print the object to a human-readable representation.

val pp : Format.formatter -> t -> unit

Pretty-print the object to a formatter.