package sklearn


type tag = [ `StackingRegressor ]

type t =
  [ `BaseEstimator
  | `MetaEstimatorMixin
  | `Object
  | `RegressorMixin
  | `StackingRegressor
  | `TransformerMixin ]
    Obj.t

val of_pyobject : Py.Object.t -> t

val to_pyobject : [> tag ] Obj.t -> Py.Object.t

val as_transformer : t -> [ `TransformerMixin ] Obj.t

val as_meta_estimator : t -> [ `MetaEstimatorMixin ] Obj.t

val as_regressor : t -> [ `RegressorMixin ] Obj.t

val as_estimator : t -> [ `BaseEstimator ] Obj.t

val create : 
  ?final_estimator:[> `BaseEstimator ] Np.Obj.t ->
  ?cv:
    [ `BaseCrossValidator of [> `BaseCrossValidator ] Np.Obj.t
    | `Arr of [> `ArrayLike ] Np.Obj.t
    | `I of int ] ->
  ?n_jobs:int ->
  ?passthrough:bool ->
  ?verbose:int ->
  estimators:(string * [> `BaseEstimator ] Np.Obj.t) list ->
  unit ->
  t

Stack of estimators with a final regressor.

Stacked generalization consists of stacking the outputs of the individual estimators and using a regressor to compute the final prediction. Stacking makes it possible to exploit the strength of each individual estimator by using their outputs as the input of a final estimator.

Note that `estimators_` are fitted on the full `X` while `final_estimator_` is trained using cross-validated predictions of the base estimators using `cross_val_predict`.
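
The two-stage training described above can be illustrated with a toy sketch in pure Python. It does not use scikit-learn and is not the library's implementation: the helper names (`fit_mean`, `fit_line`, `fit_blend`, `out_of_fold_predictions`), the interleaved fold assignment, and the grid-searched blending weight are all assumptions made for illustration. The point it shows is that the final model is trained on out-of-fold predictions, while the base models used at prediction time are refitted on the full data.

```python
# Toy stacking sketch (illustrative names only, not scikit-learn code).

def fit_mean(xs, ys):
    """Base model 1: always predict the mean of the training targets."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Base model 2: one-dimensional least-squares line y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    b = my - a * mx
    return lambda x: a * x + b

def out_of_fold_predictions(fit, xs, ys, n_folds):
    """Predict every sample with a model that was not trained on it."""
    preds = [None] * len(xs)
    for k in range(n_folds):
        test = [i for i in range(len(xs)) if i % n_folds == k]
        train = [i for i in range(len(xs)) if i % n_folds != k]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = model(xs[i])
    return preds

def fit_blend(meta_rows, ys):
    """Final model: choose w in [0, 1] minimising the squared error of
    w * p0 + (1 - w) * p1 over a coarse grid."""
    def sse(w):
        return sum((w * p0 + (1 - w) * p1 - y) ** 2
                   for (p0, p1), y in zip(meta_rows, ys))
    best_w = min((i / 20 for i in range(21)), key=sse)
    return lambda row: best_w * row[0] + (1 - best_w) * row[1]

xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
base_fitters = [fit_mean, fit_line]

# Stage 1: the final model only ever sees out-of-fold predictions.
oof = [out_of_fold_predictions(f, xs, ys, n_folds=5) for f in base_fitters]
meta_rows = list(zip(*oof))  # one row per sample, one column per base model
final = fit_blend(meta_rows, ys)

# Stage 2: the base models kept for prediction are fitted on the full data.
base_models = [f(xs, ys) for f in base_fitters]

def stacked_predict(x):
    return final(tuple(m(x) for m in base_models))

print(stacked_predict(20.0))
```

Because the toy data are exactly linear, the blend puts all of its weight on the line model, so the stacked prediction for `20.0` is approximately `41.0`.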

.. versionadded:: 0.22

Read more in the :ref:`User Guide <stacking>`.

Parameters
----------
estimators : list of (str, estimator)
    Base estimators which will be stacked together. Each element of the list is a tuple of a string (i.e. name) and an estimator instance. An estimator can be set to 'drop' using `set_params`.

final_estimator : estimator, default=None
    A regressor which will be used to combine the base estimators. The default regressor is a `RidgeCV`.

cv : int, cross-validation generator or an iterable, default=None
    Determines the cross-validation splitting strategy used in `cross_val_predict` to train `final_estimator`. Possible inputs for cv are:

    * None, to use the default 5-fold cross validation,
    * integer, to specify the number of folds in a (Stratified) KFold,
    * An object to be used as a cross-validation generator,
    * An iterable yielding train, test splits.

    For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, `StratifiedKFold` is used. In all other cases, `KFold` is used.

    Refer to the :ref:`User Guide <cross_validation>` for the various cross-validation strategies that can be used here.

    .. note::
       A larger number of splits provides no benefit if the number of training samples is large enough; it only increases the training time. ``cv`` is not used for model evaluation but for prediction.

n_jobs : int, default=None
    The number of jobs to run in parallel for `fit` of all `estimators`. `None` means 1 unless in a `joblib.parallel_backend` context. -1 means using all processors. See Glossary for more details.

passthrough : bool, default=False
    When False, only the predictions of estimators will be used as training data for `final_estimator`. When True, the `final_estimator` is trained on the predictions as well as the original training data.

verbose : int, default=0
    Verbosity level.
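
To make the integer form of the `cv` parameter more concrete, here is a minimal pure-Python sketch of unshuffled K-fold splitting. The function name `k_fold_indices` is hypothetical and the code is an approximation of the idea, not scikit-learn's `KFold`. It assumes contiguous folds, with any remainder spread over the first folds.

```python
def k_fold_indices(n_samples, n_splits=5):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for k in range(n_splits):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if k < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

splits = list(k_fold_indices(7, n_splits=3))
for train, test in splits:
    print(train, test)
# [3, 4, 5, 6] [0, 1, 2]
# [0, 1, 2, 5, 6] [3, 4]
# [0, 1, 2, 3, 4] [5, 6]
```

Every sample lands in exactly one test fold, which is what allows a cross-validated prediction to be produced for each training sample.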

Attributes
----------
estimators_ : list of estimator
    The elements of the estimators parameter, having been fitted on the training data. If an estimator has been set to `'drop'`, it will not appear in `estimators_`.

named_estimators_ : :class:`~sklearn.utils.Bunch`
    Attribute to access any fitted sub-estimators by name.

final_estimator_ : estimator
    The fitted regressor that stacks the predictions of the base estimators.

References
----------
.. [1] Wolpert, David H. 'Stacked generalization.' Neural Networks 5.2 (1992): 241-259.

Examples
--------
>>> from sklearn.datasets import load_diabetes
>>> from sklearn.linear_model import RidgeCV
>>> from sklearn.svm import LinearSVR
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.ensemble import StackingRegressor
>>> X, y = load_diabetes(return_X_y=True)
>>> estimators = [
...     ('lr', RidgeCV()),
...     ('svr', LinearSVR(random_state=42))
... ]
>>> reg = StackingRegressor(
...     estimators=estimators,
...     final_estimator=RandomForestRegressor(n_estimators=10,
...                                           random_state=42)
... )
>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, random_state=42
... )
>>> reg.fit(X_train, y_train).score(X_test, y_test)
0.3...

val fit : 
  ?sample_weight:[> `ArrayLike ] Np.Obj.t ->
  x:[> `ArrayLike ] Np.Obj.t ->
  y:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  t

Fit the estimators.

Parameters
----------
X : array-like, sparse matrix of shape (n_samples, n_features)
    Training vectors, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,)
    Target values.

sample_weight : array-like of shape (n_samples,), default=None
    Sample weights. If None, then samples are equally weighted. Note that this is supported only if all underlying estimators support sample weights.

Returns
-------
self : object

val fit_transform : 
  ?y:[> `ArrayLike ] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  [> `ArrayLike ] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
----------
X : array-like, sparse matrix, dataframe of shape (n_samples, n_features)

y : ndarray of shape (n_samples,), default=None
    Target values.

**fit_params : dict
    Additional fit parameters.

Returns
-------
X_new : ndarray of shape (n_samples, n_features_new)
    Transformed array.

val get_params : ?deep:bool -> [> tag ] Obj.t -> Py.Object.t

Get the parameters of an estimator from the ensemble.

Parameters
----------
deep : bool, default=True
    Setting it to True gets the various estimators and the parameters of the estimators as well.

val predict : 
  ?predict_params:(string * Py.Object.t) list ->
  x:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  [> `ArrayLike ] Np.Obj.t

Predict target for X.

Parameters
----------
X : array-like, sparse matrix of shape (n_samples, n_features)
    Training vectors, where n_samples is the number of samples and n_features is the number of features.

**predict_params : dict of str -> obj
    Parameters to the `predict` called by the `final_estimator`. Note that this may be used to return uncertainties from some estimators with `return_std` or `return_cov`. Be aware that it will only account for uncertainty in the final estimator.

Returns
-------
y_pred : ndarray of shape (n_samples,) or (n_samples, n_outputs)
    Predicted targets.

val score : 
  ?sample_weight:[> `ArrayLike ] Np.Obj.t ->
  x:[> `ArrayLike ] Np.Obj.t ->
  y:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  float

Return the coefficient of determination R^2 of the prediction.

The coefficient R^2 is defined as (1 - u/v), where u is the residual sum of squares ((y_true - y_pred) ** 2).sum() and v is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.
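
The definition above can be checked by hand with a small pure-Python sketch. The function name `r_squared` is hypothetical. The sketch ignores sample weights and multi-output targets, and assumes `y_true` is not constant, since otherwise v would be zero.

```python
def r_squared(y_true, y_pred):
    """Return 1 - u/v for plain lists of floats (unweighted, single output)."""
    mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1.0 - u / v

y = [1.0, 2.0, 3.0]
print(r_squared(y, [1.0, 2.0, 3.0]))  # 1.0: perfect prediction
print(r_squared(y, [2.0, 2.0, 2.0]))  # 0.0: always predicting the mean
print(r_squared(y, [3.0, 2.0, 1.0]))  # -3.0: worse than the constant model
```

The three calls reproduce the cases named in the text: the best possible score, the constant-model baseline, and a negative score.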

Parameters
----------
X : array-like of shape (n_samples, n_features)
    Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead, shape = (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)
    True values for X.

sample_weight : array-like of shape (n_samples,), default=None
    Sample weights.

Returns
-------
score : float
    R^2 of self.predict(X) with respect to y.

Notes
-----
The R2 score used when calling ``score`` on a regressor uses ``multioutput='uniform_average'`` from version 0.23 to stay consistent with the default value of :func:`~sklearn.metrics.r2_score`. This influences the ``score`` method of all the multioutput regressors (except for :class:`~sklearn.multioutput.MultiOutputRegressor`).

val set_params : ?params:(string * Py.Object.t) list -> [> tag ] Obj.t -> t

Set the parameters of an estimator from the ensemble.

Valid parameter keys can be listed with `get_params()`.

Parameters
----------
**params : keyword arguments
    Specific parameters using e.g. `set_params(parameter_name=new_value)`. In addition to setting the parameters of the stacking estimator, the individual estimators it contains can also be set, or can be removed by setting them to 'drop'.

val transform : 
  x:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  [> `ArrayLike ] Np.Obj.t

Return the predictions for X for each estimator.

Parameters
----------
X : array-like, sparse matrix of shape (n_samples, n_features)
    Training vectors, where `n_samples` is the number of samples and `n_features` is the number of features.

Returns
-------
y_preds : ndarray of shape (n_samples, n_estimators)
    Prediction outputs for each estimator.
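
The shape of this output, and how the `passthrough` option of `create` extends it, can be sketched in pure Python. The helper name `final_estimator_inputs` is hypothetical, and placing the original features after the prediction columns is an assumption of this sketch rather than a documented guarantee.

```python
def final_estimator_inputs(predictions, X, passthrough=False):
    """Build one row per sample from per-estimator prediction lists.

    predictions: list of n_estimators lists, each holding n_samples predictions.
    X: list of n_samples feature rows, used only when passthrough is True.
    """
    rows = [list(p) for p in zip(*predictions)]  # shape (n_samples, n_estimators)
    if passthrough:
        # Append the original features to each row of predictions.
        rows = [row + list(x) for row, x in zip(rows, X)]
    return rows

preds = [[1.0, 2.0], [10.0, 20.0]]  # two estimators, two samples
X = [[0.1, 0.2], [0.3, 0.4]]

print(final_estimator_inputs(preds, X))
# [[1.0, 10.0], [2.0, 20.0]]
print(final_estimator_inputs(preds, X, passthrough=True))
# [[1.0, 10.0, 0.1, 0.2], [2.0, 20.0, 0.3, 0.4]]
```

Without passthrough there is one column per base estimator. With it, the final estimator additionally sees the `n_features` original columns.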

val estimators_ : t -> [ `BaseEstimator | `Object ] Np.Obj.t list

Attribute estimators_: get value or raise Not_found if None.

val estimators_opt : t -> [ `BaseEstimator | `Object ] Np.Obj.t list option

Attribute estimators_: get value as an option.

val named_estimators_ : t -> Dict.t

Attribute named_estimators_: get value or raise Not_found if None.

val named_estimators_opt : t -> Dict.t option

Attribute named_estimators_: get value as an option.

val final_estimator_ : t -> [ `BaseEstimator | `Object ] Np.Obj.t

Attribute final_estimator_: get value or raise Not_found if None.

val final_estimator_opt : t -> [ `BaseEstimator | `Object ] Np.Obj.t option

Attribute final_estimator_: get value as an option.

val to_string : t -> string

Print the object to a human-readable representation.

val show : t -> string

Print the object to a human-readable representation.

val pp : Format.formatter -> t -> unit

Pretty-print the object to a formatter.