package sklearn

type tag = [ `SimpleImputer ]
type t = [ `BaseEstimator | `Object | `SimpleImputer | `TransformerMixin ] Obj.t
val of_pyobject : Py.Object.t -> t
val to_pyobject : [> tag ] Obj.t -> Py.Object.t
val as_transformer : t -> [ `TransformerMixin ] Obj.t
val as_estimator : t -> [ `BaseEstimator ] Obj.t
val create : ?missing_values: [ `S of string | `I of int | `Np_nan of Py.Object.t | `F of float | `None ] -> ?strategy:[ `Mean | `Median | `Most_frequent | `Constant ] -> ?fill_value:[ `S of string | `I of int | `F of float ] -> ?verbose:int -> ?copy:bool -> ?add_indicator:bool -> unit -> t

Imputation transformer for completing missing values.

Read more in the :ref:`User Guide <impute>`.

.. versionadded:: 0.20
   `SimpleImputer` replaces the previous `sklearn.preprocessing.Imputer` estimator, which is now removed.

Parameters
----------
missing_values : number, string, np.nan (default) or None
    The placeholder for the missing values. All occurrences of `missing_values` will be imputed. For pandas dataframes with nullable integer dtypes that contain missing values, `missing_values` should be set to `np.nan`, since `pd.NA` will be converted to `np.nan`.
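As a rough illustration of the `missing_values` placeholder, the sketch below calls the underlying Python estimator (the class these bindings wrap) with a sentinel of -1 instead of `np.nan`. The data and the choice of sentinel are made up for the example.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: -1 marks a missing reading instead of np.nan.
X = np.array([[1, -1],
              [3, 4],
              [-1, 8]])

imp = SimpleImputer(missing_values=-1, strategy="mean")
X_filled = imp.fit_transform(X)

# Column means are taken over the non-sentinel entries only:
# column 0 uses (1 + 3) / 2 and column 1 uses (4 + 8) / 2.
print(X_filled)
```

Every occurrence of the sentinel is replaced, so a sentinel that can also be a legitimate value would be imputed as well.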

strategy : string, default='mean'
    The imputation strategy.

  • If 'mean', then replace missing values using the mean along each column. Can only be used with numeric data.
  • If 'median', then replace missing values using the median along each column. Can only be used with numeric data.
  • If 'most_frequent', then replace missing values using the most frequent value along each column. Can be used with strings or numeric data.
  • If 'constant', then replace missing values with fill_value. Can be used with strings or numeric data.

.. versionadded:: 0.20
   strategy='constant' for fixed value imputation.
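The three data-driven strategies listed above can be compared side by side. This is a minimal sketch against the wrapped Python estimator, on a small invented array; it only reads back the per-column fill values after fitting.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Invented data with one missing entry per column.
X = np.array([[1.0, 2.0],
              [1.0, 2.0],
              [4.0, 3.0],
              [6.0, np.nan],
              [np.nan, 9.0]])

fills = {}
for strategy in ("mean", "median", "most_frequent"):
    imp = SimpleImputer(strategy=strategy).fit(X)
    # statistics_ holds the value used to fill each column.
    fills[strategy] = imp.statistics_.tolist()
    print(strategy, fills[strategy])
```

Only the observed entries of each column contribute to the statistic, so the three strategies can give different fill values for the same column.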

fill_value : string or numerical value, default=None
    When strategy == 'constant', fill_value is used to replace all occurrences of missing_values. If left to the default, fill_value will be 0 when imputing numerical data and 'missing_value' for strings or object data types.
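A short sketch of the `fill_value` defaults described above, again using the wrapped Python estimator. The arrays are invented, and the explicit fill of -1.0 is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Numeric data: the default fill is 0, or whatever fill_value is given.
X_num = np.array([[1.0, np.nan],
                  [np.nan, 4.0]])
default_fill = SimpleImputer(strategy="constant").fit_transform(X_num)
custom_fill = SimpleImputer(strategy="constant", fill_value=-1.0).fit_transform(X_num)

# Object data: the default fill is the string 'missing_value'.
X_str = np.array([["a", np.nan],
                  [np.nan, "b"]], dtype=object)
str_fill = SimpleImputer(strategy="constant").fit_transform(X_str)

print(default_fill)
print(custom_fill)
print(str_fill)
```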

verbose : integer, default=0
    Controls the verbosity of the imputer.

copy : boolean, default=True
    If True, a copy of X will be created. If False, imputation will be done in-place whenever possible. Note that, in the following cases, a new copy will always be made, even if `copy=False`:

  • If X is not an array of floating values;
  • If X is encoded as a CSR matrix;
  • If add_indicator=True.
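The effect of `copy` can be observed on a dense floating-point array, which is the case where in-place imputation is possible. The sketch below is only an illustration with invented data, run against the wrapped Python estimator; the in-place result is what the text above describes, not a guarantee for every input type.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [3.0, 4.0]])

# Default copy=True: the caller's array keeps its NaN.
X_kept = X.copy()
SimpleImputer(strategy="mean").fit_transform(X_kept)
still_has_nan = bool(np.isnan(X_kept).any())

# copy=False on a dense float array: the imputer may write into the input.
X_inplace = X.copy()
SimpleImputer(strategy="mean", copy=False).fit_transform(X_inplace)
overwritten = not bool(np.isnan(X_inplace).any())

print(still_has_nan, overwritten)
```

Because a copy is still made in the listed cases, code that relies on the input being modified should use the returned array instead.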

add_indicator : boolean, default=False
    If True, a :class:`MissingIndicator` transform is stacked onto the output of the imputer's transform. This allows a predictive estimator to account for missingness despite imputation. If a feature has no missing values at fit/train time, the feature won't appear in the missing indicator, even if there are missing values at transform/test time.
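To make the shape of the stacked output concrete, here is a minimal sketch with the wrapped Python estimator. The arrays are invented: only the first feature has a missing value at fit time, so only one indicator column is expected.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1.0, 2.0],
                    [np.nan, 3.0],
                    [7.0, 6.0]])

imp = SimpleImputer(strategy="mean", add_indicator=True).fit(X_train)

# Two imputed columns followed by one indicator column for feature 0.
train_out = imp.transform(X_train)

# Feature 1 is missing here but was complete at fit time,
# so it is imputed without gaining an indicator column.
X_test = np.array([[np.nan, np.nan]])
test_out = imp.transform(X_test)

print(train_out)
print(test_out.shape)
```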

Attributes
----------
statistics_ : array of shape (n_features,)
    The imputation fill value for each feature. Computing statistics can result in `np.nan` values. During :meth:`transform`, features corresponding to `np.nan` statistics will be discarded.

indicator_ : :class:`sklearn.impute.MissingIndicator`
    Indicator used to add binary indicators for missing values. ``None`` if add_indicator is False.
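The `indicator_` attribute can be inspected after fitting. The sketch below, on invented data and using the wrapped Python estimator, checks both cases; `features_` is an attribute of `MissingIndicator` that lists the features which had missing values at fit time.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, 6.0]])

plain = SimpleImputer().fit(X)
with_indicator = SimpleImputer(add_indicator=True).fit(X)

# Without add_indicator the attribute is None.
print(plain.indicator_)

# With add_indicator it is a fitted MissingIndicator.
flagged = with_indicator.indicator_.features_.tolist()
print(flagged)
```

In the OCaml bindings this distinction is what `indicator_opt` exposes as an option.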

See also
--------
IterativeImputer : Multivariate imputation of missing values.

Examples
--------
>>> import numpy as np
>>> from sklearn.impute import SimpleImputer
>>> imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
>>> imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])
SimpleImputer()
>>> X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]
>>> print(imp_mean.transform(X))
[[ 7.   2.   3. ]
 [ 4.   3.5  6. ]
 [10.   3.5  9. ]]

Notes
-----
Columns which only contained missing values at :meth:`fit` are discarded upon :meth:`transform` if strategy is not 'constant'.
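The note above can be checked on a tiny invented array whose first column is entirely missing. This sketch uses the wrapped Python estimator with the 'mean' strategy; recent scikit-learn releases may also emit a warning about the skipped feature.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Column 0 has no observed values at fit time.
X = np.array([[np.nan, 1.0],
              [np.nan, 3.0]])

imp = SimpleImputer(strategy="mean").fit(X)
stats = imp.statistics_   # NaN for the empty column, the mean for the other
out = imp.transform(X)    # the empty column is dropped from the output

print(stats)
print(out.shape)
```

Downstream code that indexes columns by position should account for the narrower output.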

val fit : ?y:Py.Object.t -> x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> t

Fit the imputer on X.

Parameters
----------
X : array-like, sparse matrix, shape (n_samples, n_features)
    Input data, where ``n_samples`` is the number of samples and ``n_features`` is the number of features.

Returns
-------
self : SimpleImputer
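Since `fit` returns the imputer itself, it can be chained, and the statistics learned on one dataset are then applied to another. A minimal sketch with the wrapped Python estimator and invented train/test arrays:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1.0, 10.0],
                    [2.0, np.nan],
                    [9.0, 30.0]])
X_test = np.array([[np.nan, np.nan],
                   [5.0, 7.0]])

imp = SimpleImputer(strategy="median")
returned = imp.fit(X_train)      # fit returns the same object

# Missing test entries are filled with the medians of X_train.
X_test_filled = returned.transform(X_test)
print(X_test_filled)
```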

val fit_transform : ?y:[> `ArrayLike ] Np.Obj.t -> ?fit_params:(string * Py.Object.t) list -> x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> [> `ArrayLike ] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
----------
X : array-like, sparse matrix, dataframe of shape (n_samples, n_features)

y : ndarray of shape (n_samples,), default=None
    Target values.

**fit_params : dict
    Additional fit parameters.

Returns
-------
X_new : ndarray of shape (n_samples, n_features_new)
    Transformed array.
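For this estimator, `fit_transform` should give the same result as calling `fit` and then `transform` on the same data. The sketch below checks that on an invented array with the wrapped Python estimator.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [1.0, 5.0],
              [np.nan, 5.0],
              [2.0, 5.0]])

one_step = SimpleImputer(strategy="most_frequent").fit_transform(X)
two_step = SimpleImputer(strategy="most_frequent").fit(X).transform(X)

same = bool(np.array_equal(one_step, two_step))
print(one_step)
print(same)
```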

val get_params : ?deep:bool -> [> tag ] Obj.t -> Dict.t

Get parameters for this estimator.

Parameters
----------
deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
-------
params : mapping of string to any
    Parameter names mapped to their values.
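A brief sketch of reading parameters back from the wrapped Python estimator. The exact set of keys depends on the installed scikit-learn version, so only a few constructor arguments documented above are looked up.

```python
from sklearn.impute import SimpleImputer

imp = SimpleImputer(strategy="median", add_indicator=True)
params = imp.get_params()

# The mapping contains constructor arguments, including untouched defaults.
print(params["strategy"], params["add_indicator"], params["copy"])
```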

val set_params : ?params:(string * Py.Object.t) list -> [> tag ] Obj.t -> t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

Parameters
----------
**params : dict
    Estimator parameters.

Returns
-------
self : object
    Estimator instance.
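The ``<component>__<parameter>`` form is easiest to see with a pipeline. In the sketch below, written against the wrapped Python API, the step names "imputer" and "scaler" are arbitrary labels chosen for the example.

```python
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Plain estimator: parameters are set by name, and the estimator is returned.
imp = SimpleImputer()
returned = imp.set_params(strategy="constant", fill_value=0)

# Nested object: address the imputer step through its (hypothetical) name.
pipe = Pipeline([("imputer", SimpleImputer()),
                 ("scaler", StandardScaler())])
pipe.set_params(imputer__strategy="median")

print(imp.strategy, pipe.named_steps["imputer"].strategy)
```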

val transform : x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> [> `ArrayLike ] Np.Obj.t

Impute all missing values in X.

Parameters
----------
X : array-like, sparse matrix, shape (n_samples, n_features)
    The input data to complete.

val statistics_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute statistics_: get value or raise Not_found if None.

val statistics_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute statistics_: get value as an option.

val indicator_ : t -> Py.Object.t

Attribute indicator_: get value or raise Not_found if None.

val indicator_opt : t -> Py.Object.t option

Attribute indicator_: get value as an option.

val to_string : t -> string

Print the object to a human-readable representation.

val show : t -> string

Print the object to a human-readable representation.

val pp : Stdlib.Format.formatter -> t -> unit

Pretty-print the object to a formatter.