Imputation transformer for completing missing values.
Read more in the :ref:`User Guide <impute>`.
.. versionadded:: 0.20
   `SimpleImputer` replaces the previous `sklearn.preprocessing.Imputer`
   estimator which is now removed.
Parameters
----------
missing_values : number, string, np.nan (default) or None
    The placeholder for the missing values. All occurrences of
    `missing_values` will be imputed. For pandas' dataframes with
    nullable integer dtypes with missing values, `missing_values`
    should be set to `np.nan`, since `pd.NA` will be converted to `np.nan`.
strategy : string, default='mean'
    The imputation strategy.

    - If 'mean', then replace missing values using the mean along
      each column. Can only be used with numeric data.
    - If 'median', then replace missing values using the median along
      each column. Can only be used with numeric data.
    - If 'most_frequent', then replace missing values using the most
      frequent value along each column. Can be used with strings or
      numeric data.
    - If 'constant', then replace missing values with `fill_value`. Can be
      used with strings or numeric data.

    .. versionadded:: 0.20
       strategy='constant' for fixed value imputation.
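The four strategies can be compared side by side on one small array. This is a minimal sketch: the data and the fill value of ``-1.0`` are made up for illustration and do not come from the library's documentation.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Illustrative data: each column has exactly one missing entry.
# Column 0 observed values: 1, 1, 10  (mean 4, median 1, mode 1)
# Column 1 observed values: 2, 2, 8   (mean 4, median 2, mode 2)
X = np.array([[1.0, 2.0],
              [np.nan, 2.0],
              [1.0, np.nan],
              [10.0, 8.0]])

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "median": SimpleImputer(strategy="median"),
    "most_frequent": SimpleImputer(strategy="most_frequent"),
    # fill_value is only consulted when strategy="constant".
    "constant": SimpleImputer(strategy="constant", fill_value=-1.0),
}

# Fit each imputer on X and fill the missing entries column by column.
filled = {name: imp.fit_transform(X) for name, imp in imputers.items()}

for name, X_filled in filled.items():
    # Row 1 / column 0 and row 2 / column 1 were the missing entries.
    print(name, X_filled[1, 0], X_filled[2, 1])
```

Only the two originally missing entries differ between the results; every observed value is passed through unchanged.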
fill_value : string or numerical value, default=None
    When strategy == 'constant', `fill_value` is used to replace all
    occurrences of `missing_values`. If left to the default, `fill_value`
    will be 0 when imputing numerical data and "missing_value" for
    strings or object data types.
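The default fill values described above can be observed by leaving `fill_value` unset. The sketch below uses made-up data and assumes that the missing entries in the object-dtype array are encoded as ``np.nan``, matching the default `missing_values`.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Numeric input: the default fill value is expected to be 0.
X_num = np.array([[1.0, np.nan],
                  [3.0, 4.0]])
num_filled = SimpleImputer(strategy="constant").fit_transform(X_num)

# Object (string) input: the default fill value is expected to be
# the literal string "missing_value".
X_obj = np.array([["a", np.nan],
                  ["b", "c"]], dtype=object)
obj_filled = SimpleImputer(strategy="constant").fit_transform(X_obj)

print(num_filled)
print(obj_filled)
```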
verbose : integer, default=0
    Controls the verbosity of the imputer.
copy : boolean, default=True
    If True, a copy of X will be created. If False, imputation will
    be done in-place whenever possible. Note that, in the following cases,
    a new copy will always be made, even if `copy=False`:

    - If X is not an array of floating values;
    - If X is encoded as a CSR matrix;
    - If add_indicator=True.
add_indicator : boolean, default=False
    If True, the output of a :class:`MissingIndicator` transform is stacked
    onto the output of the imputer's transform. This allows a predictive
    estimator to account for missingness despite imputation. If a feature
    has no missing values at fit/train time, the feature won't appear in
    the missing indicator even if there are missing values at
    transform/test time.
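The fit-time versus transform-time behaviour of `add_indicator` can be illustrated as follows. This is a sketch with made-up arrays (``X_train`` and ``X_test`` are illustrative names): column 0 is complete during fit, so it gets no indicator column even though it is missing at transform time.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Column 0 has no missing values at fit time; column 1 has one.
X_train = np.array([[1.0, np.nan],
                    [3.0, 4.0],
                    [5.0, 6.0]])
# At transform time both columns are missing.
X_test = np.array([[np.nan, np.nan]])

imp = SimpleImputer(strategy="mean", add_indicator=True)
imp.fit(X_train)

# Output layout: 2 imputed columns followed by 1 indicator column
# (only for column 1, the feature with missing values during fit).
out_train = imp.transform(X_train)
out_test = imp.transform(X_test)

print(imp.indicator_.features_)  # indices of features that get an indicator
print(out_train.shape, out_test.shape)
print(out_test)
```

In ``out_test`` the missing value in column 0 is still imputed with the training mean; it simply has no matching indicator column.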
Attributes
----------
statistics_ : array of shape (n_features,)
    The imputation fill value for each feature.
    Computing statistics can result in `np.nan` values.
    During :meth:`transform`, features corresponding to `np.nan`
    statistics will be discarded.

indicator_ : :class:`sklearn.impute.MissingIndicator`
    Indicator used to add binary indicators for missing values.
    ``None`` if add_indicator is False.
See also
--------
IterativeImputer : Multivariate imputation of missing values.
Examples
--------
>>> import numpy as np
>>> from sklearn.impute import SimpleImputer
>>> imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
>>> imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])
SimpleImputer()
>>> X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]
>>> print(imp_mean.transform(X))
[[ 7.   2.   3. ]
 [ 4.   3.5  6. ]
 [10.   3.5  9. ]]
Notes
-----
Columns that contained only missing values at :meth:`fit` are discarded
upon :meth:`transform` if strategy is not 'constant'.
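The column-dropping behaviour for a non-constant strategy can be checked with a small sketch. The data is made up, and the library may emit a warning about the skipped feature depending on the installed version.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Column 0 contains only missing values; column 1 is fully observed.
X = np.array([[np.nan, 1.0],
              [np.nan, 3.0]])

mean_imp = SimpleImputer(strategy="mean").fit(X)

# No mean can be computed for column 0, so its statistic is NaN ...
print(mean_imp.statistics_)

# ... and that column is discarded from the transformed output.
X_mean = mean_imp.transform(X)
print(X_mean.shape)
```

Because the output can have fewer columns than the input, downstream code should not assume that column positions are preserved after imputation.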