package sklearn

Module `Impute.SimpleImputer`

type tag = [ `SimpleImputer ]

type t =
  [ `BaseEstimator | `Object | `SimpleImputer | `TransformerMixin ] Obj.t

val of_pyobject : Py.Object.t -> t

val to_pyobject : [> tag ] Obj.t -> Py.Object.t

val as_transformer : t -> [ `TransformerMixin ] Obj.t

val as_estimator : t -> [ `BaseEstimator ] Obj.t

val create : 
  ?missing_values:
    [ `S of string | `I of int | `Np_nan of Py.Object.t | `F of float | `None ] ->
  ?strategy:[ `Mean | `Median | `Most_frequent | `Constant ] ->
  ?fill_value:[ `S of string | `I of int | `F of float ] ->
  ?verbose:int ->
  ?copy:bool ->
  ?add_indicator:bool ->
  unit ->
  t

Imputation transformer for completing missing values.

Read more in the :ref:`User Guide <impute>`.

.. versionadded:: 0.20 `SimpleImputer` replaces the previous `sklearn.preprocessing.Imputer` estimator which is now removed.

Parameters
----------
missing_values : number, string, np.nan (default) or None
    The placeholder for the missing values. All occurrences of `missing_values` will be imputed. For pandas' dataframes with nullable integer dtypes with missing values, `missing_values` should be set to `np.nan`, since `pd.NA` will be converted to `np.nan`.

strategy : string, default='mean'
    The imputation strategy.

    - If 'mean', then replace missing values using the mean along each column. Can only be used with numeric data.
    - If 'median', then replace missing values using the median along each column. Can only be used with numeric data.
    - If 'most_frequent', then replace missing values using the most frequent value along each column. Can be used with strings or numeric data.
    - If 'constant', then replace missing values with fill_value. Can be used with strings or numeric data.

.. versionadded:: 0.20 strategy='constant' for fixed value imputation.
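The four strategies can be illustrated with a minimal pure-Python sketch that works on a single column. This is not the library's implementation: the helper names `is_missing`, `column_fill_value`, and `impute_column` are hypothetical, and treating both `None` and NaN as missing is an assumption made for brevity.

```python
import math
from collections import Counter
from statistics import mean, median


def is_missing(value):
    """Assumption for this sketch: None and float NaN both count as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def column_fill_value(column, strategy="mean", fill_value=None):
    """Return the value used to fill the gaps of one column."""
    if strategy == "constant":
        return fill_value
    observed = [v for v in column if not is_missing(v)]
    if not observed:
        return None  # nothing observed, so no statistic can be computed
    if strategy == "mean":
        return mean(observed)
    if strategy == "median":
        return median(observed)
    if strategy == "most_frequent":
        # Ties are broken by first occurrence here; the library may differ.
        return Counter(observed).most_common(1)[0][0]
    raise ValueError(f"unknown strategy: {strategy!r}")


def impute_column(column, strategy="mean", fill_value=None):
    """Replace every missing entry of the column with the fill value."""
    fill = column_fill_value(column, strategy, fill_value)
    return [fill if is_missing(v) else v for v in column]
```

Note that 'mean' and 'median' only make sense for numeric columns, while 'most_frequent' and 'constant' also work on strings, which mirrors the restrictions listed above.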

fill_value : string or numerical value, default=None
    When strategy == 'constant', fill_value is used to replace all occurrences of missing_values. If left to the default, fill_value will be 0 when imputing numerical data and 'missing_value' for strings or object data types.

verbose : integer, default=0
    Controls the verbosity of the imputer.

copy : boolean, default=True
    If True, a copy of X will be created. If False, imputation will be done in-place whenever possible. Note that, in the following cases, a new copy will always be made, even if `copy=False`:

    - If X is not an array of floating values;
    - If X is encoded as a CSR matrix;
    - If add_indicator=True.

add_indicator : boolean, default=False
    If True, a :class:`MissingIndicator` transform will stack onto output of the imputer's transform. This allows a predictive estimator to account for missingness despite imputation. If a feature has no missing values at fit/train time, the feature won't appear on the missing indicator even if there are missing values at transform/test time.
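The fit-time versus transform-time behaviour of the indicator can be sketched in plain Python. This is an illustration of the rule described for `add_indicator`, not the `MissingIndicator` implementation; the helper names `fit_indicator` and `indicator_columns` are hypothetical.

```python
import math


def is_missing(value):
    """Assumption for this sketch: None and float NaN both count as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fit_indicator(rows):
    """Return the indices of features that have a missing value at fit time."""
    n_features = len(rows[0])
    return [j for j in range(n_features) if any(is_missing(r[j]) for r in rows)]


def indicator_columns(rows, features):
    """Build one binary column per feature selected at fit time."""
    return [[1 if is_missing(r[j]) else 0 for j in features] for r in rows]


nan = float("nan")
train = [[1.0, nan], [2.0, 3.0]]
features = fit_indicator(train)               # only feature 1 had gaps at fit time
flags = indicator_columns([[nan, nan]], features)
```

In this example feature 0 is missing in the new row, but it receives no indicator column because it had no missing values in `train`.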

Attributes
----------
statistics_ : array of shape (n_features,)
    The imputation fill value for each feature. Computing statistics can result in `np.nan` values. During :meth:`transform`, features corresponding to `np.nan` statistics will be discarded.

indicator_ : :class:`sklearn.impute.MissingIndicator`
    Indicator used to add binary indicators for missing values. ``None`` if add_indicator is False.

See also
--------
IterativeImputer : Multivariate imputation of missing values.

Examples
--------
>>> import numpy as np
>>> from sklearn.impute import SimpleImputer
>>> imp_mean = SimpleImputer(missing_values=np.nan, strategy='mean')
>>> imp_mean.fit([[7, 2, 3], [4, np.nan, 6], [10, 5, 9]])
SimpleImputer()
>>> X = [[np.nan, 2, 3], [4, np.nan, 6], [10, np.nan, 9]]
>>> print(imp_mean.transform(X))
[[ 7.   2.   3. ]
 [ 4.   3.5  6. ]
 [10.   3.5  9. ]]

Notes
-----
Columns which only contained missing values at :meth:`fit` are discarded upon :meth:`transform` if strategy is not 'constant'.
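The effect of this note on the output shape can be shown with a small pure-Python sketch using the 'mean' strategy. It is only an illustration under simplifying assumptions: the helper names `fit_means` and `transform_rows` are hypothetical, and an all-missing column is recorded as `None` here rather than `np.nan`.

```python
import math
from statistics import mean


def is_missing(value):
    """Assumption for this sketch: None and float NaN both count as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fit_means(rows):
    """Compute one mean per column; None marks a column with no observed value."""
    stats = []
    for column in zip(*rows):
        observed = [v for v in column if not is_missing(v)]
        stats.append(mean(observed) if observed else None)
    return stats


def transform_rows(rows, stats):
    """Fill gaps with the fitted means and drop columns that had no statistic."""
    kept = [j for j, s in enumerate(stats) if s is not None]
    return [[stats[j] if is_missing(r[j]) else r[j] for j in kept] for r in rows]


nan = float("nan")
stats = fit_means([[1.0, nan], [3.0, nan]])   # second column is entirely missing
result = transform_rows([[nan, 5.0]], stats)
```

The transformed row has one column instead of two: the second column is dropped even though the new row has an observed value there.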

val fit : ?y:Py.Object.t -> x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> t

Fit the imputer on X.

Parameters
----------
X : array-like, sparse matrix, shape (n_samples, n_features)
    Input data, where ``n_samples`` is the number of samples and ``n_features`` is the number of features.

Returns
-------
self : SimpleImputer

val fit_transform : 
  ?y:[> `ArrayLike ] Np.Obj.t ->
  ?fit_params:(string * Py.Object.t) list ->
  x:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  [> `ArrayLike ] Np.Obj.t

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
----------
X : array-like, sparse matrix, dataframe of shape (n_samples, n_features)

y : ndarray of shape (n_samples,), default=None
    Target values.

**fit_params : dict
    Additional fit parameters.

Returns
-------
X_new : ndarray of shape (n_samples, n_features_new)
    Transformed array.

val get_params : ?deep:bool -> [> tag ] Obj.t -> Dict.t

Get parameters for this estimator.

Parameters
----------
deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
-------
params : mapping of string to any
    Parameter names mapped to their values.

val set_params : ?params:(string * Py.Object.t) list -> [> tag ] Obj.t -> t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.
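The ``<component>__<parameter>`` naming scheme can be illustrated with a short pure-Python sketch that separates an estimator's own parameters from those addressed to a nested component. The helper `split_nested_params` and the component name `imputer` are hypothetical and only serve the example; this is not the library's code.

```python
def split_nested_params(params):
    """Split a flat parameter dict into own and per-component parameters.

    Keys containing a double underscore are read as <component>__<parameter>;
    only the first double underscore is used as the separator.
    """
    own, nested = {}, {}
    for key, value in params.items():
        component, separator, sub_key = key.partition("__")
        if separator:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params({"verbose": 1, "imputer__strategy": "median"})
```

Here `verbose` applies to the outer object, while `strategy` is routed to the component named `imputer`.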

Parameters
----------
**params : dict
    Estimator parameters.

Returns
-------
self : object
    Estimator instance.

val transform : 
  x:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  [> `ArrayLike ] Np.Obj.t

Impute all missing values in X.

Parameters
----------
X : array-like, sparse matrix, shape (n_samples, n_features)
    The input data to complete.

val statistics_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute statistics_: get value or raise Not_found if None.

val statistics_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute statistics_: get value as an option.

val indicator_ : t -> Py.Object.t

Attribute indicator_: get value or raise Not_found if None.

val indicator_opt : t -> Py.Object.t option

Attribute indicator_: get value as an option.

val to_string : t -> string

Print the object to a human-readable representation.

val show : t -> string

Print the object to a human-readable representation.

val pp : Format.formatter -> t -> unit

Pretty-print the object to a formatter.
