Meta-transformer for selecting features based on importance weights.
.. versionadded:: 0.17
Parameters ---------- estimator : object The base estimator from which the transformer is built. This can be both a fitted (if ``prefit`` is set to True) or a non-fitted estimator. The estimator must have either a ``feature_importances_`` or ``coef_`` attribute after fitting.
threshold : string, float, optional default None The threshold value to use for feature selection. Features whose importance is greater or equal are kept while the others are discarded. If 'median' (resp. 'mean'), then the ``threshold`` value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., '1.25*mean') may also be used. If None and if the estimator has a parameter penalty set to l1, either explicitly or implicitly (e.g, Lasso), the threshold used is 1e-5. Otherwise, 'mean' is used by default.
prefit : bool, default False Whether a prefit model is expected to be passed into the constructor directly or not. If True, ``transform`` must be called directly and SelectFromModel cannot be used with ``cross_val_score``, ``GridSearchCV`` and similar utilities that clone the estimator. Otherwise train the model using ``fit`` and then ``transform`` to do feature selection.
norm_order : non-zero int, inf, -inf, default 1 Order of the norm used to filter the vectors of coefficients below ``threshold`` in the case where the ``coef_`` attribute of the estimator is of dimension 2.
max_features : int or None, optional The maximum number of features to select. To only select based on ``max_features``, set ``threshold=-np.inf``.
.. versionadded:: 0.20
Attributes ---------- estimator_ : an estimator The base estimator from which the transformer is built. This is stored only when a non-fitted estimator is passed to the ``SelectFromModel``, i.e when prefit is False.
threshold_ : float The threshold value used for feature selection.
Notes ----- Allows NaN/Inf in the input if the underlying estimator does as well.
Examples -------- >>> from sklearn.feature_selection import SelectFromModel >>> from sklearn.linear_model import LogisticRegression >>> X = [ 0.87, -1.34, 0.31 ],
... [-2.79, -0.02, -0.85 ],
... [-1.34, -0.48, -2.55 ],
... [ 1.92, 1.48, 0.65 ]
>>> y = 0, 1, 0, 1
>>> selector = SelectFromModel(estimator=LogisticRegression()).fit(X, y) >>> selector.estimator_.coef_ array([-0.3252302 , 0.83462377, 0.49750423]
) >>> selector.threshold_ 0.55245... >>> selector.get_support() array(False, True, False
) >>> selector.transform(X) array([-1.34],
[-0.02],
[-0.48],
[ 1.48]
)