Feature ranking with recursive feature elimination and cross-validated selection of the best number of features.
See glossary entry for :term:`cross-validation estimator`.
Read more in the :ref:`User Guide <rfe>`.

Parameters
----------
estimator : object
    A supervised learning estimator with a ``fit`` method that provides
    information about feature importance either through a ``coef_``
    attribute or through a ``feature_importances_`` attribute.
step : int or float, optional (default=1)
    If greater than or equal to 1, then ``step`` corresponds to the
    (integer) number of features to remove at each iteration.
    If within (0.0, 1.0), then ``step`` corresponds to the percentage
    (rounded down) of features to remove at each iteration.
    Note that the last iteration may remove fewer than ``step`` features in
    order to reach ``min_features_to_select``.
min_features_to_select : int, (default=1)
    The minimum number of features to be selected. This number of features
    will always be scored, even if the difference between the original
    feature count and ``min_features_to_select`` isn't divisible by
    ``step``.

    .. versionadded:: 0.20
cv : int, cross-validation generator or an iterable, optional
    Determines the cross-validation splitting strategy.
    Possible inputs for cv are:

    - None, to use the default 5-fold cross-validation,
    - integer, to specify the number of folds,
    - :term:`CV splitter`,
    - an iterable yielding (train, test) splits as arrays of indices.

    For integer/None inputs, if the estimator is a classifier and ``y`` is
    either binary or multiclass,
    :class:`sklearn.model_selection.StratifiedKFold` is used. If the
    estimator is not a classifier or if ``y`` is neither binary nor
    multiclass, :class:`sklearn.model_selection.KFold` is used.

    Refer to the :ref:`User Guide <cross_validation>` for the various
    cross-validation strategies that can be used here.

    .. versionchanged:: 0.22
        ``cv`` default value of None changed from 3-fold to 5-fold.
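
The splitter choice for integer inputs can be reproduced with :func:`sklearn.model_selection.check_cv`, a public helper that applies this kind of rule. The snippet is only a sketch: the toy targets are made up for illustration, and it does not claim to show how this class resolves ``cv`` internally.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

y_class = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])  # toy binary target
y_reg = np.linspace(0.0, 1.0, 10)                   # toy continuous target

# Classifier with a binary target: stratified folds.
cv_clf = check_cv(5, y_class, classifier=True)

# Non-classifier: plain K-fold, regardless of the target type.
cv_reg = check_cv(5, y_reg, classifier=False)

# Classifier flag set, but the target is continuous: plain K-fold as well.
cv_mixed = check_cv(5, y_reg, classifier=True)

print(type(cv_clf).__name__, type(cv_reg).__name__, type(cv_mixed).__name__)
```

Stratification therefore needs both conditions at once: a classifier and a binary or multiclass target.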
scoring : string, callable or None, optional, (default=None)
    A string (see model evaluation documentation) or
    a scorer callable object / function with signature
    ``scorer(estimator, X, y)``.
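
As a minimal sketch of a callable with the ``scorer(estimator, X, y)`` signature, the function below returns the negated mean absolute error, following the scikit-learn convention that a higher score is better. ``neg_mean_abs_error`` and ``MeanPredictor`` are hypothetical names introduced here only for illustration; the stand-in estimator exists just to exercise the scorer.

```python
def neg_mean_abs_error(estimator, X, y):
    """Scorer with the ``scorer(estimator, X, y)`` signature.

    Returns the negated mean absolute error so that higher is better.
    """
    predictions = estimator.predict(X)
    errors = [abs(p - t) for p, t in zip(predictions, y)]
    return -sum(errors) / len(errors)


class MeanPredictor:
    """Hypothetical stand-in estimator that always predicts the mean of y."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


X = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
y = [1.0, 2.0, 3.0, 4.0]

model = MeanPredictor().fit(X, y)
score = neg_mean_abs_error(model, X, y)
print(score)
```

A function of this shape would be passed as ``scoring=neg_mean_abs_error``.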
verbose : int, (default=0)
    Controls verbosity of output.
n_jobs : int or None, optional (default=None)
    Number of cores to run in parallel while fitting across folds.
    ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
    ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
    for more details.

    .. versionadded:: 0.18

Attributes
----------
n_features_ : int
    The number of selected features with cross-validation.

support_ : array of shape [n_features]
    The mask of selected features.

ranking_ : array of shape [n_features]
    The feature ranking, such that ``ranking_[i]``
    corresponds to the ranking
    position of the i-th feature.
    Selected (i.e., estimated best)
    features are assigned rank 1.

grid_scores_ : array of shape [n_subsets_of_features]
    The cross-validation scores such that
    ``grid_scores_[i]`` corresponds to
    the CV score of the i-th subset of features.

estimator_ : object
    The external estimator fit on the reduced dataset.

Notes
-----
The size of ``grid_scores_`` is equal to
``ceil((n_features - min_features_to_select) / step) + 1``,
where ``step`` is the number of features removed at each iteration.
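
The size formula can be checked by simulating the elimination schedule in plain Python. This is a sketch, not the library's code: ``subset_sizes`` is a hypothetical helper, and it assumes that a fractional ``step`` is taken as a share of the original feature count, rounded down, with at least one feature removed per iteration.

```python
import math


def subset_sizes(n_features, min_features_to_select=1, step=1):
    """Feature counts scored at each iteration (hypothetical helper)."""
    if 0.0 < step < 1.0:
        # Assumption: a fractional step is a share of the original feature
        # count, rounded down, and never smaller than one feature.
        step = max(1, int(step * n_features))
    sizes = [n_features]
    while sizes[-1] > min_features_to_select:
        # The last iteration may remove fewer than ``step`` features.
        sizes.append(max(min_features_to_select, sizes[-1] - step))
    return sizes


sizes = subset_sizes(10, min_features_to_select=1, step=4)
expected_len = math.ceil((10 - 1) / 4) + 1

print(sizes)                       # [10, 6, 2, 1]
print(len(sizes) == expected_len)  # True
```

With ``step=4`` the difference of 9 features is not divisible by the step, so the final iteration removes a single feature and ``min_features_to_select`` is still scored.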
Allows NaN/Inf in the input if the underlying estimator does as well.

Examples
--------
The following example shows how to recover the 5 informative features,
which are not known a priori, in the Friedman #1 dataset.

>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFECV
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel='linear')
>>> selector = RFECV(estimator, step=1, cv=5)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True,  True,  True,  True,  True, False, False, False, False,
       False])
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
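
Once fitted, the selector can also reduce the data and predict through the refitted estimator. The following sketch repeats the setup above so that it runs on its own; it assumes a scikit-learn installation in which ``transform`` and ``predict`` are available on the fitted selector, and it deliberately compares shapes rather than asserting a particular number of selected features.

```python
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFECV
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
selector = RFECV(SVR(kernel="linear"), step=1, cv=5).fit(X, y)

# Keep only the columns flagged in ``support_``.
X_reduced = selector.transform(X)
print(X_reduced.shape[1] == selector.n_features_)

# ``predict`` on the selector applies the same mask before delegating to
# ``estimator_``, which was fit on the reduced dataset.
predictions = selector.predict(X)
print(predictions.shape)
```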

See also
--------
RFE : Recursive feature elimination

References
----------
.. [1] Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., "Gene selection
       for cancer classification using support vector machines",
       Mach. Learn., 46(1-3), 389--422, 2002.