Logistic Regression CV (aka logit, MaxEnt) classifier.
See glossary entry for :term:`cross-validation estimator`.
This class implements logistic regression using liblinear, newton-cg, sag of lbfgs optimizer. The newton-cg, sag and lbfgs solvers support only L2 regularization with primal formulation. The liblinear solver supports both L1 and L2 regularization, with a dual formulation only for the L2 penalty. Elastic-Net penalty is only supported by the saga solver.
For the grid of `Cs` values and `l1_ratios` values, the best hyperparameter is selected by the cross-validator :class:`~sklearn.model_selection.StratifiedKFold`, but it can be changed using the :term:`cv` parameter. The 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers can warm-start the coefficients (see :term:`Glossary<warm_start>`).
Read more in the :ref:`User Guide <logistic_regression>`.
Parameters ---------- Cs : int or list of floats, default=10 Each of the values in Cs describes the inverse of regularization strength. If Cs is as an int, then a grid of Cs values are chosen in a logarithmic scale between 1e-4 and 1e4. Like in support vector machines, smaller values specify stronger regularization.
fit_intercept : bool, default=True Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
cv : int or cross-validation generator, default=None The default cross-validation generator used is Stratified K-Folds. If an integer is provided, then it is the number of folds used. See the module :mod:`sklearn.model_selection` module for the list of possible cross-validation objects.
.. versionchanged:: 0.22 ``cv`` default value if None changed from 3-fold to 5-fold.
dual : bool, default=False Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.
penalty : 'l1', 'l2', 'elasticnet'
, default='l2' Used to specify the norm used in the penalization. The 'newton-cg', 'sag' and 'lbfgs' solvers support only l2 penalties. 'elasticnet' is only supported by the 'saga' solver.
scoring : str or callable, default=None A string (see model evaluation documentation) or a scorer callable object / function with signature ``scorer(estimator, X, y)``. For a list of scoring functions that can be used, look at :mod:`sklearn.metrics`. The default scoring option used is 'accuracy'.
solver : 'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'
, default='lbfgs'
Algorithm to use in the optimization problem.
- For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.
- For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss; 'liblinear' is limited to one-versus-rest schemes.
- 'newton-cg', 'lbfgs' and 'sag' only handle L2 penalty, whereas 'liblinear' and 'saga' handle L1 penalty.
- 'liblinear' might be slower in LogisticRegressionCV because it does not handle warm-starting.
Note that 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
.. versionadded:: 0.17 Stochastic Average Gradient descent solver. .. versionadded:: 0.19 SAGA solver.
tol : float, default=1e-4 Tolerance for stopping criteria.
max_iter : int, default=100 Maximum number of iterations of the optimization algorithm.
class_weight : dict or 'balanced', default=None Weights associated with classes in the form ``class_label: weight
``. If not given, all classes are supposed to have weight one.
The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))``.
Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
.. versionadded:: 0.17 class_weight == 'balanced'
n_jobs : int, default=None Number of CPU cores used during the cross-validation loop. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. See :term:`Glossary <n_jobs>` for more details.
verbose : int, default=0 For the 'liblinear', 'sag' and 'lbfgs' solvers set verbose to any positive number for verbosity.
refit : bool, default=True If set to True, the scores are averaged across all folds, and the coefs and the C that corresponds to the best score is taken, and a final refit is done using these parameters. Otherwise the coefs, intercepts and C that correspond to the best scores across folds are averaged.
intercept_scaling : float, default=1 Useful only when the solver 'liblinear' is used and self.fit_intercept is set to True. In this case, x becomes x, self.intercept_scaling
, i.e. a 'synthetic' feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes ``intercept_scaling * synthetic_feature_weight``.
Note! the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.
multi_class : 'auto, 'ovr', 'multinomial'
, default='auto' If the option chosen is 'ovr', then a binary problem is fit for each label. For 'multinomial' the loss minimised is the multinomial loss fit across the entire probability distribution, *even when the data is binary*. 'multinomial' is unavailable when solver='liblinear'. 'auto' selects 'ovr' if the data is binary, or if solver='liblinear', and otherwise selects 'multinomial'.
.. versionadded:: 0.18 Stochastic Average Gradient descent solver for 'multinomial' case. .. versionchanged:: 0.22 Default changed from 'ovr' to 'auto' in 0.22.
random_state : int, RandomState instance, default=None Used when `solver='sag'`, 'saga' or 'liblinear' to shuffle the data. Note that this only applies to the solver and not the cross-validation generator. See :term:`Glossary <random_state>` for details.
l1_ratios : list of float, default=None The list of Elastic-Net mixing parameter, with ``0 <= l1_ratio <= 1``. Only used if ``penalty='elasticnet'``. A value of 0 is equivalent to using ``penalty='l2'``, while 1 is equivalent to using ``penalty='l1'``. For ``0 < l1_ratio <1``, the penalty is a combination of L1 and L2.
Attributes ---------- classes_ : ndarray of shape (n_classes, ) A list of class labels known to the classifier.
coef_ : ndarray of shape (1, n_features) or (n_classes, n_features) Coefficient of the features in the decision function.
`coef_` is of shape (1, n_features) when the given problem is binary.
intercept_ : ndarray of shape (1,) or (n_classes,) Intercept (a.k.a. bias) added to the decision function.
If `fit_intercept` is set to False, the intercept is set to zero. `intercept_` is of shape(1,) when the problem is binary.
Cs_ : ndarray of shape (n_cs) Array of C i.e. inverse of regularization parameter values used for cross-validation.
l1_ratios_ : ndarray of shape (n_l1_ratios) Array of l1_ratios used for cross-validation. If no l1_ratio is used (i.e. penalty is not 'elasticnet'), this is set to ``None
``
coefs_paths_ : ndarray of shape (n_folds, n_cs, n_features) or (n_folds, n_cs, n_features + 1) dict with classes as the keys, and the path of coefficients obtained during cross-validating across each fold and then across each Cs after doing an OvR for the corresponding class as values. If the 'multi_class' option is set to 'multinomial', then the coefs_paths are the coefficients corresponding to each class. Each dict value has shape ``(n_folds, n_cs, n_features)`` or ``(n_folds, n_cs, n_features + 1)`` depending on whether the intercept is fit or not. If ``penalty='elasticnet'``, the shape is ``(n_folds, n_cs, n_l1_ratios_, n_features)`` or ``(n_folds, n_cs, n_l1_ratios_, n_features + 1)``.
scores_ : dict dict with classes as the keys, and the values as the grid of scores obtained during cross-validating each fold, after doing an OvR for the corresponding class. If the 'multi_class' option given is 'multinomial' then the same scores are repeated across all classes, since this is the multinomial class. Each dict value has shape ``(n_folds, n_cs`` or ``(n_folds, n_cs, n_l1_ratios)`` if ``penalty='elasticnet'``.
C_ : ndarray of shape (n_classes,) or (n_classes - 1,) Array of C that maps to the best scores across every class. If refit is set to False, then for each class, the best C is the average of the C's that correspond to the best scores for each fold. `C_` is of shape(n_classes,) when the problem is binary.
l1_ratio_ : ndarray of shape (n_classes,) or (n_classes - 1,) Array of l1_ratio that maps to the best scores across every class. If refit is set to False, then for each class, the best l1_ratio is the average of the l1_ratio's that correspond to the best scores for each fold. `l1_ratio_` is of shape(n_classes,) when the problem is binary.
n_iter_ : ndarray of shape (n_classes, n_folds, n_cs) or (1, n_folds, n_cs) Actual number of iterations for all classes, folds and Cs. In the binary or multinomial cases, the first dimension is equal to 1. If ``penalty='elasticnet'``, the shape is ``(n_classes, n_folds, n_cs, n_l1_ratios)`` or ``(1, n_folds, n_cs, n_l1_ratios)``.
Examples -------- >>> from sklearn.datasets import load_iris >>> from sklearn.linear_model import LogisticRegressionCV >>> X, y = load_iris(return_X_y=True) >>> clf = LogisticRegressionCV(cv=5, random_state=0).fit(X, y) >>> clf.predict(X:2, :
) array(0, 0
) >>> clf.predict_proba(X:2, :
).shape (2, 3) >>> clf.score(X, y) 0.98...
See also -------- LogisticRegression