package sklearn

You can search for identifiers within the package.

in-package search v0.2.0

package sklearn

sklearn

Legend:
Page
Library
Module
Module type
Parameter
Class
Class type
Source

Module `Linear_model.LogisticRegressionCV`Source

Sourcetype tag = [

| `LogisticRegressionCV

]

Source

type t =
  [ `BaseEstimator
  | `ClassifierMixin
  | `LinearClassifierMixin
  | `LogisticRegressionCV
  | `Object
  | `SparseCoefMixin ]
    Obj.t

Sourceval of_pyobject : Py.Object.t -> t

Sourceval to_pyobject : [> tag ] Obj.t -> Py.Object.t

Sourceval as_estimator : t -> [ `BaseEstimator ] Obj.t

Sourceval as_classifier : t -> [ `ClassifierMixin ] Obj.t

Sourceval as_linear_classifier : t -> [ `LinearClassifierMixin ] Obj.t

Sourceval as_sparse_coef : t -> [ `SparseCoefMixin ] Obj.t

Source

val create : 
  ?cs:[ `Fs of float list | `I of int ] ->
  ?fit_intercept:bool ->
  ?cv:[ `BaseCrossValidator of [> `BaseCrossValidator ] Np.Obj.t | `I of int ] ->
  ?dual:bool ->
  ?penalty:[ `L1 | `L2 | `Elasticnet ] ->
  ?scoring:
    [ `Neg_mean_absolute_error
    | `Completeness_score
    | `Roc_auc_ovr
    | `Neg_mean_squared_log_error
    | `Neg_mean_gamma_deviance
    | `Precision_macro
    | `R2
    | `Precision_micro
    | `F1_weighted
    | `Balanced_accuracy
    | `Neg_mean_squared_error
    | `F1_samples
    | `Jaccard_micro
    | `Normalized_mutual_info_score
    | `F1_micro
    | `Roc_auc
    | `Mutual_info_score
    | `Adjusted_rand_score
    | `Average_precision
    | `Jaccard
    | `Homogeneity_score
    | `Accuracy
    | `Jaccard_macro
    | `Jaccard_weighted
    | `Recall_micro
    | `Explained_variance
    | `Precision
    | `Callable of Py.Object.t
    | `V_measure_score
    | `F1
    | `Roc_auc_ovo
    | `Neg_mean_poisson_deviance
    | `Recall_samples
    | `Adjusted_mutual_info_score
    | `Neg_brier_score
    | `Roc_auc_ovo_weighted
    | `Recall
    | `Fowlkes_mallows_score
    | `Neg_log_loss
    | `Neg_root_mean_squared_error
    | `Precision_samples
    | `F1_macro
    | `Roc_auc_ovr_weighted
    | `Recall_weighted
    | `Neg_median_absolute_error
    | `Jaccard_samples
    | `Precision_weighted
    | `Max_error
    | `Recall_macro ] ->
  ?solver:[ `Newton_cg | `Lbfgs | `Liblinear | `Sag | `Saga ] ->
  ?tol:float ->
  ?max_iter:int ->
  ?class_weight:[ `Balanced | `DictIntToFloat of (int * float) list ] ->
  ?n_jobs:int ->
  ?verbose:int ->
  ?refit:bool ->
  ?intercept_scaling:float ->
  ?multi_class:[ `Ovr | `T_auto of Py.Object.t | `Multinomial ] ->
  ?random_state:int ->
  ?l1_ratios:float list ->
  unit ->
  t

Logistic Regression CV (aka logit, MaxEnt) classifier.

See glossary entry for :term:`cross-validation estimator`.

This class implements logistic regression using liblinear, newton-cg, sag of lbfgs optimizer. The newton-cg, sag and lbfgs solvers support only L2 regularization with primal formulation. The liblinear solver supports both L1 and L2 regularization, with a dual formulation only for the L2 penalty. Elastic-Net penalty is only supported by the saga solver.

For the grid of `Cs` values and `l1_ratios` values, the best hyperparameter is selected by the cross-validator :class:`~sklearn.model_selection.StratifiedKFold`, but it can be changed using the :term:`cv` parameter. The 'newton-cg', 'sag', 'saga' and 'lbfgs' solvers can warm-start the coefficients (see :term:`Glossary<warm_start>`).

Read more in the :ref:`User Guide <logistic_regression>`.

Parameters ---------- Cs : int or list of floats, default=10 Each of the values in Cs describes the inverse of regularization strength. If Cs is as an int, then a grid of Cs values are chosen in a logarithmic scale between 1e-4 and 1e4. Like in support vector machines, smaller values specify stronger regularization.

fit_intercept : bool, default=True Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

cv : int or cross-validation generator, default=None The default cross-validation generator used is Stratified K-Folds. If an integer is provided, then it is the number of folds used. See the module :mod:`sklearn.model_selection` module for the list of possible cross-validation objects.

.. versionchanged:: 0.22 ``cv`` default value if None changed from 3-fold to 5-fold.

dual : bool, default=False Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

penalty : 'l1', 'l2', 'elasticnet', default='l2' Used to specify the norm used in the penalization. The 'newton-cg', 'sag' and 'lbfgs' solvers support only l2 penalties. 'elasticnet' is only supported by the 'saga' solver.

scoring : str or callable, default=None A string (see model evaluation documentation) or a scorer callable object / function with signature ``scorer(estimator, X, y)``. For a list of scoring functions that can be used, look at :mod:`sklearn.metrics`. The default scoring option used is 'accuracy'.

solver : 'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga', default='lbfgs'

Algorithm to use in the optimization problem.

For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.
For multiclass problems, only 'newton-cg', 'sag', 'saga' and 'lbfgs' handle multinomial loss; 'liblinear' is limited to one-versus-rest schemes.
'newton-cg', 'lbfgs' and 'sag' only handle L2 penalty, whereas 'liblinear' and 'saga' handle L1 penalty.
'liblinear' might be slower in LogisticRegressionCV because it does not handle warm-starting.

Note that 'sag' and 'saga' fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.

.. versionadded:: 0.17 Stochastic Average Gradient descent solver. .. versionadded:: 0.19 SAGA solver.

tol : float, default=1e-4 Tolerance for stopping criteria.

max_iter : int, default=100 Maximum number of iterations of the optimization algorithm.

class_weight : dict or 'balanced', default=None Weights associated with classes in the form ``class_label: weight``. If not given, all classes are supposed to have weight one.

The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))``.

Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

.. versionadded:: 0.17 class_weight == 'balanced'

n_jobs : int, default=None Number of CPU cores used during the cross-validation loop. ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. See :term:`Glossary <n_jobs>` for more details.

verbose : int, default=0 For the 'liblinear', 'sag' and 'lbfgs' solvers set verbose to any positive number for verbosity.

refit : bool, default=True If set to True, the scores are averaged across all folds, and the coefs and the C that corresponds to the best score is taken, and a final refit is done using these parameters. Otherwise the coefs, intercepts and C that correspond to the best scores across folds are averaged.

intercept_scaling : float, default=1 Useful only when the solver 'liblinear' is used and self.fit_intercept is set to True. In this case, x becomes x, self.intercept_scaling, i.e. a 'synthetic' feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes ``intercept_scaling * synthetic_feature_weight``.

Note! the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.

multi_class : 'auto, 'ovr', 'multinomial', default='auto' If the option chosen is 'ovr', then a binary problem is fit for each label. For 'multinomial' the loss minimised is the multinomial loss fit across the entire probability distribution, *even when the data is binary*. 'multinomial' is unavailable when solver='liblinear'. 'auto' selects 'ovr' if the data is binary, or if solver='liblinear', and otherwise selects 'multinomial'.

.. versionadded:: 0.18 Stochastic Average Gradient descent solver for 'multinomial' case. .. versionchanged:: 0.22 Default changed from 'ovr' to 'auto' in 0.22.

random_state : int, RandomState instance, default=None Used when `solver='sag'`, 'saga' or 'liblinear' to shuffle the data. Note that this only applies to the solver and not the cross-validation generator. See :term:`Glossary <random_state>` for details.

l1_ratios : list of float, default=None The list of Elastic-Net mixing parameter, with ``0 <= l1_ratio <= 1``. Only used if ``penalty='elasticnet'``. A value of 0 is equivalent to using ``penalty='l2'``, while 1 is equivalent to using ``penalty='l1'``. For ``0 < l1_ratio <1``, the penalty is a combination of L1 and L2.

Attributes ---------- classes_ : ndarray of shape (n_classes, ) A list of class labels known to the classifier.

coef_ : ndarray of shape (1, n_features) or (n_classes, n_features) Coefficient of the features in the decision function.

`coef_` is of shape (1, n_features) when the given problem is binary.

intercept_ : ndarray of shape (1,) or (n_classes,) Intercept (a.k.a. bias) added to the decision function.

If `fit_intercept` is set to False, the intercept is set to zero. `intercept_` is of shape(1,) when the problem is binary.

Cs_ : ndarray of shape (n_cs) Array of C i.e. inverse of regularization parameter values used for cross-validation.

l1_ratios_ : ndarray of shape (n_l1_ratios) Array of l1_ratios used for cross-validation. If no l1_ratio is used (i.e. penalty is not 'elasticnet'), this is set to ``None``

coefs_paths_ : ndarray of shape (n_folds, n_cs, n_features) or (n_folds, n_cs, n_features + 1) dict with classes as the keys, and the path of coefficients obtained during cross-validating across each fold and then across each Cs after doing an OvR for the corresponding class as values. If the 'multi_class' option is set to 'multinomial', then the coefs_paths are the coefficients corresponding to each class. Each dict value has shape ``(n_folds, n_cs, n_features)`` or ``(n_folds, n_cs, n_features + 1)`` depending on whether the intercept is fit or not. If ``penalty='elasticnet'``, the shape is ``(n_folds, n_cs, n_l1_ratios_, n_features)`` or ``(n_folds, n_cs, n_l1_ratios_, n_features + 1)``.

scores_ : dict dict with classes as the keys, and the values as the grid of scores obtained during cross-validating each fold, after doing an OvR for the corresponding class. If the 'multi_class' option given is 'multinomial' then the same scores are repeated across all classes, since this is the multinomial class. Each dict value has shape ``(n_folds, n_cs`` or ``(n_folds, n_cs, n_l1_ratios)`` if ``penalty='elasticnet'``.

C_ : ndarray of shape (n_classes,) or (n_classes - 1,) Array of C that maps to the best scores across every class. If refit is set to False, then for each class, the best C is the average of the C's that correspond to the best scores for each fold. `C_` is of shape(n_classes,) when the problem is binary.

l1_ratio_ : ndarray of shape (n_classes,) or (n_classes - 1,) Array of l1_ratio that maps to the best scores across every class. If refit is set to False, then for each class, the best l1_ratio is the average of the l1_ratio's that correspond to the best scores for each fold. `l1_ratio_` is of shape(n_classes,) when the problem is binary.

n_iter_ : ndarray of shape (n_classes, n_folds, n_cs) or (1, n_folds, n_cs) Actual number of iterations for all classes, folds and Cs. In the binary or multinomial cases, the first dimension is equal to 1. If ``penalty='elasticnet'``, the shape is ``(n_classes, n_folds, n_cs, n_l1_ratios)`` or ``(1, n_folds, n_cs, n_l1_ratios)``.

Examples -------- >>> from sklearn.datasets import load_iris >>> from sklearn.linear_model import LogisticRegressionCV >>> X, y = load_iris(return_X_y=True) >>> clf = LogisticRegressionCV(cv=5, random_state=0).fit(X, y) >>> clf.predict(X:2, :) array(0, 0) >>> clf.predict_proba(X:2, :).shape (2, 3) >>> clf.score(X, y) 0.98...

See also -------- LogisticRegression

Source

val decision_function : 
  x:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  [> `ArrayLike ] Np.Obj.t

Predict confidence scores for samples.

The confidence score for a sample is the signed distance of that sample to the hyperplane.

Parameters ---------- X : array_like or sparse matrix, shape (n_samples, n_features) Samples.

Returns ------- array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes) Confidence scores per (sample, class) combination. In the binary case, confidence score for self.classes_1 where >0 means this class would be predicted.

Sourceval densify : [> tag ] Obj.t -> t

Convert coefficient matrix to dense array format.

Converts the ``coef_`` member (back) to a numpy.ndarray. This is the default format of ``coef_`` and is required for fitting, so calling this method is only required on models that have previously been sparsified; otherwise, it is a no-op.

Returns ------- self Fitted estimator.

Source

val fit : 
  ?sample_weight:[> `ArrayLike ] Np.Obj.t ->
  x:[> `ArrayLike ] Np.Obj.t ->
  y:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  t

Fit the model according to the given training data.

Parameters ---------- X : array-like, sparse matrix of shape (n_samples, n_features) Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like of shape (n_samples,) Target vector relative to X.

sample_weight : array-like of shape (n_samples,) default=None Array of weights that are assigned to individual samples. If not provided, then each sample is given unit weight.

Returns ------- self : object

Sourceval get_params : ?deep:bool -> [> tag ] Obj.t -> Dict.t

Get parameters for this estimator.

Parameters ---------- deep : bool, default=True If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns ------- params : mapping of string to any Parameter names mapped to their values.

Source

val predict : 
  x:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  [> `ArrayLike ] Np.Obj.t

Predict class labels for samples in X.

Parameters ---------- X : array_like or sparse matrix, shape (n_samples, n_features) Samples.

Returns ------- C : array, shape n_samples Predicted class label per sample.

Source

val predict_log_proba : 
  x:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  [> `ArrayLike ] Np.Obj.t

Predict logarithm of probability estimates.

The returned estimates for all classes are ordered by the label of classes.

Parameters ---------- X : array-like of shape (n_samples, n_features) Vector to be scored, where `n_samples` is the number of samples and `n_features` is the number of features.

Returns ------- T : array-like of shape (n_samples, n_classes) Returns the log-probability of the sample for each class in the model, where classes are ordered as they are in ``self.classes_``.

Source

val predict_proba : 
  x:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  [> `ArrayLike ] Np.Obj.t

Probability estimates.

The returned estimates for all classes are ordered by the label of classes.

For a multi_class problem, if multi_class is set to be 'multinomial' the softmax function is used to find the predicted probability of each class. Else use a one-vs-rest approach, i.e calculate the probability of each class assuming it to be positive using the logistic function. and normalize these values across all the classes.

Parameters ---------- X : array-like of shape (n_samples, n_features) Vector to be scored, where `n_samples` is the number of samples and `n_features` is the number of features.

Returns ------- T : array-like of shape (n_samples, n_classes) Returns the probability of the sample for each class in the model, where classes are ordered as they are in ``self.classes_``.

Source

val score : 
  ?sample_weight:[> `ArrayLike ] Np.Obj.t ->
  x:[> `ArrayLike ] Np.Obj.t ->
  y:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  float

Returns the score using the `scoring` option on the given test data and labels.

Parameters ---------- X : array-like of shape (n_samples, n_features) Test samples.

y : array-like of shape (n_samples,) True labels for X.

sample_weight : array-like of shape (n_samples,), default=None Sample weights.

Returns ------- score : float Score of self.predict(X) wrt. y.

Sourceval set_params : ?params:(string * Py.Object.t) list -> [> tag ] Obj.t -> t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

Parameters ---------- **params : dict Estimator parameters.

Returns ------- self : object Estimator instance.

Sourceval sparsify : [> tag ] Obj.t -> t

Convert coefficient matrix to sparse format.

Converts the ``coef_`` member to a scipy.sparse matrix, which for L1-regularized models can be much more memory- and storage-efficient than the usual numpy.ndarray representation.

The ``intercept_`` member is not converted.

Returns ------- self Fitted estimator.

Notes ----- For non-sparse models, i.e. when there are not many zeros in ``coef_``, this may actually *increase* memory usage, so use this method with care. A rule of thumb is that the number of zero elements, which can be computed with ``(coef_ == 0).sum()``, must be more than 50% for this to provide significant benefits.

After calling this method, further fitting with the partial_fit method (if any) will not work until you call densify.

Sourceval classes_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute classes_: get value or raise Not_found if None.

Sourceval classes_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute classes_: get value as an option.

Sourceval coef_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute coef_: get value or raise Not_found if None.

Sourceval coef_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute coef_: get value as an option.

Sourceval intercept_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute intercept_: get value or raise Not_found if None.

Sourceval intercept_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute intercept_: get value as an option.

Sourceval cs_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute Cs_: get value or raise Not_found if None.

Sourceval cs_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute Cs_: get value as an option.

Sourceval l1_ratios_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute l1_ratios_: get value or raise Not_found if None.

Sourceval l1_ratios_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute l1_ratios_: get value as an option.

Sourceval coefs_paths_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute coefs_paths_: get value or raise Not_found if None.

Sourceval coefs_paths_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute coefs_paths_: get value as an option.

Sourceval scores_ : t -> Dict.t

Attribute scores_: get value or raise Not_found if None.

Sourceval scores_opt : t -> Dict.t option

Attribute scores_: get value as an option.

Sourceval c_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute C_: get value or raise Not_found if None.

Sourceval c_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute C_: get value as an option.

Sourceval l1_ratio_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute l1_ratio_: get value or raise Not_found if None.

Sourceval l1_ratio_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute l1_ratio_: get value as an option.

Sourceval n_iter_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute n_iter_: get value or raise Not_found if None.

Sourceval n_iter_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute n_iter_: get value as an option.

Sourceval to_string : t -> string

Print the object to a human-readable representation.

Sourceval show : t -> string

Print the object to a human-readable representation.

Sourceval pp : Format.formatter -> t -> unit

Pretty-print the object to a formatter.

package sklearn

Module Linear_model.LogisticRegressionCVSource

Module `Linear_model.LogisticRegressionCV`Source