package sklearn

module Parallel : sig ... end
val cartesian : ?out:Py.Object.t -> arrays:Py.Object.t -> unit -> Ndarray.t

Generate a cartesian product of input arrays.

Parameters ---------- arrays : list of array-like 1-D arrays to form the cartesian product of.

out : ndarray Array to place the cartesian product in.

Returns ------- out : ndarray 2-D array of shape (M, len(arrays)) containing cartesian products formed of input arrays.

Examples -------- >>> cartesian(([1, 2, 3], [4, 5], [6, 7])) array([[1, 4, 6], [1, 4, 7], [1, 5, 6], [1, 5, 7], [2, 4, 6], [2, 4, 7], [2, 5, 6], [2, 5, 7], [3, 4, 6], [3, 4, 7], [3, 5, 6], [3, 5, 7]])
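The same product can be formed through the binding above. The following is a minimal OCaml sketch, not taken from the library's own docs: it assumes the enclosing module is opened so that cartesian and Ndarray are in scope, and builds the arrays argument as a plain Python list of lists with pyml helpers.

  let () =
    Py.initialize ();
    let ints l = Py.List.of_list (List.map Py.Int.of_int l) in
    (* Same inputs as the Python example: [1; 2; 3], [4; 5] and [6; 7]. *)
    let arrays = Py.List.of_list [ ints [1; 2; 3]; ints [4; 5]; ints [6; 7] ] in
    (* Rows of the (12, 3) result run [1;4;6], [1;4;7], ..., [3;5;7]. *)
    let product : Ndarray.t = cartesian ~arrays () in
    ignore product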

val check_array : ?accept_sparse: [ `String of string | `Bool of bool | `StringList of string list ] -> ?accept_large_sparse:bool -> ?dtype: [ `String of string | `Dtype of Py.Object.t | `TypeList of Py.Object.t | `None ] -> ?order:[ `F | `C | `None ] -> ?copy:bool -> ?force_all_finite:[ `Bool of bool | `Allow_nan ] -> ?ensure_2d:bool -> ?allow_nd:bool -> ?ensure_min_samples:int -> ?ensure_min_features:int -> ?warn_on_dtype:[ `Bool of bool | `None ] -> ?estimator:[ `String of string | `Estimator of Py.Object.t ] -> array:Py.Object.t -> unit -> Py.Object.t

Input validation on an array, list, sparse matrix or similar.

By default, the input is checked to be a non-empty 2D array containing only finite values. If the dtype of the array is object, attempt converting to float, raising on failure.

Parameters ---------- array : object Input object to check / convert.

accept_sparse : string, boolean or list/tuple of strings (default=False) Strings representing allowed sparse matrix formats, such as 'csc', 'csr', etc. If the input is sparse but not in the allowed format, it will be converted to the first listed format. True allows the input to be any format. False means that a sparse matrix input will raise an error.

accept_large_sparse : bool (default=True) If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by accept_sparse, accept_large_sparse=False will cause it to be accepted only if its indices are stored with a 32-bit dtype.

.. versionadded:: 0.20

dtype : string, type, list of types or None (default="numeric") Data type of result. If None, the dtype of the input is preserved. If "numeric", dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion on the first type is only performed if the dtype of the input is not in the list.

order : 'F', 'C' or None (default=None) Whether an array will be forced to be fortran or c-style. When order is None (default), then if copy=False, nothing is ensured about the memory layout of the output array; otherwise (copy=True) the memory layout of the returned array is kept as close as possible to the original array.

copy : boolean (default=False) Whether a forced copy will be triggered. If copy=False, a copy might be triggered by a conversion.

force_all_finite : boolean or 'allow-nan', (default=True) Whether to raise an error on np.inf and np.nan in array. The possibilities are:

  • True: Force all values of array to be finite.
  • False: accept both np.inf and np.nan in array.
  • 'allow-nan': accept only np.nan values in array. Values cannot be infinite.

For object dtyped data, only np.nan is checked and not np.inf.

.. versionadded:: 0.20 ``force_all_finite`` accepts the string ``'allow-nan'``.

ensure_2d : boolean (default=True) Whether to raise a value error if array is not 2D.

allow_nd : boolean (default=False) Whether to allow array.ndim > 2.

ensure_min_samples : int (default=1) Make sure that the array has a minimum number of samples in its first axis (rows for a 2D array). Setting to 0 disables this check.

ensure_min_features : int (default=1) Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when the input data has effectively 2 dimensions or is originally 1D and ``ensure_2d`` is True. Setting to 0 disables this check.

warn_on_dtype : boolean or None, optional (default=None) Raise DataConversionWarning if the dtype of the input data structure does not match the requested dtype, causing a memory copy.

.. deprecated:: 0.21 ``warn_on_dtype`` is deprecated in version 0.21 and will be removed in 0.23.

estimator : str or estimator instance (default=None) If passed, include the name of the estimator in warning messages.

Returns ------- array_converted : object The converted and validated array.
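As a rough illustration of how these options look with the labelled-argument signature above, here is a hedged OCaml sketch (assuming the enclosing module is opened): a small nested Python list, built with pyml, is validated as a 2-D, Fortran-ordered, copied array.

  let () =
    Py.initialize ();
    let row l = Py.List.of_list (List.map Py.Float.of_float l) in
    let data = Py.List.of_list [ row [1.; 2.]; row [3.; 4.] ] in
    let validated =
      check_array ~order:`F ~copy:true ~ensure_min_samples:2
        ~force_all_finite:(`Bool true) ~array:data ()
    in
    (* validated : Py.Object.t holding the converted 2x2 ndarray. *)
    ignore validated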

val check_is_fitted : ?attributes: [ `String of string | `ArrayLike of Py.Object.t | `StringList of string list ] -> ?msg:string -> ?all_or_any:[ `Callable of Py.Object.t | `PyObject of Py.Object.t ] -> estimator:Py.Object.t -> unit -> Py.Object.t

Perform is_fitted validation for estimator.

Checks if the estimator is fitted by verifying the presence of fitted attributes (ending with a trailing underscore) and otherwise raises a NotFittedError with the given message.

This utility is meant to be used internally by estimators themselves, typically in their own predict / transform methods.

Parameters ---------- estimator : estimator instance. estimator instance for which the check is performed.

attributes : str, list or tuple of str, default=None Attribute name(s) given as string or a list/tuple of strings, e.g. ``["coef_", "estimator_", ..., "coef_"]``

If `None`, `estimator` is considered fitted if there exists an attribute that ends with an underscore and does not start with a double underscore.

msg : string The default error message is, "This %(name)s instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator."

For custom messages, if "%(name)s" is present in the message string, it is substituted with the estimator name.

Eg. : "Estimator, %(name)s, must be fitted before sparsifying".

all_or_any : callable, all, any, default all Specify whether all or any of the given attributes must exist.

Returns ------- None

Raises ------ NotFittedError If the attributes are not found.
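A hedged OCaml sketch of the typical guard: est stands for some fitted estimator's underlying Py.Object.t obtained elsewhere in the bindings, and the Python NotFittedError propagates to OCaml as a pyml exception (Py.E), which the sketch catches.

  let is_fitted est =
    try
      (* Succeeds silently when the estimator exposes coef_; raises otherwise. *)
      ignore (check_is_fitted ~attributes:(`StringList [ "coef_" ]) ~estimator:est ());
      true
    with Py.E _ -> false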

val delayed : ?check_pickle:Py.Object.t -> function_:Py.Object.t -> unit -> Py.Object.t

Decorator used to capture the arguments of a function.
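A hedged sketch: delayed wraps a Python callable so that Parallel can capture its arguments later. Here the callable is Python's math.sqrt, fetched with pyml; nothing beyond the signature above is assumed about the returned wrapper.

  let () =
    Py.initialize ();
    let sqrt = Py.Module.get (Py.Import.import_module "math") "sqrt" in
    (* task : Py.Object.t, a delayed-style wrapper around math.sqrt. *)
    let task = delayed ~function_:sqrt () in
    ignore task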

module Deprecated : sig ... end
val mquantiles : ?prob:Py.Object.t -> ?alphap:Py.Object.t -> ?betap:Py.Object.t -> ?axis:Py.Object.t -> ?limit:Py.Object.t -> a:Ndarray.t -> unit -> Py.Object.t

Computes empirical quantiles for a data array.

Sample quantiles are defined by ``Q(p) = (1-gamma)*x[j] + gamma*x[j+1]``, where ``x[j]`` is the j-th order statistic, and gamma is a function of ``j = floor(n*p + m)``, ``m = alphap + p*(1 - alphap - betap)`` and ``g = n*p + m - j``.

Reinterpreting the above equations to compare to **R** leads to the equation: ``p(k) = (k - alphap)/(n + 1 - alphap - betap)``

Typical values of (alphap,betap) are:

  • (0,1) : ``p(k) = k/n`` : linear interpolation of cdf ( **R** type 4)
  • (.5,.5) : ``p(k) = (k - 1/2.)/n`` : piecewise linear function ( **R** type 5)
  • (0,0) : ``p(k) = k/(n+1)`` : ( **R** type 6)
  • (1,1) : ``p(k) = (k-1)/(n-1)``: p(k) = mode[F(x[k])]. ( **R** type 7, **R** default)
  • (1/3,1/3): ``p(k) = (k-1/3)/(n+1/3)``: Then p(k) ~ median[F(x[k])]. The resulting quantile estimates are approximately median-unbiased regardless of the distribution of x. ( **R** type 8)
  • (3/8,3/8): ``p(k) = (k-3/8)/(n+1/4)``: Blom. The resulting quantile estimates are approximately unbiased if x is normally distributed ( **R** type 9)
  • (.4,.4) : approximately quantile unbiased (Cunnane)
  • (.35,.35): APL, used with PWM

Parameters ---------- a : array_like Input data, as a sequence or array of dimension at most 2.

prob : array_like, optional List of quantiles to compute.

alphap : float, optional Plotting positions parameter, default is 0.4.

betap : float, optional Plotting positions parameter, default is 0.4.

axis : int, optional Axis along which to perform the trimming. If None (default), the input array is first flattened.

limit : tuple, optional Tuple of (lower, upper) values. Values of `a` outside this open interval are ignored.

Returns ------- mquantiles : MaskedArray An array containing the calculated quantiles.

Notes ----- This formulation is very similar to **R** except the calculation of ``m`` from ``alphap`` and ``betap``, where in **R** ``m`` is defined with each type.

References ---------- .. [1] *R* statistical software: https://www.r-project.org/ .. [2] *R* ``quantile`` function: http://stat.ethz.ch/R-manual/R-devel/library/stats/html/quantile.html

Examples -------- >>> from scipy.stats.mstats import mquantiles >>> a = np.array([6., 47., 49., 15., 42., 41., 7., 39., 43., 40., 36.]) >>> mquantiles(a) array([ 19.2, 40. , 42.8])

Using a 2D array, specifying axis and limit.

>>> data = np.array([[ 6., 7., 1.], ... [ 47., 15., 2.], ... [ 49., 36., 3.], ... [ 15., 39., 4.], ... [ 42., 40., -999.], ... [ 41., 41., -999.], ... [ 7., -999., -999.], ... [ 39., -999., -999.], ... [ 43., -999., -999.], ... [ 40., -999., -999.], ... [ 36., -999., -999.]]) >>> print(mquantiles(data, axis=0, limit=(0, 50))) [[19.2 14.6 1.45] [40. 37.5 2.5 ] [42.8 40.05 3.55]]

>>> data[:, 2] = -999. >>> print(mquantiles(data, axis=0, limit=(0, 50))) [[19.200000000000003 14.6 --] [40.0 37.5 --] [42.800000000000004 40.05 --]]
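A hedged OCaml sketch of the first example above. The documented signature fixes only the labelled arguments; turning the plain Python list into the required Ndarray.t is done here through an assumed Ndarray.of_pyobject conversion, which may differ in the actual bindings.

  let () =
    Py.initialize ();
    let floats l = Py.List.of_list (List.map Py.Float.of_float l) in
    (* Assumed conversion; substitute whatever Ndarray constructor the bindings provide. *)
    let a =
      Ndarray.of_pyobject
        (floats [ 6.; 47.; 49.; 15.; 42.; 41.; 7.; 39.; 43.; 40.; 36. ])
    in
    let prob = floats [ 0.25; 0.5; 0.75 ] in
    (* Expected result, as in the example above: roughly [19.2; 40.; 42.8]. *)
    let q = mquantiles ~prob ~a () in
    ignore q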

val partial_dependence : ?grid:Ndarray.t -> ?x:Ndarray.t -> ?percentiles:Py.Object.t -> ?grid_resolution:int -> gbrt:Py.Object.t -> target_variables:[ `Ndarray of Ndarray.t | `PyObject of Py.Object.t ] -> unit -> Ndarray.t * Py.Object.t

DEPRECATED: The function ensemble.partial_dependence has been deprecated in favour of inspection.partial_dependence in 0.21 and will be removed in 0.23.

Partial dependence of ``target_variables``.

Partial dependence plots show the dependence between the joint values of the ``target_variables`` and the function represented by the ``gbrt``.

Read more in the :ref:`User Guide <partial_dependence>`.

.. deprecated:: 0.21 This function was deprecated in version 0.21 in favor of :func:`sklearn.inspection.partial_dependence` and will be removed in 0.23.

Parameters ---------- gbrt : BaseGradientBoosting A fitted gradient boosting model.

target_variables : array-like, dtype=int The target features for which the partial dependency should be computed (size should be smaller than 3 for visual renderings).

grid : array-like of shape (n_points, n_target_variables) The grid of ``target_variables`` values for which the partial dependency should be evaluated (either ``grid`` or ``X`` must be specified).

X : array-like of shape (n_samples, n_features) The data on which ``gbrt`` was trained. It is used to generate a ``grid`` for the ``target_variables``. The ``grid`` comprises ``grid_resolution`` equally spaced points between the two ``percentiles``.

percentiles : (low, high), default=(0.05, 0.95) The lower and upper percentile used to create the extreme values for the ``grid``. Only used if ``X`` is not None.

grid_resolution : int, default=100 The number of equally spaced points on the ``grid``.

Returns ------- pdp : array, shape=(n_classes, n_points) The partial dependence function evaluated on the ``grid``. For regression and binary classification ``n_classes==1``.

axes : seq of ndarray or None The axes with which the grid has been created or None if the grid has been given.

Examples -------- >>> samples = [[0, 0, 2], [1, 0, 0]] >>> labels = [0, 1] >>> from sklearn.ensemble import GradientBoostingClassifier >>> gb = GradientBoostingClassifier(random_state=0).fit(samples, labels) >>> kwargs = dict(X=samples, percentiles=(0, 1), grid_resolution=2) >>> partial_dependence(gb, [0], **kwargs) # doctest: +SKIP (array([[-4.52..., 4.52...]]), [array([ 0., 1.])])
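The same call expressed against the OCaml signature above, as a hedged sketch. gb stands for a fitted gradient boosting model's Py.Object.t and samples for the training data as an Ndarray.t, both obtained elsewhere; the target_variables and percentiles values are assembled with pyml.

  let pdp_for_feature_0 gb samples =
    let target_variables = `PyObject (Py.List.of_list [ Py.Int.of_int 0 ]) in
    let percentiles = Py.Tuple.of_list [ Py.Int.of_int 0; Py.Int.of_int 1 ] in
    (* pdp : Ndarray.t of shape (n_classes, n_points); axes : grid axes, or None if a grid was given. *)
    let pdp, axes =
      partial_dependence ~x:samples ~percentiles ~grid_resolution:2
        ~gbrt:gb ~target_variables ()
    in
    ignore axes;
    pdp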

val plot_partial_dependence : ?feature_names:Py.Object.t -> ?label:Py.Object.t -> ?n_cols:int -> ?grid_resolution:int -> ?percentiles:Py.Object.t -> ?n_jobs:[ `Int of int | `None ] -> ?verbose:int -> ?ax:Py.Object.t -> ?line_kw:Py.Object.t -> ?contour_kw:Py.Object.t -> ?fig_kw:(string * Py.Object.t) list -> gbrt:Py.Object.t -> x:Ndarray.t -> features:[ `StringList of string list | `PyObject of Py.Object.t ] -> unit -> Py.Object.t * Py.Object.t

DEPRECATED: The function ensemble.plot_partial_dependence has been deprecated in favour of sklearn.inspection.plot_partial_dependence in 0.21 and will be removed in 0.23.

Partial dependence plots for ``features``.

The ``len(features)`` plots are arranged in a grid with ``n_cols`` columns. Two-way partial dependence plots are plotted as contour plots.

Read more in the :ref:`User Guide <partial_dependence>`.

.. deprecated:: 0.21 This function was deprecated in version 0.21 in favor of :func:`sklearn.inspection.plot_partial_dependence` and will be removed in 0.23.

Parameters ---------- gbrt : BaseGradientBoosting A fitted gradient boosting model.

X : array-like of shape (n_samples, n_features) The data on which ``gbrt`` was trained.

features : seq of ints, strings, or tuples of ints or strings If seq[i] is an int or a tuple with one int value, a one-way PDP is created; if seq[i] is a tuple of two ints, a two-way PDP is created. If feature_names is specified and seq[i] is an int, seq[i] must be < len(feature_names). If seq[i] is a string, feature_names must be specified, and seq[i] must be in feature_names.

feature_names : seq of str Name of each feature; feature_names[i] holds the name of the feature with index i.

label : object The class label for which the PDPs should be computed. Only if gbrt is a multi-class model. Must be in ``gbrt.classes_``.

n_cols : int The number of columns in the grid plot (default: 3).

grid_resolution : int, default=100 The number of equally spaced points on the axes.

percentiles : (low, high), default=(0.05, 0.95) The lower and upper percentile used to create the extreme values for the PDP axes.

n_jobs : int or None, optional (default=None) ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context. ``-1`` means using all processors. See :term:`Glossary <n_jobs>` for more details.

verbose : int Verbose output during PD computations. Defaults to 0.

ax : Matplotlib axis object, default None An axis object onto which the plots will be drawn.

line_kw : dict Dict with keywords passed to the ``matplotlib.pyplot.plot`` call. For one-way partial dependence plots.

contour_kw : dict Dict with keywords passed to the ``matplotlib.pyplot.plot`` call. For two-way partial dependence plots.

``**fig_kw`` : dict Dict with keywords passed to the figure() call. Note that all keywords not recognized above will be automatically included here.

Returns ------- fig : figure The Matplotlib Figure object.

axs : seq of Axis objects A seq of Axis objects, one for each subplot.

Examples -------- >>> from sklearn.datasets import make_friedman1 >>> from sklearn.ensemble import GradientBoostingRegressor >>> X, y = make_friedman1() >>> clf = GradientBoostingRegressor(n_estimators=10).fit(X, y) >>> fig, axs = plot_partial_dependence(clf, X, [0, (0, 1)]) #doctest: +SKIP ...
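And the plotting example in the same hedged style: clf is a fitted model's Py.Object.t and x the training data as an Ndarray.t, both obtained elsewhere; the features value mixes a single feature index with a pair, mirroring ``[0, (0, 1)]`` above, and is assembled with pyml.

  let plot clf x =
    let features =
      `PyObject
        (Py.List.of_list
           [ Py.Int.of_int 0;
             Py.Tuple.of_list [ Py.Int.of_int 0; Py.Int.of_int 1 ] ])
    in
    (* fig : Py.Object.t Matplotlib Figure; axs : Py.Object.t sequence of Axis objects. *)
    let fig, axs = plot_partial_dependence ~grid_resolution:2 ~gbrt:clf ~x ~features () in
    ignore axs;
    fig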
