package sklearn

val get_py : string -> Py.Object.t

Get an attribute of this module as a Py.Object.t. This is useful to pass a Python function to another function.
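
For example, the following OCaml sketch (assuming this module is exposed as Sklearn.Utils, as in the rest of these bindings) fetches the raw Python object behind one of the functions documented below:

let () =
  (* Grab the raw Python object behind an attribute of this module;
     the attribute name is illustrative. *)
  let py_fn = Sklearn.Utils.get_py "check_random_state" in
  (* py_fn : Py.Object.t can now be handed to any API expecting a raw
     Python callable. *)
  ignore py_fn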

module Bunch : sig ... end
module DataConversionWarning : sig ... end
module Path : sig ... end
module Sequence : sig ... end
module Compress : sig ... end
module Islice : sig ... end
module Itemgetter : sig ... end
module Parallel_backend : sig ... end
module Arrayfuncs : sig ... end
module Class_weight : sig ... end
module Deprecation : sig ... end
module Extmath : sig ... end
module Fixes : sig ... end
module Graph : sig ... end
module Graph_shortest_path : sig ... end
module Metaestimators : sig ... end
module Multiclass : sig ... end
module Murmurhash : sig ... end
module Optimize : sig ... end
module Random : sig ... end
module Sparsefuncs : sig ... end
module Sparsefuncs_fast : sig ... end
module Stats : sig ... end
module Validation : sig ... end
val all_estimators : ?type_filter:[ `S of string | `StringList of string list ] -> unit -> Py.Object.t

Get a list of all estimators from sklearn.

This function crawls the module and gets all classes that inherit from BaseEstimator. Classes that are defined in test-modules are not included. By default meta_estimators such as GridSearchCV are also not included.

Parameters ---------- type_filter : string, list of string, or None, default=None Which kind of estimators should be returned. If None, no filter is applied and all estimators are returned. Possible values are 'classifier', 'regressor', 'cluster' and 'transformer' to get estimators only of these specific types, or a list of these to get the estimators that fit at least one of the types.

Returns ------- estimators : list of tuples List of (name, class), where ``name`` is the class name as string and ``class`` is the actual type of the class.
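
A minimal OCaml sketch (assuming this module is Sklearn.Utils; Py.Object.to_string is pyml's generic printer):

let () =
  (* List every classifier shipped with scikit-learn. *)
  let classifiers =
    Sklearn.Utils.all_estimators ~type_filter:(`S "classifier") ()
  in
  (* The result is a Python list of (name, class) tuples. *)
  print_endline (Py.Object.to_string classifiers)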

val as_float_array : ?copy:bool -> ?force_all_finite:[ `Allow_nan | `Bool of bool ] -> x:[> `ArrayLike ] Np.Obj.t -> unit -> [> `ArrayLike ] Np.Obj.t

Converts an array-like to an array of floats.

The new dtype will be np.float32 or np.float64, depending on the original type. The function can create a copy or modify the argument depending on the argument copy.

Parameters ---------- X : array-like, sparse matrix

copy : bool, optional If True, a copy of X will be created. If False, a copy may still be returned if X's dtype is not a floating point type.

force_all_finite : boolean or 'allow-nan', (default=True) Whether to raise an error on np.inf, np.nan, pd.NA in X. The possibilities are:

  • True: Force all values of X to be finite.
  • False: accepts np.inf, np.nan, pd.NA in X.
  • 'allow-nan': accepts only np.nan and pd.NA values in X. Values cannot be infinite.

.. versionadded:: 0.20 ``force_all_finite`` accepts the string ``'allow-nan'``.

.. versionchanged:: 0.23 Accepts `pd.NA` and converts it into `np.nan`

Returns ------- XT : array, sparse matrix An array of type np.float
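
For example (a sketch; Np.vectori is assumed from the companion np package):

let () =
  let x = Np.vectori [| 1; 2; 3 |] in
  (* The integer array is converted to a float32/float64 array. *)
  let xf = Sklearn.Utils.as_float_array ~x () in
  ignore xf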

val assert_all_finite : ?allow_nan:bool -> x:[> `ArrayLike ] Np.Obj.t -> unit -> Py.Object.t

Throw a ValueError if X contains NaN or infinity.

Parameters ---------- X : array or sparse matrix

allow_nan : bool

val axis0_safe_slice : x:[> `ArrayLike ] Np.Obj.t -> mask:[> `ArrayLike ] Np.Obj.t -> len_mask:int -> unit -> Py.Object.t

This mask is safer than safe_mask since it returns an empty array, when a sparse matrix is sliced with a boolean mask with all False, instead of raising an unhelpful error in older versions of SciPy.

See: https://github.com/scipy/scipy/issues/5361

Also note that we can avoid doing the dot product by checking if the len_mask is not zero in _huber_loss_and_gradient but this is not going to be the bottleneck, since the number of outliers and non_outliers are typically non-zero and it makes the code tougher to follow.

Parameters ---------- X : array-like, sparse matrix Data on which to apply mask.

mask : array Mask to be used on X.

len_mask : int The length of the mask.

Returns ------- mask

val check_X_y : ?accept_sparse:[ `S of string | `StringList of string list | `Bool of bool ] -> ?accept_large_sparse:bool -> ?dtype: [ `S of string | `Dtype of Np.Dtype.t | `Dtypes of Np.Dtype.t list | `None ] -> ?order:[ `C | `F ] -> ?copy:bool -> ?force_all_finite:[ `Allow_nan | `Bool of bool ] -> ?ensure_2d:bool -> ?allow_nd:bool -> ?multi_output:bool -> ?ensure_min_samples:int -> ?ensure_min_features:int -> ?y_numeric:bool -> ?estimator:[> `BaseEstimator ] Np.Obj.t -> x:[> `ArrayLike ] Np.Obj.t -> y:[> `ArrayLike ] Np.Obj.t -> unit -> Py.Object.t * Py.Object.t

Input validation for standard estimators.

Checks X and y for consistent length, enforces X to be 2D and y 1D. By default, X is checked to be non-empty and containing only finite values. Standard input checks are also applied to y, such as checking that y does not have np.nan or np.inf targets. For multi-label y, set multi_output=True to allow 2D and sparse y. If the dtype of X is object, attempt converting to float, raising on failure.

Parameters ---------- X : nd-array, list or sparse matrix Input data.

y : nd-array, list or sparse matrix Labels.

accept_sparse : string, boolean or list of string (default=False) Strings representing allowed sparse matrix formats, such as 'csc', 'csr', etc. If the input is sparse but not in the allowed format, it will be converted to the first listed format. True allows the input to be any format. False means that a sparse matrix input will raise an error.

accept_large_sparse : bool (default=True) If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by accept_sparse, accept_large_sparse=False will cause it to be accepted only if its indices are stored with a 32-bit dtype.

.. versionadded:: 0.20

dtype : string, type, list of types or None (default='numeric') Data type of result. If None, the dtype of the input is preserved. If 'numeric', dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion on the first type is only performed if the dtype of the input is not in the list.

order : 'F', 'C' or None (default=None) Whether an array will be forced to be fortran or c-style.

copy : boolean (default=False) Whether a forced copy will be triggered. If copy=False, a copy might be triggered by a conversion.

force_all_finite : boolean or 'allow-nan', (default=True) Whether to raise an error on np.inf, np.nan, pd.NA in X. This parameter does not influence whether y can have np.inf, np.nan, pd.NA values. The possibilities are:

  • True: Force all values of X to be finite.
  • False: accepts np.inf, np.nan, pd.NA in X.
  • 'allow-nan': accepts only np.nan or pd.NA values in X. Values cannot be infinite.

.. versionadded:: 0.20 ``force_all_finite`` accepts the string ``'allow-nan'``.

.. versionchanged:: 0.23 Accepts `pd.NA` and converts it into `np.nan`

ensure_2d : boolean (default=True) Whether to raise a value error if X is not 2D.

allow_nd : boolean (default=False) Whether to allow X.ndim > 2.

multi_output : boolean (default=False) Whether to allow 2D y (array or sparse matrix). If false, y will be validated as a vector. y cannot have np.nan or np.inf values if multi_output=True.

ensure_min_samples : int (default=1) Make sure that X has a minimum number of samples in its first axis (rows for a 2D array).

ensure_min_features : int (default=1) Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when X has effectively 2 dimensions or is originally 1D and ``ensure_2d`` is True. Setting to 0 disables this check.

y_numeric : boolean (default=False) Whether to ensure that y has a numeric type. If dtype of y is object, it is converted to float64. Should only be used for regression algorithms.

estimator : str or estimator instance (default=None) If passed, include the name of the estimator in warning messages.

Returns ------- X_converted : object The converted and validated X.

y_converted : object The converted and validated y.
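
A sketch of typical use (assuming Sklearn.Utils and the Np.matrixf / Np.vectori constructors from the companion np package):

let () =
  let x = Np.matrixf [| [| 1.; 0. |]; [| 2.; 1. |] |] in
  let y = Np.vectori [| 0; 1 |] in
  (* Raises on inconsistent lengths, non-finite values, wrong dims, etc. *)
  let x_checked, y_checked = Sklearn.Utils.check_X_y ~x ~y () in
  print_endline (Py.Object.to_string x_checked);
  print_endline (Py.Object.to_string y_checked)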

val check_array : ?accept_sparse:[ `S of string | `StringList of string list | `Bool of bool ] -> ?accept_large_sparse:bool -> ?dtype: [ `S of string | `Dtype of Np.Dtype.t | `Dtypes of Np.Dtype.t list | `None ] -> ?order:[ `C | `F ] -> ?copy:bool -> ?force_all_finite:[ `Allow_nan | `Bool of bool ] -> ?ensure_2d:bool -> ?allow_nd:bool -> ?ensure_min_samples:int -> ?ensure_min_features:int -> ?estimator:[> `BaseEstimator ] Np.Obj.t -> array:Py.Object.t -> unit -> Py.Object.t

Input validation on an array, list, sparse matrix or similar.

By default, the input is checked to be a non-empty 2D array containing only finite values. If the dtype of the array is object, attempt converting to float, raising on failure.

Parameters ---------- array : object Input object to check / convert.

accept_sparse : string, boolean or list/tuple of strings (default=False) Strings representing allowed sparse matrix formats, such as 'csc', 'csr', etc. If the input is sparse but not in the allowed format, it will be converted to the first listed format. True allows the input to be any format. False means that a sparse matrix input will raise an error.

accept_large_sparse : bool (default=True) If a CSR, CSC, COO or BSR sparse matrix is supplied and accepted by accept_sparse, accept_large_sparse=False will cause it to be accepted only if its indices are stored with a 32-bit dtype.

.. versionadded:: 0.20

dtype : string, type, list of types or None (default='numeric') Data type of result. If None, the dtype of the input is preserved. If 'numeric', dtype is preserved unless array.dtype is object. If dtype is a list of types, conversion on the first type is only performed if the dtype of the input is not in the list.

order : 'F', 'C' or None (default=None) Whether an array will be forced to be fortran or c-style. When order is None (default), then if copy=False, nothing is ensured about the memory layout of the output array; otherwise (copy=True) the memory layout of the returned array is kept as close as possible to the original array.

copy : boolean (default=False) Whether a forced copy will be triggered. If copy=False, a copy might be triggered by a conversion.

force_all_finite : boolean or 'allow-nan', (default=True) Whether to raise an error on np.inf, np.nan, pd.NA in array. The possibilities are:

  • True: Force all values of array to be finite.
  • False: accepts np.inf, np.nan, pd.NA in array.
  • 'allow-nan': accepts only np.nan and pd.NA values in array. Values cannot be infinite.

.. versionadded:: 0.20 ``force_all_finite`` accepts the string ``'allow-nan'``.

.. versionchanged:: 0.23 Accepts `pd.NA` and converts it into `np.nan`

ensure_2d : boolean (default=True) Whether to raise a value error if array is not 2D.

allow_nd : boolean (default=False) Whether to allow array.ndim > 2.

ensure_min_samples : int (default=1) Make sure that the array has a minimum number of samples in its first axis (rows for a 2D array). Setting to 0 disables this check.

ensure_min_features : int (default=1) Make sure that the 2D array has some minimum number of features (columns). The default value of 1 rejects empty datasets. This check is only enforced when the input data has effectively 2 dimensions or is originally 1D and ``ensure_2d`` is True. Setting to 0 disables this check.

estimator : str or estimator instance (default=None) If passed, include the name of the estimator in warning messages.

Returns ------- array_converted : object The converted and validated array.
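
A sketch (Np.matrixf and Np.Obj.to_pyobject are assumed from the companion np package; this val takes the array as a raw Py.Object.t):

let () =
  let arr = Np.matrixf [| [| 1.; 2. |]; [| 3.; 4. |] |] in
  let validated =
    Sklearn.Utils.check_array
      ~dtype:(`S "numeric") ~ensure_min_samples:2
      ~array:(Np.Obj.to_pyobject arr) ()
  in
  print_endline (Py.Object.to_string validated)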

val check_consistent_length : Py.Object.t list -> Py.Object.t

Check that all arrays have consistent first dimensions.

Checks whether all objects in arrays have the same shape or length.

Parameters ---------- *arrays : list or tuple of input objects. Objects that will be checked for consistent length.

val check_matplotlib_support : string -> Py.Object.t

Raise ImportError with detailed error message if mpl is not installed.

Plot utilities like :func:`plot_partial_dependence` should lazily import matplotlib and call this helper before any computation.

Parameters ---------- caller_name : str The name of the caller that requires matplotlib.

val check_pandas_support : string -> Py.Object.t

Raise ImportError with detailed error message if pandas is not installed.

Plot utilities like :func:`fetch_openml` should lazily import pandas and call this helper before any computation.

Parameters ---------- caller_name : str The name of the caller that requires pandas.

val check_random_state : [ `Optional of [ `I of int | `None ] | `RandomState of Py.Object.t ] -> Py.Object.t

Turn seed into a np.random.RandomState instance

Parameters ---------- seed : None | int | instance of RandomState If seed is None, return the RandomState singleton used by np.random. If seed is an int, return a new RandomState instance seeded with seed. If seed is already a RandomState instance, return it. Otherwise raise ValueError.
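
For instance (a sketch, assuming this module is Sklearn.Utils; the variant argument follows the signature above):

let () =
  (* An int seed yields a freshly seeded RandomState instance;
     (`Optional `None) would return NumPy's global RandomState singleton. *)
  let rng = Sklearn.Utils.check_random_state (`Optional (`I 42)) in
  ignore rng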

val check_scalar : ?min_val:[ `I of int | `F of float ] -> ?max_val:[ `I of int | `F of float ] -> x:Py.Object.t -> name:string -> target_type:[ `Dtype of Np.Dtype.t | `Tuple of Py.Object.t ] -> unit -> Py.Object.t

Validate scalar parameters type and value.

Parameters ---------- x : object The scalar parameter to validate.

name : str The name of the parameter to be printed in error messages.

target_type : type or tuple Acceptable data types for the parameter.

min_val : float or int, optional (default=None) The minimum valid value the parameter can take. If None (default) it is implied that the parameter does not have a lower bound.

max_val : float or int, optional (default=None) The maximum valid value the parameter can take. If None (default) it is implied that the parameter does not have an upper bound.

Raises ------- TypeError If the parameter's type does not match the desired type.

ValueError If the parameter's value violates the given bounds.

val check_symmetric : ?tol:float -> ?raise_warning:bool -> ?raise_exception:bool -> array:[> `ArrayLike ] Np.Obj.t -> unit -> [> `ArrayLike ] Np.Obj.t

Make sure that array is 2D, square and symmetric.

If the array is not symmetric, then a symmetrized version is returned. Optionally, a warning or exception is raised if the matrix is not symmetric.

Parameters
----------
array : nd-array or sparse matrix
    Input object to check / convert. Must be two-dimensional and square, otherwise a ValueError will be raised.
tol : float
    Absolute tolerance for equivalence of arrays. Default = 1E-10.
raise_warning : boolean (default=True)
    If True then raise a warning if conversion is required.
raise_exception : boolean (default=False)
    If True then raise an exception if array is not symmetric.

Returns ------- array_sym : ndarray or sparse matrix Symmetrized version of the input array, i.e. the average of array and array.transpose(). If sparse, then duplicate entries are first summed and zeros are eliminated.
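
A sketch (assuming Sklearn.Utils and the Np.matrixf constructor):

let () =
  let a = Np.matrixf [| [| 0.; 1. |]; [| 2.; 0. |] |] in
  (* a is not symmetric, so the symmetrized average (a + a.T) / 2
     is returned instead of a. *)
  let a_sym = Sklearn.Utils.check_symmetric ~raise_warning:false ~array:a () in
  ignore a_sym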

val column_or_1d : ?warn:bool -> y:[> `ArrayLike ] Np.Obj.t -> unit -> [> `ArrayLike ] Np.Obj.t

Ravel column or 1d numpy array, else raises an error

Parameters ---------- y : array-like

warn : boolean, default False To control display of warnings.

Returns ------- y : array

val compute_class_weight : class_weight:[ `Balanced | `DictIntToFloat of (int * float) list | `None ] -> classes:[> `ArrayLike ] Np.Obj.t -> y:[> `ArrayLike ] Np.Obj.t -> unit -> [> `ArrayLike ] Np.Obj.t

Estimate class weights for unbalanced datasets.

Parameters ---------- class_weight : dict, 'balanced' or None If 'balanced', class weights will be given by ``n_samples / (n_classes * np.bincount(y))``. If a dictionary is given, keys are classes and values are corresponding class weights. If None is given, the class weights will be uniform.

classes : ndarray Array of the classes occurring in the data, as given by ``np.unique(y_org)`` with ``y_org`` the original class labels.

y : array-like, shape (n_samples,) Array of original class labels per sample;

Returns ------- class_weight_vect : ndarray, shape (n_classes,) Array with class_weight_vect[i] the weight for i-th class

References ---------- The 'balanced' heuristic is inspired by Logistic Regression in Rare Events Data, King, Zeng, 2001.
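
A worked sketch of the 'balanced' heuristic (assuming Sklearn.Utils and Np.vectori):

let () =
  let y = Np.vectori [| 0; 0; 0; 1 |] in
  let classes = Np.vectori [| 0; 1 |] in
  (* 'balanced': n_samples / (n_classes * np.bincount(y)), i.e.
     4 / (2 * 3) = 0.67 for class 0 and 4 / (2 * 1) = 2.0 for class 1. *)
  let weights =
    Sklearn.Utils.compute_class_weight ~class_weight:(`Balanced) ~classes ~y ()
  in
  ignore weights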

val compute_sample_weight : ?indices:[> `ArrayLike ] Np.Obj.t -> class_weight: [ `List_of_dicts of Py.Object.t | `Balanced | `DictIntToFloat of (int * float) list | `None ] -> y:[> `ArrayLike ] Np.Obj.t -> unit -> [> `ArrayLike ] Np.Obj.t

Estimate sample weights by class for unbalanced datasets.

Parameters ---------- class_weight : dict, list of dicts, 'balanced', or None, optional Weights associated with classes in the form ``class_label: weight``. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.

Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be {0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1} instead of {1:1}, {2:5}, {3:1}, {4:1}.

The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data: ``n_samples / (n_classes * np.bincount(y))``.

For multi-output, the weights of each column of y will be multiplied.

y : array-like of shape (n_samples,) or (n_samples, n_outputs) Array of original class labels per sample.

indices : array-like, shape (n_subsample,), or None Array of indices to be used in a subsample. Can be of length less than n_samples in the case of a subsample, or equal to n_samples in the case of a bootstrap subsample with repeated indices. If None, the sample weight will be calculated over the full sample. Only 'balanced' is supported for class_weight if this is provided.

Returns ------- sample_weight_vect : ndarray, shape (n_samples,) Array with sample weights as applied to the original y
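
Per-sample weighting is then just the class weight looked up for each label, as in this sketch (same Sklearn.Utils / Np.vectori assumptions as above):

let () =
  (* Each sample receives the 'balanced' weight of its class. *)
  let w =
    Sklearn.Utils.compute_sample_weight ~class_weight:(`Balanced)
      ~y:(Np.vectori [| 0; 0; 0; 1 |]) ()
  in
  ignore w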

val contextmanager : Py.Object.t -> Py.Object.t

@contextmanager decorator.

Typical usage:

    @contextmanager
    def some_generator(<arguments>):
        <setup>
        try:
            yield <value>
        finally:
            <cleanup>

This makes this:

    with some_generator(<arguments>) as <variable>:
        <body>

equivalent to this:

    <setup>
    try:
        <variable> = <value>
        <body>
    finally:
        <cleanup>

val estimator_html_repr : [> `BaseEstimator ] Np.Obj.t -> string

Build a HTML representation of an estimator.

Read more in the :ref:`User Guide <visualizing_composite_estimators>`.

Parameters ---------- estimator : estimator object The estimator to visualize.

Returns ------- html: str HTML representation of estimator.

val gen_batches : ?min_batch_size:int -> n:int -> batch_size:Py.Object.t -> unit -> Py.Object.t

Generator to create slices containing batch_size elements, from 0 to n.

The last slice may contain less than batch_size elements, when batch_size does not divide n.

Parameters
----------
n : int
batch_size : int
    Number of elements in each batch.
min_batch_size : int, default=0
    Minimum batch size to produce.

Yields ------ slice of batch_size elements

Examples
--------
>>> from sklearn.utils import gen_batches
>>> list(gen_batches(7, 3))
[slice(0, 3, None), slice(3, 6, None), slice(6, 7, None)]
>>> list(gen_batches(6, 3))
[slice(0, 3, None), slice(3, 6, None)]
>>> list(gen_batches(2, 3))
[slice(0, 2, None)]
>>> list(gen_batches(7, 3, min_batch_size=0))
[slice(0, 3, None), slice(3, 6, None), slice(6, 7, None)]
>>> list(gen_batches(7, 3, min_batch_size=2))
[slice(0, 3, None), slice(3, 7, None)]
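
A rough OCaml equivalent of the first example (assuming Sklearn.Utils; Py.Int.of_int and Py.Iter.iter are pyml helpers, and the returned generator is a Python iterator):

let () =
  (* batch_size is a raw Python int in this binding's signature. *)
  let batches =
    Sklearn.Utils.gen_batches ~n:7 ~batch_size:(Py.Int.of_int 3) ()
  in
  (* Iterate over the Python generator of slice objects. *)
  Py.Iter.iter (fun s -> print_endline (Py.Object.to_string s)) batches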

val gen_even_slices : ?n_samples:int -> n:int -> n_packs:Py.Object.t -> unit -> Py.Object.t

Generator to create n_packs slices going up to n.

Parameters
----------
n : int
n_packs : int
    Number of slices to generate.
n_samples : int or None (default = None)
    Number of samples. Pass n_samples when the slices are to be used for sparse matrix indexing; slicing off-the-end raises an exception, while it works for NumPy arrays.

Yields ------ slice

Examples
--------
>>> from sklearn.utils import gen_even_slices
>>> list(gen_even_slices(10, 1))
[slice(0, 10, None)]
>>> list(gen_even_slices(10, 10))
[slice(0, 1, None), slice(1, 2, None), ..., slice(9, 10, None)]
>>> list(gen_even_slices(10, 5))
[slice(0, 2, None), slice(2, 4, None), ..., slice(8, 10, None)]
>>> list(gen_even_slices(10, 3))
[slice(0, 4, None), slice(4, 7, None), slice(7, 10, None)]

val get_chunk_n_rows : ?max_n_rows:int -> ?working_memory:[ `I of int | `F of float ] -> row_bytes:int -> unit -> Py.Object.t

Calculates how many rows can be processed within working_memory

Parameters
----------
row_bytes : int
    The expected number of bytes of memory that will be consumed during the processing of each row.
max_n_rows : int, optional
    The maximum return value.
working_memory : int or float, optional
    The number of rows to fit inside this number of MiB will be returned. When None (default), the value of ``sklearn.get_config()['working_memory']`` is used.

Returns ------- int or the value of n_samples

Warns ----- Issues a UserWarning if ``row_bytes`` exceeds ``working_memory`` MiB.

val get_config : unit -> Dict.t

Retrieve current values for configuration set by :func:`set_config`

Returns ------- config : dict Keys are parameter names that can be passed to :func:`set_config`.

See Also -------- config_context: Context manager for global scikit-learn configuration set_config: Set global scikit-learn configuration

val import_module : ?package:Py.Object.t -> name:Py.Object.t -> unit -> Py.Object.t

Import a module.

The 'package' argument is required when performing a relative import. It specifies the package to use as the anchor point from which to resolve the relative import to an absolute import.

val indexable : Py.Object.t list -> Py.Object.t

Make arrays indexable for cross-validation.

Checks consistent length, passes through None, and ensures that everything can be indexed by converting sparse matrices to csr and converting non-iterable objects to arrays.

Parameters ---------- *iterables : lists, dataframes, arrays, sparse matrices List of objects to ensure sliceability.

val indices_to_mask : indices:[> `ArrayLike ] Np.Obj.t -> mask_length:int -> unit -> Py.Object.t

Convert list of indices to boolean mask.

Parameters
----------
indices : list-like
    List of integers treated as indices.
mask_length : int
    Length of boolean mask to be generated. This parameter must be greater than max(indices).

Returns ------- mask : 1d boolean nd-array Boolean array that is True where indices are present, else False.

Examples
--------
>>> from sklearn.utils import indices_to_mask
>>> indices = [1, 2, 3, 4]
>>> indices_to_mask(indices, 5)
array([False,  True,  True,  True,  True])
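
The same example through these bindings (a sketch, assuming Sklearn.Utils and Np.vectori):

let () =
  let mask =
    Sklearn.Utils.indices_to_mask
      ~indices:(Np.vectori [| 1; 2; 3; 4 |]) ~mask_length:5 ()
  in
  (* Prints the boolean mask, True where an index is present. *)
  print_endline (Py.Object.to_string mask)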

val is_scalar_nan : Py.Object.t -> Py.Object.t

Tests if x is NaN

This function is meant to overcome the issue that np.isnan does not allow non-numerical types as input, and that np.nan is not np.float('nan').

Parameters ---------- x : any type

Returns ------- boolean

Examples
--------
>>> is_scalar_nan(np.nan)
True
>>> is_scalar_nan(float('nan'))
True
>>> is_scalar_nan(None)
False
>>> is_scalar_nan('')
False
>>> is_scalar_nan([np.nan])
False

val issparse : Py.Object.t -> Py.Object.t

Is x of a sparse matrix type?

Parameters ---------- x object to check for being a sparse matrix

Returns ------- bool True if x is a sparse matrix, False otherwise

Notes ----- issparse and isspmatrix are aliases for the same function.

Examples -------- >>> from scipy.sparse import csr_matrix, isspmatrix >>> isspmatrix(csr_matrix([5])) True

>>> from scipy.sparse import isspmatrix >>> isspmatrix(5) False

val parse_version : Py.Object.t -> Py.Object.t


val register_parallel_backend : ?make_default:Py.Object.t -> name:Py.Object.t -> factory:Py.Object.t -> unit -> Py.Object.t

Register a new Parallel backend factory.

The new backend can then be selected by passing its name as the backend argument to the Parallel class. Moreover, the default backend can be overwritten globally by setting make_default=True.

The factory can be any callable that takes no argument and returns an instance of ``ParallelBackendBase``.

Warning: this function is experimental and subject to change in a future version of joblib.

.. versionadded:: 0.10

val resample : ?options:(string * Py.Object.t) list -> Py.Object.t list -> Py.Object.t

Resample arrays or sparse matrices in a consistent way

The default strategy implements one step of the bootstrapping procedure.

Parameters ---------- *arrays : sequence of indexable data-structures Indexable data-structures can be arrays, lists, dataframes or scipy sparse matrices with consistent first dimension.

Other Parameters ---------------- replace : boolean, True by default Implements resampling with replacement. If False, this will implement (sliced) random permutations.

n_samples : int, None by default Number of samples to generate. If left to None this is automatically set to the first dimension of the arrays. If replace is False it should not be larger than the length of arrays.

random_state : int, RandomState instance or None, optional (default=None) Determines random number generation for shuffling the data. Pass an int for reproducible results across multiple function calls. See :term:`Glossary <random_state>`.

stratify : array-like or None (default=None) If not None, data is split in a stratified fashion, using this as the class labels.

Returns ------- resampled_arrays : sequence of indexable data-structures Sequence of resampled copies of the collections. The original arrays are not impacted.

Examples
--------
It is possible to mix sparse and dense arrays in the same run::

>>> X = np.array([[1., 0.], [2., 1.], [0., 0.]])
>>> y = np.array([0, 1, 2])

>>> from scipy.sparse import coo_matrix
>>> X_sparse = coo_matrix(X)

>>> from sklearn.utils import resample
>>> X, X_sparse, y = resample(X, X_sparse, y, random_state=0)
>>> X
array([[1., 0.],
       [2., 1.],
       [1., 0.]])

>>> X_sparse
<3x2 sparse matrix of type '<... 'numpy.float64'>'
    with 4 stored elements in Compressed Sparse Row format>

>>> X_sparse.toarray()
array([[1., 0.],
       [2., 1.],
       [1., 0.]])

>>> y
array([0, 1, 0])

>>> resample(y, n_samples=2, random_state=0)
array([0, 1])

Example using stratification::

>>> y = [0, 0, 1, 1, 1, 1, 1, 1, 1]
>>> resample(y, n_samples=5, replace=False, stratify=y,
...          random_state=0)
[1, 1, 1, 0, 1]

See also -------- :func:`sklearn.utils.shuffle`
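
Through these bindings, keyword arguments go via ~options as raw Python values; a sketch (Np.Obj.to_pyobject, Np.matrixf and Np.vectori are assumed from the companion np package):

let () =
  let x = Np.matrixf [| [| 1.; 0. |]; [| 2.; 1. |]; [| 0.; 0. |] |] in
  let y = Np.vectori [| 0; 1; 2 |] in
  let resampled =
    Sklearn.Utils.resample
      ~options:[ ("random_state", Py.Int.of_int 0) ]
      [ Np.Obj.to_pyobject x; Np.Obj.to_pyobject y ]
  in
  (* A Python list holding one bootstrap resample of each input. *)
  print_endline (Py.Object.to_string resampled)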

val safe_indexing : ?axis:int -> x:[ `Arr of [> `ArrayLike ] Np.Obj.t | `PyObject of Py.Object.t ] -> indices: [ `Arr of [> `ArrayLike ] Np.Obj.t | `Bool of bool | `Slice of Np.Wrap_utils.Slice.t | `S of string | `I of int ] -> unit -> Py.Object.t

DEPRECATED: safe_indexing is deprecated in version 0.22 and will be removed in version 0.24.

Return rows, items or columns of X using indices.

.. deprecated:: 0.22 This function was deprecated in version 0.22 and will be removed in version 0.24.

Parameters ---------- X : array-like, sparse-matrix, list, pandas.DataFrame, pandas.Series Data from which to sample rows, items or columns. `list` are only supported when `axis=0`.

indices : bool, int, str, slice, array-like

  • If `axis=0`, boolean and integer array-like, integer slice, and scalar integer are supported.
  • If `axis=1`:
      • to select a single column, `indices` can be of `int` type for all `X` types and `str` only for dataframe. The selected subset will be 1D, unless `X` is a sparse matrix in which case it will be 2D.
      • to select multiple columns, `indices` can be one of the following: `list`, `array`, `slice`. The type used in these containers can be one of the following: `int`, `bool` and `str`. However, `str` is only supported when `X` is a dataframe. The selected subset will be 2D.

axis : int, default=0 The axis along which `X` will be subsampled. `axis=0` will select rows while `axis=1` will select columns.

Returns ------- subset Subset of X on axis 0 or 1.

Notes ----- CSR, CSC, and LIL sparse matrices are supported. COO sparse matrices are not supported.

val safe_mask : x:[> `ArrayLike ] Np.Obj.t -> mask:[> `ArrayLike ] Np.Obj.t -> unit -> Py.Object.t

Return a mask which is safe to use on X.

Parameters ---------- X : array-like, sparse matrix Data on which to apply mask.

mask : array Mask to be used on X.

Returns ------- mask

val safe_sqr : ?copy:bool -> x:[> `ArrayLike ] Np.Obj.t -> unit -> Py.Object.t

Element wise squaring of array-likes and sparse matrices.

Parameters ---------- X : array like, matrix, sparse matrix

copy : boolean, optional, default True Whether to create a copy of X and operate on it or to perform inplace computation (default behaviour).

Returns ------- X ** 2 : element wise square

val shuffle : ?random_state:int -> ?n_samples:int -> [> `ArrayLike ] Np.Obj.t list -> [> `ArrayLike ] Np.Obj.t list

Shuffle arrays or sparse matrices in a consistent way

This is a convenience alias to ``resample( *arrays, replace=False)`` to do random permutations of the collections.

Parameters ---------- *arrays : sequence of indexable data-structures Indexable data-structures can be arrays, lists, dataframes or scipy sparse matrices with consistent first dimension.

Other Parameters ---------------- random_state : int, RandomState instance or None, optional (default=None) Determines random number generation for shuffling the data. Pass an int for reproducible results across multiple function calls. See :term:`Glossary <random_state>`.

n_samples : int, None by default Number of samples to generate. If left to None this is automatically set to the first dimension of the arrays.

Returns ------- shuffled_arrays : sequence of indexable data-structures Sequence of shuffled copies of the collections. The original arrays are not impacted.

Examples
--------
It is possible to mix sparse and dense arrays in the same run::

>>> X = np.array([[1., 0.], [2., 1.], [0., 0.]])
>>> y = np.array([0, 1, 2])

>>> from scipy.sparse import coo_matrix
>>> X_sparse = coo_matrix(X)

>>> from sklearn.utils import shuffle
>>> X, X_sparse, y = shuffle(X, X_sparse, y, random_state=0)
>>> X
array([[0., 0.],
       [2., 1.],
       [1., 0.]])

>>> X_sparse
<3x2 sparse matrix of type '<... 'numpy.float64'>'
    with 3 stored elements in Compressed Sparse Row format>

>>> X_sparse.toarray()
array([[0., 0.],
       [2., 1.],
       [1., 0.]])

>>> y
array([2, 1, 0])

>>> shuffle(y, n_samples=2, random_state=0)
array([0, 1])

See also -------- :func:`sklearn.utils.resample`
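
Unlike resample, this val takes and returns typed Np.Obj.t arrays directly; a sketch (assuming Sklearn.Utils and Np.matrixf):

let () =
  let x = Np.matrixf [| [| 1.; 0. |]; [| 2.; 1. |]; [| 0.; 0. |] |] in
  let y = Np.matrixf [| [| 0. |]; [| 1. |]; [| 2. |] |] in
  match Sklearn.Utils.shuffle ~random_state:0 [ x; y ] with
  | [ x_shuffled; y_shuffled ] ->
    (* Rows of x and y are permuted consistently. *)
    ignore x_shuffled; ignore y_shuffled
  | _ -> assert false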

val tosequence : [> `ArrayLike ] Np.Obj.t -> Py.Object.t

Cast iterable x to a Sequence, avoiding a copy if possible.

Parameters ---------- x : iterable