package sklearn

type t
val of_pyobject : Py.Object.t -> t
val to_pyobject : t -> Py.Object.t
val create : ?classes:Arr.t -> ?sparse_output:bool -> unit -> t

Transform between iterable of iterables and a multilabel format

Although a list of sets or tuples is a very intuitive format for multilabel data, it is unwieldy to process. This transformer converts between this intuitive format and the supported multilabel format: a (samples x classes) binary matrix indicating the presence of a class label.

Parameters
----------
classes : array-like of shape (n_classes,) (optional)
    Indicates an ordering for the class labels. All entries should be unique (cannot contain duplicate classes).

sparse_output : boolean (default: False)
    Set to true if the output binary array is desired in CSR sparse format.

Attributes
----------
classes_ : array of labels
    A copy of the `classes` parameter where provided, or otherwise, the sorted set of classes found when fitting.

Examples
--------
>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit_transform([(1, 2), (3,)])
array([[1, 1, 0],
       [0, 0, 1]])
>>> mlb.classes_
array([1, 2, 3])

>>> mlb.fit_transform([{'sci-fi', 'thriller'}, {'comedy'}])
array([[0, 1, 1],
       [1, 0, 0]])
>>> list(mlb.classes_)
['comedy', 'sci-fi', 'thriller']

A common mistake is to pass in a flat list of labels; each string is then treated as a sample and iterated character by character:

>>> mlb = MultiLabelBinarizer()
>>> mlb.fit(['sci-fi', 'thriller', 'comedy'])
MultiLabelBinarizer()
>>> mlb.classes_
array(['-', 'c', 'd', 'e', 'f', 'h', 'i', 'l', 'm', 'o', 'r', 's', 't',
    'y'], dtype=object)

To correct this, the list of labels should be wrapped in an outer list, so that it forms a single sample:

>>> mlb = MultiLabelBinarizer()
>>> mlb.fit([['sci-fi', 'thriller', 'comedy']])
MultiLabelBinarizer()
>>> mlb.classes_
array(['comedy', 'sci-fi', 'thriller'], dtype=object)
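The pitfall above comes from plain Python iteration: a string is itself an iterable of characters. The following is a minimal standard-library sketch of that effect, not scikit-learn's implementation; the helper name `collect_classes` is hypothetical and only mimics how a sorted set of classes could be gathered from the samples.

```python
from itertools import chain


def collect_classes(y):
    """Return the sorted set of labels found across all samples in y."""
    return sorted(set(chain.from_iterable(y)))


flat = ['sci-fi', 'thriller', 'comedy']

# Each string is iterated as a sample, so its characters become the labels.
wrong = collect_classes(flat)

# Wrapping the list makes it one sample that carries three labels.
right = collect_classes([flat])

print(wrong)
print(right)
```

Wrapping the flat list in an outer list is the only change between the two calls.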

See also
--------
sklearn.preprocessing.OneHotEncoder : encode categorical features using a one-hot aka one-of-K scheme.

val fit : y:Arr.List.t -> t -> t

Fit the label sets binarizer, storing `classes_`.

Parameters
----------
y : iterable of iterables
    A set of labels (any orderable and hashable object) for each sample. If the `classes` parameter is set, `y` will not be iterated.

Returns
-------
self : returns this MultiLabelBinarizer instance

val fit_transform : y:Arr.List.t -> t -> Arr.t

Fit the label sets binarizer and transform the given label sets

Parameters
----------
y : iterable of iterables
    A set of labels (any orderable and hashable object) for each sample. If the `classes` parameter is set, `y` will not be iterated.

Returns
-------
y_indicator : array or CSR matrix, shape (n_samples, n_classes)
    A matrix such that `y_indicator[i, j] = 1` iff `classes_[j]` is in `y[i]`, and 0 otherwise.
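To make the indicator-matrix definition concrete, here is a pure-Python sketch of the fit-and-transform step using nested lists instead of arrays. It is an illustration under stated assumptions, not the library's code: `fit_transform_sketch` is a hypothetical name, and silently skipping labels that are missing from a user-supplied `classes` list is a simplification chosen for this sketch.

```python
def fit_transform_sketch(y, classes=None):
    """Return (classes, indicator) for an iterable of label collections."""
    samples = [set(labels) for labels in y]
    if classes is None:
        # No ordering given: use the sorted set of all labels seen.
        classes = sorted(set().union(*samples))
    column = {label: j for j, label in enumerate(classes)}

    indicator = [[0] * len(classes) for _ in samples]
    for i, labels in enumerate(samples):
        for label in labels:
            j = column.get(label)
            if j is not None:  # sketch assumption: unknown labels are skipped
                indicator[i][j] = 1
    return list(classes), indicator


classes, indicator = fit_transform_sketch([(1, 2), (3,)])
print(classes)
print(indicator)
```

Row `i` of the result has a 1 in column `j` exactly when `classes[j]` appears in sample `i`.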

val get_params : ?deep:bool -> t -> Dict.t

Get parameters for this estimator.

Parameters
----------
deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
-------
params : mapping of string to any
    Parameter names mapped to their values.

val inverse_transform : yt:Arr.t -> t -> Py.Object.t

Transform the given indicator matrix into label sets

Parameters
----------
yt : array or sparse matrix of shape (n_samples, n_classes)
    A matrix containing only 1s and 0s.

Returns
-------
y : list of tuples
    The set of labels for each sample such that `y[i]` consists of `classes_[j]` for each `yt[i, j] == 1`.
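The inverse mapping can be sketched in a few lines of plain Python: for each row, collect the class whose column holds a 1. The function name `inverse_transform_sketch` and the explicit `classes` argument are assumptions of this illustration; the real method reads the fitted `classes_` attribute.

```python
def inverse_transform_sketch(yt, classes):
    """Turn a 0/1 indicator matrix (list of rows) back into label tuples."""
    return [
        tuple(classes[j] for j, flag in enumerate(row) if flag == 1)
        for row in yt
    ]


labels = inverse_transform_sketch([[1, 1, 0], [0, 0, 1]], [1, 2, 3])
print(labels)
```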

val set_params : ?params:(string * Py.Object.t) list -> t -> t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

Parameters
----------
**params : dict
    Estimator parameters.

Returns
-------
self : object
    Estimator instance.
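The `<component>__<parameter>` naming convention can be illustrated with a small helper that splits a parameter name at the first double underscore. This is only a sketch of the naming scheme; `split_param_name` and the step name `binarizer` are hypothetical and not part of the API.

```python
def split_param_name(name):
    """Split 'component__parameter' into (component, parameter).

    Names without a double underscore belong to the estimator itself,
    which is signalled here by a component of None.
    """
    component, sep, parameter = name.partition('__')
    if not sep:
        return None, name
    return component, parameter


own = split_param_name('sparse_output')
nested = split_param_name('binarizer__sparse_output')
print(own)
print(nested)
```

Because only the first `__` is split off, a deeper name such as `a__b__c` yields `('a', 'b__c')`, leaving the remainder for the nested component to resolve.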

val transform : y:Arr.List.t -> t -> Arr.t

Transform the given label sets

Parameters
----------
y : iterable of iterables
    A set of labels (any orderable and hashable object) for each sample. If the `classes` parameter is set, `y` will not be iterated.

Returns
-------
y_indicator : array or CSR matrix, shape (n_samples, n_classes)
    A matrix such that `y_indicator[i, j] = 1` iff `classes_[j]` is in `y[i]`, and 0 otherwise.
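When `sparse_output` is set, the same indicator matrix is returned in CSR (compressed sparse row) form. As a rough illustration of what that layout stores, the sketch below converts a dense list-of-rows matrix into the three CSR arrays using only the standard library. The helper `dense_to_csr` is hypothetical; in practice the sparse matrix type comes from SciPy.

```python
def dense_to_csr(matrix):
    """Return (data, indices, indptr) describing the nonzeros of a dense matrix."""
    data, indices, indptr = [], [], [0]
    for row in matrix:
        for j, value in enumerate(row):
            if value != 0:
                data.append(value)   # the nonzero value itself
                indices.append(j)    # its column index
        indptr.append(len(indices))  # where the next row starts
    return data, indices, indptr


data, indices, indptr = dense_to_csr([[1, 1, 0], [0, 0, 1]])
print(data, indices, indptr)
```

Row `i` occupies the slice `indptr[i]:indptr[i + 1]` of `data` and `indices`, so only the entries equal to 1 are stored.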

val classes_ : t -> Arr.t

Attribute classes_: get value or raise Not_found if None.

val classes_opt : t -> Arr.t option

Attribute classes_: get value as an option.

val to_string : t -> string

Return a human-readable string representation of the object.

val show : t -> string

Return a human-readable string representation of the object.

val pp : Format.formatter -> t -> unit

Pretty-print the object to a formatter.
