Module `Preprocessing.MultiLabelBinarizer`

type tag = [
  | `MultiLabelBinarizer
]

type t =
  [ `BaseEstimator | `MultiLabelBinarizer | `Object | `TransformerMixin ] Obj.t

val of_pyobject : Py.Object.t -> t

val to_pyobject : [> tag ] Obj.t -> Py.Object.t

val as_transformer : t -> [ `TransformerMixin ] Obj.t

val as_estimator : t -> [ `BaseEstimator ] Obj.t
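The `[ ... ] Obj.t` types above use polymorphic variants as phantom tags, so the type checker tracks which capabilities a wrapped Python object has. The sketch below illustrates the idea with a hypothetical `Obj` module (`wrap`, `describe`, `retag`) and hypothetical `use_transformer` / `use_estimator` functions. It assumes, without confirming it from the bindings, that the real `as_transformer` and `as_estimator` are unchecked re-taggings of the same underlying handle.

```ocaml
(* Hypothetical stand-in for the bindings' Obj module: the type parameter
   is a phantom that only the type checker sees. *)
module Obj : sig
  type 'a t
  val wrap : string -> 'a t
  val describe : 'a t -> string
  val retag : 'a t -> 'b t (* unchecked: changes only the static tag set *)
end = struct
  type 'a t = string
  let wrap s = s
  let describe s = s
  let retag s = s
end

(* Same shape as the documented type [t]. *)
type t =
  [ `BaseEstimator | `MultiLabelBinarizer | `Object | `TransformerMixin ] Obj.t

let create () : t = Obj.wrap "MultiLabelBinarizer"

(* Hypothetical functions that accept any object carrying a given tag. *)
let use_transformer (x : [> `TransformerMixin ] Obj.t) =
  "transformer " ^ Obj.describe x

let use_estimator (x : [> `BaseEstimator ] Obj.t) =
  "estimator " ^ Obj.describe x

(* Narrowing to a single capability, as the documented signatures do. *)
let as_transformer (x : t) : [ `TransformerMixin ] Obj.t = Obj.retag x
let as_estimator (x : t) : [ `BaseEstimator ] Obj.t = Obj.retag x
```

A value of type `t` can be passed to both functions directly, since its closed tag set contains both tags. Passing `as_transformer m` to `use_estimator` is rejected at compile time, because the closed type produced by `as_transformer` does not include the BaseEstimator tag.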


val create : 
  ?classes:[> `ArrayLike ] Np.Obj.t ->
  ?sparse_output:bool ->
  unit ->
  t

Transform between iterable of iterables and a multilabel format

Although a list of sets or tuples is a very intuitive format for multilabel data, it is unwieldy to process. This transformer converts between this intuitive format and the supported multilabel format: a (samples x classes) binary matrix indicating the presence of a class label.
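To make the target format concrete, here is a minimal pure-OCaml sketch of the indicator matrix for dense output. `index_of` and `indicator` are hypothetical helpers rather than part of these bindings, label sets are modelled as OCaml lists, and labels missing from `classes` are simply skipped in this sketch.

```ocaml
(* Hypothetical helper: position of [label] in [classes], if any. *)
let index_of classes label =
  let rec go i =
    if i >= Array.length classes then None
    else if classes.(i) = label then Some i
    else go (i + 1)
  in
  go 0

(* Hypothetical helper: entry (i, j) is 1 iff classes.(j) occurs in the
   label set of sample i. Labels not listed in [classes] are skipped. *)
let indicator (classes : 'a array) (samples : 'a list list) : int array array =
  samples
  |> List.map (fun labels ->
         let row = Array.make (Array.length classes) 0 in
         List.iter
           (fun label ->
             match index_of classes label with
             | Some j -> row.(j) <- 1
             | None -> ())
           labels;
         row)
  |> Array.of_list
```

For instance, `indicator [|1; 2; 3|] [[1; 2]; [3]]` gives the rows `[|1; 1; 0|]` and `[|0; 0; 1|]`, matching the first example further down.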

Parameters
----------
classes : array-like of shape (n_classes,), optional
    Indicates an ordering for the class labels.
    All entries should be unique (cannot contain duplicate classes).

sparse_output : boolean (default: False)
    Set to true if output binary array is desired in CSR sparse format.
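With `sparse_output` set, the same matrix is returned in CSR form. Since every stored entry of a binary indicator matrix is 1, the structure reduces to row pointers and column indices. The sketch below builds that layout in plain OCaml; the `csr` record and `to_csr` are hypothetical names, and it makes no claim about how scikit-learn builds its CSR matrix internally.

```ocaml
(* Hypothetical record: a CSR matrix whose stored values are all 1, so only
   row pointers and column indices are kept. *)
type csr = { indptr : int array; indices : int array; n_cols : int }

let to_csr (classes : 'a array) (samples : 'a list list) : csr =
  (* Map each class to its column once, instead of scanning per label. *)
  let column = Hashtbl.create (Array.length classes) in
  Array.iteri (fun j c -> Hashtbl.replace column c j) classes;
  (* Sorted, de-duplicated column indices for every row; unknown labels
     are dropped by find_opt returning None. *)
  let rows =
    List.map
      (fun labels ->
        List.sort_uniq compare (List.filter_map (Hashtbl.find_opt column) labels))
      samples
  in
  (* Row i occupies indices.(indptr.(i)) .. indices.(indptr.(i + 1) - 1). *)
  let indptr = Array.make (List.length rows + 1) 0 in
  List.iteri (fun i cols -> indptr.(i + 1) <- indptr.(i) + List.length cols) rows;
  { indptr; indices = Array.of_list (List.concat rows); n_cols = Array.length classes }
```

Storage grows with the number of labels actually present rather than with n_samples x n_classes, which is why the sparse option pays off when there are many classes.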

Attributes
----------
classes_ : array of labels
    A copy of the `classes` parameter where provided,
    or otherwise, the sorted set of classes found when fitting.

Examples
--------
>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit_transform([(1, 2), (3,)])
array([[1, 1, 0],
       [0, 0, 1]])
>>> mlb.classes_
array([1, 2, 3])

>>> mlb.fit_transform([{'sci-fi', 'thriller'}, {'comedy'}])
array([[0, 1, 1],
       [1, 0, 0]])
>>> list(mlb.classes_)
['comedy', 'sci-fi', 'thriller']

A common mistake is to pass in a flat list of labels. Each string is then treated as the label set of one sample, so its characters become the classes:

>>> mlb = MultiLabelBinarizer()
>>> mlb.fit(['sci-fi', 'thriller', 'comedy'])
MultiLabelBinarizer()
>>> mlb.classes_
array(['-', 'c', 'd', 'e', 'f', 'h', 'i', 'l', 'm', 'o', 'r', 's', 't',
       'y'], dtype=object)

To correct this, wrap the labels in an outer list so that they form the label set of a single sample:

>>> mlb = MultiLabelBinarizer()
>>> mlb.fit([['sci-fi', 'thriller', 'comedy']])
MultiLabelBinarizer()
>>> mlb.classes_
array(['comedy', 'sci-fi', 'thriller'], dtype=object)
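The mistake comes from Python treating a string as an iterable of characters. A small OCaml sketch reproduces both outcomes; `chars` and `sorted_classes` are hypothetical helpers, and the explicit `chars` call stands in for Python's implicit iteration over a string.

```ocaml
(* Hypothetical helper: the characters of a string, in order. *)
let chars s = List.init (String.length s) (String.get s)

(* Hypothetical helper: sorted set of all labels across samples, as
   fitting collects them when no [classes] are given. *)
let sorted_classes samples = List.sort_uniq compare (List.concat samples)

let flat = [ "sci-fi"; "thriller"; "comedy" ]

(* Mistake: every string becomes a sample whose labels are its characters. *)
let wrong = sorted_classes (List.map chars flat)

(* Fix: a single sample whose labels are the three strings. *)
let right = sorted_classes [ flat ]
```

Here `wrong` holds the same 14 characters as the faulty `classes_` above, while `right` holds the three genre names.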

See also
--------
sklearn.preprocessing.OneHotEncoder : encode categorical features
    using a one-hot aka one-of-K scheme.

val fit : y:Np.Numpy.Ndarray.List.t -> [> tag ] Obj.t -> t

Fit the label sets binarizer, storing `classes_`.

Parameters
----------
y : iterable of iterables
    A set of labels (any orderable and hashable object) for each sample.
    If the `classes` parameter is set, `y` will not be iterated.

Returns
-------
self : returns this MultiLabelBinarizer instance
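The rule for `classes_` can be sketched as follows, assuming label sets are OCaml lists; `fit_classes` is a hypothetical name, not the bindings' `fit`. When `classes` is supplied it is copied and `y` is never inspected; otherwise the labels are collected into a sorted set.

```ocaml
(* Hypothetical sketch of how classes_ is chosen during fitting. *)
let fit_classes ?classes (y : 'a list list) : 'a array =
  match classes with
  | Some c ->
      (* A copy, so later changes to the caller's array do not leak in;
         y is not looked at on this branch. *)
      Array.copy c
  | None ->
      (* Sorted set of every label seen in y. *)
      List.concat y |> List.sort_uniq compare |> Array.of_list
```

A supplied ordering is kept as-is, so `fit_classes ~classes:[|3; 1; 2|] y` yields columns in the order 3, 1, 2 regardless of `y`.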


val fit_transform : 
  y:Np.Numpy.Ndarray.List.t ->
  [> tag ] Obj.t ->
  [> `ArrayLike ] Np.Obj.t

Fit the label sets binarizer and transform the given label sets

Parameters
----------
y : iterable of iterables
    A set of labels (any orderable and hashable object) for each sample.
    If the `classes` parameter is set, `y` will not be iterated.

Returns
-------
y_indicator : array or CSR matrix, shape (n_samples, n_classes)
    A matrix such that `y_indicator[i, j] = 1` iff `classes_[j]` is in `y[i]`, and 0 otherwise.

val get_params : ?deep:bool -> [> tag ] Obj.t -> Dict.t

Get parameters for this estimator.

Parameters
----------
deep : bool, default=True
    If True, will return the parameters for this estimator and
    contained subobjects that are estimators.

Returns
-------
params : mapping of string to any
    Parameter names mapped to their values.
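With `deep`, nested estimators contribute their own parameters under `<component>__<parameter>` keys. The sketch below models a parameter tree with a hypothetical `param` type and `get_params` function; nested estimators are shown as a placeholder string, which is a simplification. MultiLabelBinarizer itself has no nested estimators, so `deep` does not change its result.

```ocaml
(* Hypothetical parameter tree: a plain value or a nested estimator. *)
type param =
  | Value of string
  | Estimator of (string * param) list

let rec get_params ?(deep = true) (params : (string * param) list) :
    (string * string) list =
  List.concat
    (List.map
       (fun (key, p) ->
         match p with
         | Value v -> [ (key, v) ]
         | Estimator sub ->
             (* The component itself, shown as a placeholder. *)
             let self = (key, "<estimator>") in
             if deep then
               (* Prefix each nested name with "<component>__". *)
               self
               :: List.map (fun (k, v) -> (key ^ "__" ^ k, v)) (get_params ~deep sub)
             else [ self ])
       params)
```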


val inverse_transform : 
  yt:[> `ArrayLike ] Np.Obj.t ->
  [> tag ] Obj.t ->
  Py.Object.t

Transform the given indicator matrix into label sets

Parameters
----------
yt : array or sparse matrix of shape (n_samples, n_classes)
    A matrix containing only 1s and 0s.

Returns
-------
y : list of tuples
    The set of labels for each sample such that `y[i]` consists of `classes_[j]` for each `yt[i, j] == 1`.
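A dense-input sketch of the inverse mapping, with label sets modelled as OCaml lists in place of Python tuples. `inverse_indicator` is a hypothetical helper; raising `Invalid_argument` on entries other than 0 and 1 is this sketch's own choice of error handling.

```ocaml
(* Hypothetical sketch: row i of the result lists classes.(j) for every
   column j where yt.(i).(j) = 1, in class order. *)
let inverse_indicator (classes : 'a array) (yt : int array array) : 'a list list =
  let columns = List.init (Array.length classes) (fun j -> j) in
  Array.to_list
    (Array.map
       (fun row ->
         List.filter_map
           (fun j ->
             match row.(j) with
             | 1 -> Some classes.(j)
             | 0 -> None
             | _ -> invalid_arg "inverse_indicator: expected only 0s and 1s")
           columns)
       yt)
```

Applied to the output of the first example, it returns `[[1; 2]; [3]]`, recovering the original label sets.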

val set_params : ?params:(string * Py.Object.t) list -> [> tag ] Obj.t -> t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.
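Routing by key name can be sketched by splitting each key at its first `__`, following the convention described above. `split_key` and `route` are hypothetical helpers; a key without `__` is taken to address the estimator itself.

```ocaml
(* Hypothetical helper: split "<component>__<parameter>" at the first "__". *)
let split_key (key : string) : string * string option =
  let n = String.length key in
  let rec find i =
    if i + 1 >= n then None
    else if key.[i] = '_' && key.[i + 1] = '_' then Some i
    else find (i + 1)
  in
  match find 0 with
  | None -> (key, None)
  | Some i -> (String.sub key 0 i, Some (String.sub key (i + 2) (n - i - 2)))

(* Hypothetical helper: separate the estimator's own parameters from those
   addressed to named components, preserving input order. *)
let route (params : (string * 'v) list) =
  List.fold_right
    (fun (key, v) (own, nested) ->
      match split_key key with
      | name, None -> ((name, v) :: own, nested)
      | comp, Some sub -> (own, (comp, (sub, v)) :: nested))
    params ([], [])
```

Splitting only at the first `__` lets deeper paths such as `a__b__c` pass the remainder `b__c` down to component `a`, which can split it again.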

Parameters
----------
**params : dict
    Estimator parameters.

Returns
-------
self : object
    Estimator instance.


val transform : 
  y:Np.Numpy.Ndarray.List.t ->
  [> tag ] Obj.t ->
  [> `ArrayLike ] Np.Obj.t

Transform the given label sets

Parameters
----------
y : iterable of iterables
    A set of labels (any orderable and hashable object) for each sample.
    If the `classes` parameter is set, `y` will not be iterated.

Returns
-------
y_indicator : array or CSR matrix, shape (n_samples, n_classes)
    A matrix such that `y_indicator[i, j] = 1` iff `classes_[j]` is in `y[i]`, and 0 otherwise.

val classes_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute classes_: get value or raise Not_found if None.

val classes_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute classes_: get value as an option.
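The two accessors differ only in how a missing value is reported. The sketch below assumes that `classes_` is defined in terms of `classes_opt`; the record type `t` and the `describe` helper are hypothetical stand-ins for the real wrapped object.

```ocaml
(* Hypothetical stand-in: the attribute is None until the model is fitted. *)
type t = { mutable fitted_classes : string array option }

let classes_opt (m : t) = m.fitted_classes

(* Raising variant, as documented: Not_found when the value is None. *)
let classes_ (m : t) =
  match classes_opt m with
  | Some c -> c
  | None -> raise Not_found

(* Hypothetical caller that handles the unfitted case without exceptions. *)
let describe (m : t) =
  match classes_opt m with
  | None -> "not fitted"
  | Some c -> String.concat ", " (Array.to_list c)
```

Matching on `classes_opt` avoids handling `Not_found` when the estimator may not have been fitted yet.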

val to_string : t -> string

Return a human-readable representation of the object as a string.

val show : t -> string

Return a human-readable representation of the object as a string.

val pp : Format.formatter -> t -> unit

Pretty-print the object to a formatter.
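A `pp` function of this shape plugs into the `%a` conversion of `Format`, and a string form can be derived from it with `Format.asprintf`. The record type below is a hypothetical stand-in, and the output format is illustrative only; the bindings' own printers may render the object differently.

```ocaml
(* Hypothetical stand-in for the wrapped estimator. *)
type t = { classes : string list; sparse_output : bool }

(* Printer with the documented shape: Format.formatter -> t -> unit. *)
let pp fmt m =
  Format.fprintf fmt "MultiLabelBinarizer(classes=[%s], sparse_output=%b)"
    (String.concat ", " m.classes) m.sparse_output

(* A string form derived from pp via the %a conversion. *)
let to_string m = Format.asprintf "%a" pp m
```

For example, `Format.printf "%a@." pp m` prints the same text followed by a newline.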
