package sklearn

type tag = [ `DictVectorizer ]
type t = [ `BaseEstimator | `DictVectorizer | `Object | `TransformerMixin ] Obj.t
val of_pyobject : Py.Object.t -> t
val to_pyobject : [> tag ] Obj.t -> Py.Object.t
val as_transformer : t -> [ `TransformerMixin ] Obj.t
val as_estimator : t -> [ `BaseEstimator ] Obj.t
val create : ?dtype:Np.Dtype.t -> ?separator:string -> ?sparse:bool -> ?sort:bool -> unit -> t

Transforms lists of feature-value mappings to vectors.

This transformer turns lists of mappings (dict-like objects) of feature names to feature values into Numpy arrays or scipy.sparse matrices for use with scikit-learn estimators.

When feature values are strings, this transformer will do a binary one-hot (aka one-of-K) coding: one boolean-valued feature is constructed for each of the possible string values that the feature can take on. For instance, a feature 'f' that can take on the values 'ham' and 'spam' will become two features in the output, one signifying 'f=ham', the other 'f=spam'.

However, note that this transformer will only do a binary one-hot encoding when feature values are of type string. If categorical features are represented as numeric values such as int, the DictVectorizer can be followed by :class:`sklearn.preprocessing.OneHotEncoder` to complete binary one-hot encoding.

Features that do not occur in a sample (mapping) will have a zero value in the resulting array/matrix.
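The coding rules described above can be sketched in plain Python. This is an illustrative toy, not scikit-learn's implementation: the function name `vectorize` and its dense list-of-lists output are assumptions made for the example.

```python
def vectorize(samples, separator="="):
    """Toy dict vectorizer: string values are one-hot coded, numbers pass through."""
    # First pass: collect the output feature names.
    names = set()
    for sample in samples:
        for feature, value in sample.items():
            if isinstance(value, str):
                # One constructed feature per (feature, string value) pair.
                names.add(f"{feature}{separator}{value}")
            else:
                names.add(feature)
    feature_names = sorted(names)
    vocabulary = {name: index for index, name in enumerate(feature_names)}

    # Second pass: build one dense row per sample; absent features stay 0.0.
    rows = []
    for sample in samples:
        row = [0.0] * len(feature_names)
        for feature, value in sample.items():
            if isinstance(value, str):
                row[vocabulary[f"{feature}{separator}{value}"]] = 1.0
            else:
                row[vocabulary[feature]] = float(value)
        rows.append(row)
    return feature_names, rows


feature_names, rows = vectorize([{"f": "ham", "n": 2}, {"f": "spam"}])
print(feature_names)  # ['f=ham', 'f=spam', 'n']
print(rows)           # [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
```

The second sample has no 'n' entry, so its 'n' column is zero, and the string feature 'f' never appears as a column of its own.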

Read more in the :ref:`User Guide <dict_feature_extraction>`.

Parameters
----------
dtype : dtype, default=np.float64
    The type of feature values. Passed to Numpy array/scipy.sparse matrix constructors as the dtype argument.
separator : str, default='='
    Separator string used when constructing new features for one-hot coding.
sparse : bool, default=True
    Whether transform should produce scipy.sparse matrices.
sort : bool, default=True
    Whether ``feature_names_`` and ``vocabulary_`` should be sorted when fitting.

Attributes
----------
vocabulary_ : dict
    A dictionary mapping feature names to feature indices.
feature_names_ : list
    A list of length n_features containing the feature names (e.g., 'f=ham' and 'f=spam').

Examples
--------
>>> from sklearn.feature_extraction import DictVectorizer
>>> v = DictVectorizer(sparse=False)
>>> D = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]
>>> X = v.fit_transform(D)
>>> X
array([[2., 0., 1.],
       [0., 1., 3.]])
>>> v.inverse_transform(X) == [{'bar': 2.0, 'foo': 1.0},
...                            {'baz': 1.0, 'foo': 3.0}]
True
>>> v.transform({'foo': 4, 'unseen_feature': 3})
array([[0., 0., 4.]])

See also
--------
FeatureHasher : performs vectorization using only a hash function.
sklearn.preprocessing.OrdinalEncoder : handles nominal/categorical features encoded as columns of arbitrary data types.

val fit : ?y:Py.Object.t -> x:Py.Object.t -> [> tag ] Obj.t -> t

Learn a list of feature name -> indices mappings.

Parameters
----------
X : Mapping or iterable over Mappings
    Dict(s) or Mapping(s) from feature names (arbitrary Python objects) to feature values (strings or convertible to dtype).
y : (ignored)

Returns
-------
self

val fit_transform : ?y:Py.Object.t -> x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> [> `ArrayLike ] Np.Obj.t

Learn a list of feature name -> indices mappings and transform X.

Like fit(X) followed by transform(X), but does not require materializing X in memory.

Parameters
----------
X : Mapping or iterable over Mappings
    Dict(s) or Mapping(s) from feature names (arbitrary Python objects) to feature values (strings or convertible to dtype).
y : (ignored)

Returns
-------
Xa : array, sparse matrix
    Feature vectors; always 2-d.

val get_feature_names : [> tag ] Obj.t -> Py.Object.t

Returns a list of feature names, ordered by their indices.

If one-of-K coding is applied to categorical features, this will include the constructed feature names but not the original ones.
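As a rough illustration of "ordered by their indices", the names can be recovered from a vocabulary mapping by sorting on the index. The vocabulary below is made up for the example and is not output from the library.

```python
# Hypothetical vocabulary: constructed feature name -> column index.
vocabulary = {"f=spam": 1, "n": 2, "f=ham": 0}

# Sort the names by their column index to get the output order.
feature_names = sorted(vocabulary, key=vocabulary.get)
print(feature_names)  # ['f=ham', 'f=spam', 'n']
```

Only the constructed names 'f=ham' and 'f=spam' appear; the original feature name 'f' does not.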

val get_params : ?deep:bool -> [> tag ] Obj.t -> Dict.t

Get parameters for this estimator.

Parameters
----------
deep : bool, default=True
    If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
-------
params : mapping of string to any
    Parameter names mapped to their values.

val inverse_transform : ?dict_type:Np.Dtype.t -> x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> Py.Object.t

Transform array or sparse matrix X back to feature mappings.

X must have been produced by this DictVectorizer's transform or fit_transform method; it may only have passed through transformers that preserve the number of features and their order.

In the case of one-hot/one-of-K coding, the constructed feature names and values are returned rather than the original ones.

Parameters
----------
X : array-like, sparse matrix of shape (n_samples, n_features)
    Sample matrix.
dict_type : type, default=dict
    Constructor for feature mappings. Must conform to the collections.abc.Mapping API.

Returns
-------
D : list of dict_type objects of shape (n_samples,)
    Feature mappings for the samples in X.
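A minimal pure-Python sketch of the inverse mapping, assuming dense rows and a known list of feature names. Dropping zero entries is inferred from the class example, where each recovered dict holds only its nonzero features; the helper below is hypothetical, not the library's code.

```python
def inverse_transform(rows, feature_names, dict_type=dict):
    """Toy inverse: rebuild one mapping per row from the nonzero columns."""
    mappings = []
    for row in rows:
        mapping = dict_type()
        for name, value in zip(feature_names, row):
            if value != 0:
                # Zero columns are treated as "feature absent from the sample".
                mapping[name] = value
        mappings.append(mapping)
    return mappings


feature_names = ["bar", "baz", "foo"]
rows = [[2.0, 0.0, 1.0], [0.0, 1.0, 3.0]]
recovered = inverse_transform(rows, feature_names)
print(recovered)  # [{'bar': 2.0, 'foo': 1.0}, {'baz': 1.0, 'foo': 3.0}]
```

Passing a different `dict_type`, such as `collections.OrderedDict`, changes only the container used for each mapping.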

val restrict : ?indices:bool -> support:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> Py.Object.t

Restrict the features to those in support using feature selection.

This function modifies the estimator in-place.

Parameters
----------
support : array-like
    Boolean mask or list of indices (as returned by the get_support member of feature selectors).
indices : bool, default=False
    Whether support is a list of indices.

Returns
-------
self

Examples
--------
>>> from sklearn.feature_extraction import DictVectorizer
>>> from sklearn.feature_selection import SelectKBest, chi2
>>> v = DictVectorizer()
>>> D = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]
>>> X = v.fit_transform(D)
>>> support = SelectKBest(chi2, k=2).fit(X, [0, 1])
>>> v.get_feature_names()
['bar', 'baz', 'foo']
>>> v.restrict(support.get_support())
DictVectorizer()
>>> v.get_feature_names()
['bar', 'foo']
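The two forms of `support` (boolean mask or index list) can be sketched as follows. This is a hypothetical helper for illustration only: unlike the method documented here, it returns new objects instead of modifying an estimator in place.

```python
def restrict(feature_names, support, indices=False):
    """Toy restriction: keep only the selected features and renumber them."""
    if indices:
        # support is a list of column indices to keep.
        keep = set(support)
    else:
        # support is a boolean mask with one entry per feature.
        keep = {i for i, selected in enumerate(support) if selected}
    kept_names = [name for i, name in enumerate(feature_names) if i in keep]
    # Rebuild the name -> index mapping so indices are contiguous again.
    vocabulary = {name: i for i, name in enumerate(kept_names)}
    return kept_names, vocabulary


names = ["bar", "baz", "foo"]
by_mask, vocab_by_mask = restrict(names, [True, False, True])
by_index, vocab_by_index = restrict(names, [0, 2], indices=True)
print(by_mask)        # ['bar', 'foo']
print(vocab_by_mask)  # {'bar': 0, 'foo': 1}
```

Both calls select the same features, so they produce the same names and the same renumbered vocabulary.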

val set_params : ?params:(string * Py.Object.t) list -> [> tag ] Obj.t -> t

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form ``<component>__<parameter>`` so that it's possible to update each component of a nested object.

Parameters
----------
**params : dict
    Estimator parameters.

Returns
-------
self : object
    Estimator instance.
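To make the ``<component>__<parameter>`` convention concrete, the sketch below splits a parameter dict into the estimator's own parameters and those addressed to nested components. The helper name and the component name `vectorizer` are hypothetical; this illustrates the naming scheme only, not the library's internal code.

```python
def split_nested_params(params):
    """Separate plain parameters from '<component>__<parameter>' ones."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only.
        component, delimiter, sub_key = key.partition("__")
        if delimiter:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"sparse": False, "vectorizer__sort": False, "vectorizer__separator": ":"}
)
print(own)     # {'sparse': False}
print(nested)  # {'vectorizer': {'sort': False, 'separator': ':'}}
```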

val transform : x:[> `ArrayLike ] Np.Obj.t -> [> tag ] Obj.t -> [> `ArrayLike ] Np.Obj.t

Transform feature->value dicts to array or sparse matrix.

Named features not encountered during fit or fit_transform will be silently ignored.

Parameters
----------
X : Mapping or iterable over Mappings of shape (n_samples,)
    Dict(s) or Mapping(s) from feature names (arbitrary Python objects) to feature values (strings or convertible to dtype).

Returns
-------
Xa : array, sparse matrix
    Feature vectors; always 2-d.
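The "silently ignored" behavior can be illustrated with a small pure-Python sketch that looks each feature up in an already-fitted vocabulary. The helper and its dense output are assumptions for the example, and it handles numeric values only.

```python
def transform(samples, vocabulary):
    """Toy transform: map numeric features to columns, skipping unknown names."""
    rows = []
    for sample in samples:
        row = [0.0] * len(vocabulary)
        for feature, value in sample.items():
            index = vocabulary.get(feature)
            if index is not None:
                row[index] = float(value)
            # Features missing from the vocabulary are dropped without error.
        rows.append(row)
    return rows


vocabulary = {"bar": 0, "baz": 1, "foo": 2}
rows = transform([{"foo": 4, "unseen_feature": 3}], vocabulary)
print(rows)  # [[0.0, 0.0, 4.0]]
```

The result is still two-dimensional for a single sample, and 'unseen_feature' leaves no trace in the output.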

val vocabulary_ : t -> Dict.t

Attribute vocabulary_: get value or raise Not_found if None.

val vocabulary_opt : t -> Dict.t option

Attribute vocabulary_: get value as an option.

val feature_names_ : t -> [> `ArrayLike ] Np.Obj.t

Attribute feature_names_: get value or raise Not_found if None.

val feature_names_opt : t -> [> `ArrayLike ] Np.Obj.t option

Attribute feature_names_: get value as an option.

val to_string : t -> string

Return a human-readable string representation of the object.

val show : t -> string

Return a human-readable string representation of the object.

val pp : Stdlib.Format.formatter -> t -> unit

Pretty-print the object to a formatter.