package oml

Train a Naive Bayes classifier on data encoded using Categorical variables.

Parameters

Signature

include Classifier_interfaces.Generative with type feature = D.feature and type class_ = D.class_ and type feature_probability = float array
include Classifier_interfaces.Classifier with type feature = D.feature with type class_ = D.class_
include Input_interfaces.Data with type feature = D.feature with type class_ = D.class_
type class_ = D.class_
type feature = D.feature
include Oml_util.Optional_arg_intf
type opt
val default : opt
type t

The classifier.

val eval : t -> feature -> class_ Probabilities.t

eval classifier feature assigns probabilities to the possible classes based upon feature.

type samples = (class_ * feature) list

The training data: a list of (class, feature) pairs.

val estimate : ?opt:opt -> ?classes:class_ list -> samples -> t

estimate ?opt ?classes samples estimates a classifier from the training samples.

classes is an optional argument that specifies ahead of time the possible classes to train on (defaults to the ones found in the training data). This is useful for models where we know the population domain but the training data may contain no example of a rare class.

opt are the optional, classifier-dependent estimation/evaluation arguments.

  • raises Invalid_argument

    if classes is specified and the training samples contain a class that is not in it.
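
The interaction between classes and the training samples can be sketched as follows. This is a standalone illustration, not oml's implementation: count_classes is a hypothetical helper, classes are plain strings, and features are int arrays chosen only for the example.

```ocaml
(* Hypothetical helper, not part of oml: count the samples per class,
   optionally restricted to a class list declared ahead of time. *)
let count_classes ?classes samples =
  let tbl = Hashtbl.create 16 in
  (* Pre-register declared classes so unseen ones get a zero count. *)
  (match classes with
   | None -> ()
   | Some cs -> List.iter (fun c -> Hashtbl.replace tbl c 0) cs);
  List.iter
    (fun (c, _feature) ->
      match Hashtbl.find_opt tbl c, classes with
      | Some n, _ -> Hashtbl.replace tbl c (n + 1)
      | None, None -> Hashtbl.replace tbl c 1
      | None, Some _ ->
        (* A class outside the declared list mirrors the documented error. *)
        invalid_arg "count_classes: class not in ~classes")
    samples;
  (* Return a sorted association list so the result is deterministic. *)
  List.sort compare (Hashtbl.fold (fun c n acc -> (c, n) :: acc) tbl [])

let samples =
  [ ("spam", [| 1; 0 |]); ("ham", [| 0; 0 |]); ("spam", [| 1; 1 |]) ]

(* Classes inferred from the data. *)
let inferred = count_classes samples

(* "rare" is declared up front and kept with a zero count. *)
let declared = count_classes ~classes:[ "spam"; "ham"; "rare" ] samples
```

Declaring the classes up front keeps "rare" in the result even though no sample carries it, while a sample labelled with an undeclared class raises Invalid_argument.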

type feature_probability = float array
val class_probabilities : t -> class_ -> float * (feature -> feature_probability)

class_probabilities t class returns the prior probability and the per-feature likelihoods learned by t for class.

  • raises Not_found

    if t was never trained on class.
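
To show how a prior and per-feature likelihoods of this kind are typically combined, here is a minimal sketch of the Naive Bayes rule. It does not call oml: posterior and the numbers are hypothetical, and it assumes each class comes with a prior and an array holding the likelihood of every observed feature value.

```ocaml
(* Hypothetical illustration, not oml's code. Each class is paired with
   (prior, likelihoods), where likelihoods.(i) is the assumed probability
   of the i-th observed feature value given that class. *)
let posterior per_class =
  (* Unnormalized score: prior times the product of the likelihoods. *)
  let score (prior, likelihoods) =
    Array.fold_left ( *. ) prior likelihoods
  in
  let scored = List.map (fun (c, p) -> (c, score p)) per_class in
  let total = List.fold_left (fun acc (_, s) -> acc +. s) 0.0 scored in
  (* Normalize so the class probabilities sum to one. *)
  List.map (fun (c, s) -> (c, s /. total)) scored

let result =
  posterior
    [ ("spam", (0.5, [| 0.8; 0.5 |]));  (* score 0.5 * 0.8 * 0.5 = 0.2 *)
      ("ham", (0.5, [| 0.2; 0.5 |])) ]  (* score 0.5 * 0.2 * 0.5 = 0.05 *)
```

With these made-up numbers the scores 0.2 and 0.05 normalize to roughly 0.8 for "spam" and 0.2 for "ham".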

val opt : ?smoothing:float -> unit -> opt

opt ~smoothing () creates the optional configuration of the classifier.

  • parameter smoothing

    Additive smoothing can be applied to the final estimate of Naive Bayes classifiers. When estimating a probability distribution by counting observed instances in the feature space we may want to smooth the values, particularly if our training data is sparse.
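
As a rough illustration of additive smoothing on counted data, the sketch below uses the common formula (count + smoothing) / (total + smoothing * number_of_values). Whether oml applies exactly this formula is an assumption; smoothed_probabilities is a hypothetical helper and not part of the library.

```ocaml
(* Hypothetical helper, not oml's implementation: turn the observed counts
   of a categorical variable into probabilities with additive smoothing.
   counts.(i) is how often value i was seen. *)
let smoothed_probabilities ~smoothing counts =
  let k = float_of_int (Array.length counts) in
  let n = float_of_int (Array.fold_left ( + ) 0 counts) in
  (* (count + smoothing) / (total + smoothing * number_of_values) *)
  Array.map
    (fun c -> (float_of_int c +. smoothing) /. (n +. smoothing *. k))
    counts

(* Without smoothing, the unseen third value gets probability zero. *)
let raw = smoothed_probabilities ~smoothing:0.0 [| 3; 1; 0 |]

(* With smoothing = 1.0 every value keeps some probability mass. *)
let laplace = smoothed_probabilities ~smoothing:1.0 [| 3; 1; 0 |]
```

A zero likelihood would force the whole Naive Bayes product for that class to zero, which is why sparse training data benefits from a positive smoothing value.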