package saga

Text processing and NLP extensions for Nx

Sources

raven-1.0.0.alpha1.tbz
sha256=8e277ed56615d388bc69c4333e43d1acd112b5f2d5d352e2453aef223ff59867
sha512=369eda6df6b84b08f92c8957954d107058fb8d3d8374082e074b56f3a139351b3ae6e3a99f2d4a4a2930dd950fd609593467e502368a13ad6217b571382da28c

doc/saga.tokenizers/Saga_tokenizers/Bpe/Trainer/index.html

Module Bpe.Trainer

type trainer
type trainer_config = {
  min_frequency : int;
  vocab_size : int;
  show_progress : bool;
  special_tokens : string list;
  limit_alphabet : int option;
  initial_alphabet : char list;
  continuing_subword_prefix : string option;
  end_of_word_suffix : string option;
  max_token_length : int option;
}
val default_config : trainer_config

Default trainer configuration
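
Because trainer_config is a plain record, a custom configuration can presumably be derived from default_config with OCaml's record-update syntax. The sketch below is self-contained: it declares a local copy of the record so that it compiles without the library, and the default values it uses are placeholders chosen for illustration, not the library's actual defaults (this page does not list them).

```ocaml
(* Local stand-in mirroring the documented record, so the snippet compiles
   without the saga library installed. *)
type trainer_config = {
  min_frequency : int;
  vocab_size : int;
  show_progress : bool;
  special_tokens : string list;
  limit_alphabet : int option;
  initial_alphabet : char list;
  continuing_subword_prefix : string option;
  end_of_word_suffix : string option;
  max_token_length : int option;
}

(* ASSUMPTION: placeholder defaults; the real default_config values are
   not given on this page. *)
let default_config = {
  min_frequency = 0;
  vocab_size = 30_000;
  show_progress = true;
  special_tokens = [];
  limit_alphabet = None;
  initial_alphabet = [];
  continuing_subword_prefix = None;
  end_of_word_suffix = None;
  max_token_length = None;
}

(* Override only the fields of interest; every other field is copied
   from default_config. *)
let my_config = {
  default_config with
  vocab_size = 8_000;
  min_frequency = 2;
  special_tokens = [ "<unk>"; "<pad>" ];
  end_of_word_suffix = Some "</w>";
}

let () =
  Printf.printf "vocab_size=%d min_frequency=%d\n"
    my_config.vocab_size my_config.min_frequency
```

With the real library, the local type and placeholder defaults would be dropped and only the `{ default_config with ... }` expression would remain.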

Create a new trainer

val feed : trainer -> string list -> unit

Feed training data to the trainer

val train : trainer -> t -> string list

Train the model and return special tokens
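
The signatures above suggest an imperative workflow: feed text in one or more batches, then call train, which presumably updates the model in place and returns the special tokens. The sketch below only illustrates that call order. It is a toy stand-in, not the saga implementation: the model type t, the trainer fields, and the make_trainer helper are all hypothetical, and the "training" step merely collects distinct words instead of learning BPE merges.

```ocaml
(* Toy stand-in, NOT the saga implementation: it mirrors the two documented
   signatures so that the call order can be demonstrated. *)
type t = { mutable vocab : string list }   (* hypothetical model type *)

type trainer = {
  special_tokens : string list;
  mutable corpus : string list;            (* text fed so far *)
}

(* Hypothetical constructor; the real one is not shown on this page. *)
let make_trainer special_tokens = { special_tokens; corpus = [] }

(* feed : trainer -> string list -> unit *)
let feed trainer texts = trainer.corpus <- trainer.corpus @ texts

(* train : trainer -> t -> string list *)
let train trainer model =
  (* A real BPE trainer would learn merge rules here. The toy version only
     stores the distinct whitespace-separated words after the special
     tokens. *)
  let words =
    List.concat_map (String.split_on_char ' ') trainer.corpus
    |> List.filter (fun w -> w <> "")
    |> List.sort_uniq String.compare
  in
  model.vocab <- trainer.special_tokens @ words;
  trainer.special_tokens

let model = { vocab = [] }

let returned =
  let trainer = make_trainer [ "<unk>" ] in
  feed trainer [ "low lower" ];    (* first batch *)
  feed trainer [ "lowest low" ];   (* second batch *)
  train trainer model
```

Here the model is passed in and mutated rather than returned, which matches the shape of the documented train signature; whether the real function behaves this way is an assumption.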