package saga

Module Saga_models.Ngram

N-gram language models (unigram, bigram, trigram) for text generation.

Types

type t

An n-gram model

type vocab_stats = {
  vocab_size : int;
  total_tokens : int;
  unique_ngrams : int;
}

Statistics about the trained model

N-gram

type smoothing =
  | Add_k of float
  | Stupid_backoff of float
Smoothing strategies:

  • Add_k k: classic add-k (Laplace) smoothing
  • Stupid_backoff alpha: back off to lower orders scaled by alpha
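As a rough sketch of what these names usually denote (standard formulations; the module's exact normalization and back-off details are not documented here), for a token w following context c over a vocabulary of size V:

  P_add_k(w | c)   = (count(c, w) + k) / (count(c) + k * V)

  S_backoff(w | c) = count(c, w) / count(c)       if count(c, w) > 0
                   = alpha * S_backoff(w | c')    otherwise, where c' is c with
                                                  its oldest token dropped

Note that stupid backoff produces relative scores rather than a normalized probability distribution.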
val create : n:int -> ?smoothing:smoothing -> ?cache_capacity:int -> int array -> t

create ~n ?smoothing ?cache_capacity tokens builds an n-gram model of order n from the training token ids tokens, with configurable smoothing and an optional logits cache.
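A minimal construction sketch, assuming the module is reachable as Saga_models.Ngram and that tokens are integer ids produced by whatever tokenizer you use (the ids below are arbitrary):

  module Ngram = Saga_models.Ngram

  (* Toy stream of integer token ids; a real corpus would be far larger. *)
  let tokens = [| 0; 1; 2; 0; 1; 3; 0; 1; 2; 0; 1 |]

  (* Trigram model with add-k smoothing and a bounded logits cache. *)
  let model =
    Ngram.create ~n:3 ~smoothing:(Ngram.Add_k 0.5) ~cache_capacity:1024 tokens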

val logits : t -> context:int array -> float array

logits model ~context returns the next-token log probabilities given context. The context should contain the n-1 most recent token ids for an n-gram model.
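Continuing the sketch above (and assuming the returned array is indexed by token id, which is not stated explicitly here):

  (* For a trigram model (n = 3), the context is the two preceding token ids. *)
  let scores = Ngram.logits model ~context:[| 0; 1 |]
  let () = Printf.printf "log P(2 | 0 1) = %f\n" scores.(2)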

val perplexity : t -> int array -> float

perplexity model tokens computes the model's perplexity on the test tokens.
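For example, continuing the sketch above:

  (* Held-out evaluation; lower perplexity indicates a better fit. *)
  let test_tokens = [| 0; 1; 2; 0; 1; 3 |]
  let () = Printf.printf "perplexity: %f\n" (Ngram.perplexity model test_tokens)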

val log_prob : t -> int array -> float

log_prob model tokens returns the sum of log-probabilities of the observed tokens under the model.
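Continuing the sketch above; conventionally, perplexity is the exponential of the negative per-token average of this quantity (assuming natural logarithms, which the docs do not specify):

  let () = Printf.printf "log prob: %f\n" (Ngram.log_prob model test_tokens)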

val generate : t -> ?max_tokens:int -> ?temperature:float -> ?seed:int -> ?start:int array -> unit -> int array

generate model ?max_tokens ?temperature ?seed ?start () samples a sequence of tokens from the model; the optional arguments control the maximum length, sampling temperature, random seed, and an optional starting context.
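A sampling sketch continuing from the model above; the argument values are illustrative, and all four optional arguments can be omitted:

  let sampled =
    Ngram.generate model ~max_tokens:20 ~temperature:0.8 ~seed:42
      ~start:[| 0; 1 |] ()

  let () =
    Array.iter (Printf.printf "%d ") sampled;
    print_newline ()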

val stats : t -> vocab_stats

stats model returns statistics about the highest-order n-grams.
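For example:

  let () =
    let { Ngram.vocab_size; total_tokens; unique_ngrams } = Ngram.stats model in
    Printf.printf "vocab size: %d, training tokens: %d, distinct n-grams: %d\n"
      vocab_size total_tokens unique_ngrams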

val save : t -> string -> unit

save model filename serializes the model to a file.

val load : string -> t

load filename deserializes the model from a file.
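A round-trip sketch continuing from the model above; the filename and its extension are arbitrary:

  let () = Ngram.save model "tiny.ngram"
  let restored = Ngram.load "tiny.ngram"

  (* Sanity check: the restored model has the same n-gram order. *)
  let () = assert (Ngram.n restored = Ngram.n model)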

val save_text : t -> string -> unit

save_text model filename serializes the model to a text file.

val load_text : string -> t

load_text filename deserializes the model from a text file.
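The same round trip through the text format (filename again arbitrary); presumably the text form is easier to inspect than the binary one:

  let () = Ngram.save_text model "tiny.ngram.txt"
  let restored_from_text = Ngram.load_text "tiny.ngram.txt"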

val n : t -> int

n model returns the n-gram order of the model.