package saga

Text processing and NLP extensions for Nx

Sources

raven-1.0.0.alpha2.tbz
sha256=93abc49d075a1754442ccf495645bc4fdc83e4c66391ec8aca8fa15d2b4f44d2
sha512=5eb958c51f30ae46abded4c96f48d1825f79c7ce03f975f9a6237cdfed0d62c0b4a0774296694def391573d849d1f869919c49008acffca95946b818ad325f6f

Module Saga_tokenizers.Special

val make : ?single_word:bool -> ?lstrip:bool -> ?rstrip:bool -> ?normalized:bool -> string -> special

make ?single_word ?lstrip ?rstrip ?normalized token creates a special token configuration.

All optional parameters default to false, the usual setting for special tokens:

  • single_word: false - the token can match as part of a larger word
  • lstrip: false - whitespace to the left of the token is not stripped
  • rstrip: false - whitespace to the right of the token is not stripped
  • normalized: false - the special token is not normalized
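
The optional-argument defaults above can be illustrated with a small stand-alone sketch. The record type and its field names are assumptions made for illustration only: this page does not show how Saga defines `special`, so the stand-in `make` below only mirrors the documented signature and defaults rather than the library's implementation.

```ocaml
(* Stand-in for illustration only: the real [special] type is defined by
   Saga, and its representation is not shown on this page. *)
type special = {
  content : string;
  single_word : bool;
  lstrip : bool;
  rstrip : bool;
  normalized : bool;
}

(* Same shape as the documented signature; every optional flag
   defaults to false. *)
let make ?(single_word = false) ?(lstrip = false) ?(rstrip = false)
    ?(normalized = false) content =
  { content; single_word; lstrip; rstrip; normalized }

(* All defaults: only the token string is supplied. *)
let unk = make "<unk>"

(* Override selected flags with labelled arguments. *)
let mask = make ~lstrip:true ~single_word:true "[MASK]"

let () =
  Printf.printf "%s lstrip=%b\n%s lstrip=%b\n" unk.content unk.lstrip
    mask.content mask.lstrip
```

Because the flags are optional labelled arguments, a call site names only the flags it changes, and the token string stays the single positional argument.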

val pad : string -> special

pad token creates a padding token (e.g., "<pad>").

val unk : string -> special

unk token creates an unknown token (e.g., "<unk>").

val bos : string -> special

bos token creates a beginning-of-sequence token (e.g., "<s>").

val eos : string -> special

eos token creates an end-of-sequence token (e.g., "</s>").

val cls : string -> special

cls token creates a classification token (e.g., "[CLS]").

val sep : string -> special

sep token creates a separator token (e.g., "[SEP]").

val mask : string -> special

mask token creates a mask token (e.g., "[MASK]").
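
As a rough sketch of how the role-specific constructors might be combined, the example below builds two typical special-token sets. The local `Special` module is a hypothetical stand-in that only copies the `string -> special` signatures documented above so the code compiles without Saga installed. Its one-field record and the `content` accessor are assumptions, not the library's actual representation.

```ocaml
(* Hypothetical stand-in with the documented signatures. The record is an
   assumption; Saga's real [special] type is not shown on this page. *)
module Special = struct
  type special = { content : string }

  let token content = { content }
  let pad = token
  let unk = token
  let bos = token
  let eos = token
  let cls = token
  let sep = token
  let mask = token
end

(* BERT-style set: padding, unknown, classification, separator, mask. *)
let bert_specials =
  Special.[ pad "[PAD]"; unk "[UNK]"; cls "[CLS]"; sep "[SEP]"; mask "[MASK]" ]

(* Sequence-to-sequence style set, using the example strings given in the
   descriptions above. *)
let seq2seq_specials =
  Special.[ pad "<pad>"; unk "<unk>"; bos "<s>"; eos "</s>" ]

(* Reads the stand-in's field; this accessor is not part of the documented API. *)
let contents tokens = List.map (fun t -> t.Special.content) tokens

let () =
  print_endline (String.concat " " (contents bert_specials));
  print_endline (String.concat " " (contents seq2seq_specials))
```

Each constructor takes the literal token string, so the same function works for whatever spelling a given vocabulary uses (for example "[PAD]" in one model and "<pad>" in another).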