package saga
Text processing and NLP extensions for Nx
Sources
raven-1.0.0.alpha1.tbz
sha256=8e277ed56615d388bc69c4333e43d1acd112b5f2d5d352e2453aef223ff59867
sha512=369eda6df6b84b08f92c8957954d107058fb8d3d8374082e074b56f3a139351b3ae6e3a99f2d4a4a2930dd950fd609593467e502368a13ad6217b571382da28c
Module Saga_tokenizers.Bpe
Byte Pair Encoding (BPE) tokenization module
Core Types
BPE model
List of merge operations
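To make the role of the merge list concrete, here is a minimal, self-contained sketch of how an ordered list of merge operations is typically applied in BPE. It does not use or reproduce the library's implementation: the `merges` type below is a hypothetical stand-in (the page does not show the real definition), and `rank`, `best_pair`, `merge_pair`, `bpe`, and `split_chars` are illustrative names.

```ocaml
(* Hypothetical stand-in for the library's merges type: an ordered list of
   symbol pairs, where an earlier entry has higher priority. *)
type merges = (string * string) list

(* Rank of a pair = its position in the merge list, if present. *)
let rank (merges : merges) (pair : string * string) : int option =
  let rec go i = function
    | [] -> None
    | p :: rest -> if p = pair then Some i else go (i + 1) rest
  in
  go 0 merges

(* Find the adjacent pair with the lowest rank (highest priority). *)
let best_pair merges symbols =
  let rec go best = function
    | a :: (b :: _ as rest) ->
        let best =
          match (rank merges (a, b), best) with
          | Some r, Some (r', _) when r < r' -> Some (r, (a, b))
          | Some r, None -> Some (r, (a, b))
          | _ -> best
        in
        go best rest
    | _ -> best
  in
  go None symbols

(* Replace every non-overlapping occurrence of the pair (a, b) by a ^ b. *)
let rec merge_pair (a, b) = function
  | x :: y :: rest when x = a && y = b -> (a ^ b) :: merge_pair (a, b) rest
  | x :: rest -> x :: merge_pair (a, b) rest
  | [] -> []

(* Repeatedly apply the best available merge until none applies. *)
let rec bpe merges symbols =
  match best_pair merges symbols with
  | None -> symbols
  | Some (_, pair) -> bpe merges (merge_pair pair symbols)

(* Split a string into one-byte symbols (adequate for ASCII input only). *)
let split_chars s = List.init (String.length s) (fun i -> String.make 1 s.[i])
```

With the merges `[("l", "o"); ("lo", "w"); ("e", "r")]`, the word `"lower"` is reduced step by step to the symbols `"low"` and `"er"`. A real implementation would normally use a hash table for rank lookup rather than a linear scan.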
type config = {
  vocab : vocab;
  merges : merges;
  cache_capacity : int;
  dropout : float option;
  unk_token : string option;
  continuing_subword_prefix : string option;
  end_of_word_suffix : string option;
  fuse_unk : bool;
  byte_fallback : bool;
  ignore_merges : bool;
}
BPE configuration
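As an illustration of how a record of this shape might be filled in, the sketch below redeclares the type locally so that it compiles on its own. The `vocab` and `merges` definitions are assumptions (their real definitions are not shown on this page), the field values are illustrative and not the library's defaults, and `example_config` and `with_dropout` are hypothetical names.

```ocaml
(* Local stand-ins, NOT the library's definitions: the real vocab and
   merges types are not shown on this page. *)
type vocab = (string * int) list
type merges = (string * string) list

(* Local copy of the record shape documented above. *)
type config = {
  vocab : vocab;
  merges : merges;
  cache_capacity : int;
  dropout : float option;
  unk_token : string option;
  continuing_subword_prefix : string option;
  end_of_word_suffix : string option;
  fuse_unk : bool;
  byte_fallback : bool;
  ignore_merges : bool;
}

(* Illustrative values only; the library's defaults may differ. *)
let example_config = {
  vocab = [ ("<unk>", 0); ("l", 1); ("o", 2); ("lo", 3) ];
  merges = [ ("l", "o") ];
  cache_capacity = 10_000;
  dropout = None;
  unk_token = Some "<unk>";
  continuing_subword_prefix = None;
  end_of_word_suffix = None;
  fuse_unk = false;
  byte_fallback = false;
  ignore_merges = false;
}

(* Functional update: return a copy of the config with dropout enabled,
   rejecting probabilities outside [0, 1]. *)
let with_dropout p cfg =
  if p < 0.0 || p > 1.0 then invalid_arg "dropout must be in [0, 1]"
  else { cfg with dropout = Some p }
```

Because the optional settings are `option` fields, "not configured" is expressed as `None` rather than with sentinel values such as an empty string.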
Model Creation
from_files ~vocab_file ~merges_file loads a BPE model from vocab.json and merges.txt files
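For orientation, the following sketch parses the contents of a merges.txt-style file. It assumes the layout commonly used by BPE tooling (an optional `#version` header line followed by one space-separated pair per line); whether the library accepts exactly this layout is an assumption, and `parse_merges` is a hypothetical helper, not part of the documented API.

```ocaml
(* True when [line] starts with the "#version" header marker. *)
let is_version_header line =
  String.length line >= 8 && String.sub line 0 8 = "#version"

(* Parse merges.txt-style contents into an ordered list of pairs.
   Blank lines and the header are skipped; lines that do not consist of
   exactly two space-separated symbols are silently ignored. *)
let parse_merges (contents : string) : (string * string) list =
  String.split_on_char '\n' contents
  |> List.filter_map (fun line ->
         let line = String.trim line in
         if line = "" || is_version_header line then None
         else
           match String.split_on_char ' ' line with
           | [ left; right ] -> Some (left, right)
           | _ -> None)
```

The order of the returned list matters: in BPE, earlier merges take priority over later ones.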
Configuration Builder
Tokenization
Token with ID, string value, and character offsets
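A token of this kind can be pictured as a small record. The sketch below is an assumption about its shape: the field names `id`, `value`, and `offsets` are hypothetical, and the offsets are treated as a half-open byte range `(start, stop)` into the input, which the page does not confirm.

```ocaml
(* Hypothetical record; field names and offset convention are assumptions. *)
type token = { id : int; value : string; offsets : int * int }

(* Recover the span of the original text that a token covers,
   treating offsets as a half-open range [start, stop). *)
let slice (text : string) (t : token) : string =
  let start, stop = t.offsets in
  String.sub text start (stop - start)
```

Keeping offsets alongside the string value lets a caller map a token back to its position in the input even when the token text was altered, for example by a subword prefix or suffix.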
Vocabulary Management
get_vocab model returns the vocabulary as a list of (token, id) pairs
get_unk_token model returns the unknown token if configured
get_continuing_subword_prefix model returns the continuing subword prefix if configured
get_end_of_word_suffix model returns the end-of-word suffix if configured
Cache Management
Serialization
save model ~path ?name () saves the model to vocab.json and merges.txt files
read_files ~vocab_file ~merges_file reads vocabulary and merges from files
Training