package saga
Text processing and NLP extensions for Nx
Sources
raven-1.0.0.alpha1.tbz
sha256=8e277ed56615d388bc69c4333e43d1acd112b5f2d5d352e2453aef223ff59867
sha512=369eda6df6b84b08f92c8957954d107058fb8d3d8374082e074b56f3a139351b3ae6e3a99f2d4a4a2930dd950fd609593467e502368a13ad6217b571382da28c
Module Saga_tokenizers.Processors
Post-processing module for tokenization output.
Post-processors handle special tokens and formatting after tokenization, such as adding CLS and SEP tokens for BERT, or handling sentence pairs.
type encoding = {
  ids : int array;
  type_ids : int array;
  tokens : string array;
  offsets : (int * int) array;
  special_tokens_mask : int array;
  attention_mask : int array;
  overflowing : encoding list;
  sequence_ranges : (int * int * int) list;
}
Type representing an encoding to be processed.
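As a rough illustration of how the fields line up, the sketch below mirrors the record locally so that it compiles without the library installed. The helper `of_tokens`, the example ids and offsets, and the choice to leave `sequence_ranges` empty are assumptions made for illustration. The meaning of the three integers in `sequence_ranges` is not documented on this page.

```ocaml
(* Local mirror of the documented record, so this sketch is self-contained. *)
type encoding = {
  ids : int array;
  type_ids : int array;
  tokens : string array;
  offsets : (int * int) array;
  special_tokens_mask : int array;
  attention_mask : int array;
  overflowing : encoding list;
  sequence_ranges : (int * int * int) list;
}

(* Hypothetical helper (not part of Saga): build an encoding for a single
   sequence from (token, id, (start, stop)) triples. *)
let of_tokens (items : (string * int * (int * int)) list) : encoding =
  let arr = Array.of_list items in
  let n = Array.length arr in
  {
    ids = Array.map (fun (_, id, _) -> id) arr;
    type_ids = Array.make n 0;            (* every token is in sequence 0 *)
    tokens = Array.map (fun (tok, _, _) -> tok) arr;
    offsets = Array.map (fun (_, _, off) -> off) arr;
    special_tokens_mask = Array.make n 0; (* no special tokens added yet *)
    attention_mask = Array.make n 1;      (* attend to every token *)
    overflowing = [];
    sequence_ranges = [];                 (* left empty: semantics assumed unknown *)
  }

(* Illustrative ids and character offsets for the text "hello world". *)
let example = of_tokens [ ("hello", 7592, (0, 5)); ("world", 2088, (6, 11)) ]
```

All per-token arrays share the same length, which is the invariant a post-processor would need to preserve when it inserts special tokens.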
type t
Main post-processor type.
Constructors
val roberta :
  sep:(string * int) ->
  cls:(string * int) ->
  ?trim_offsets:bool ->
  ?add_prefix_space:bool ->
  unit ->
  t
Create a RoBERTa post-processor.
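The page does not describe what the resulting post-processor does, so the following is only a sketch of the conventional RoBERTa layout: `cls A sep` for a single sequence and `cls A sep sep B sep` for a pair. The functions `wrap_single`, `wrap_pair`, and `special_mask` are hypothetical and do not call Saga. The `?trim_offsets` and `?add_prefix_space` options are not modeled here.

```ocaml
(* A token paired with its vocabulary id, the same shape as the
   sep and cls arguments of the documented constructor. *)
type tok = string * int

(* Single sequence: cls A sep *)
let wrap_single ~(cls : tok) ~(sep : tok) (a : tok list) : tok list =
  (cls :: a) @ [ sep ]

(* Sentence pair: cls A sep sep B sep *)
let wrap_pair ~(cls : tok) ~(sep : tok) (a : tok list) (b : tok list) : tok list =
  (cls :: a) @ [ sep; sep ] @ b @ [ sep ]

(* 1 marks a special token, 0 a regular one. This assumes that no regular
   token shares both text and id with cls or sep. *)
let special_mask ~(cls : tok) ~(sep : tok) (toks : tok list) : int list =
  List.map (fun t -> if t = cls || t = sep then 1 else 0) toks

(* Illustrative values: "<s>" = 0 and "</s>" = 2 as in the roberta-base
   vocabulary; the ids for A and B are made up. *)
let cls = ("<s>", 0)
let sep = ("</s>", 2)
let single = wrap_single ~cls ~sep [ ("Hello", 10); ("world", 11) ]
let pair = wrap_pair ~cls ~sep [ ("Hello", 10) ] [ ("there", 12) ]
```

Under these assumptions a single sequence gains two tokens and a pair gains four.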
val template :
  single:string ->
  ?pair:string ->
  ?special_tokens:(string * int) list ->
  unit ->
  t
Create a template post-processor.
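The template syntax is not specified on this page. The sketch below assumes a space-separated format in the style of Hugging Face's template processing, where `$A` and `$B` stand for the input sequences and every other piece names a special token. The function `expand` is hypothetical, works on ids only, and ignores type-id suffixes such as `:1`.

```ocaml
(* Hypothetical template expansion, not Saga's implementation.
   Each space-separated piece is either $A, $B, or the name of a
   special token to look up in special_tokens. *)
let expand ~(template : string) ~(special_tokens : (string * int) list)
    ~(a : int list) ?(b : int list = []) () : int list =
  String.split_on_char ' ' template
  |> List.filter (fun piece -> piece <> "")
  |> List.concat_map (fun piece ->
         match piece with
         | "$A" -> a
         | "$B" -> b
         | name -> (
             match List.assoc_opt name special_tokens with
             | Some id -> [ id ]
             | None -> invalid_arg ("unknown template piece: " ^ name)))

(* Illustrative special tokens and input ids. *)
let specials = [ ("[CLS]", 101); ("[SEP]", 102) ]

let single_ids =
  expand ~template:"[CLS] $A [SEP]" ~special_tokens:specials
    ~a:[ 7592; 2088 ] ()

let pair_ids =
  expand ~template:"[CLS] $A [SEP] $B [SEP]" ~special_tokens:specials
    ~a:[ 7592 ] ~b:[ 2088 ] ()
```

In this reading, `single` and `?pair` would each be a template string like the two above, and `?special_tokens` would supply the name-to-id table.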
Operations
Process encodings with the post-processor.
Get the number of tokens added by this post-processor.
Serialization
Convert a post-processor to its JSON representation.
Create a post-processor from its JSON representation.