package kaun

Flax-inspired neural network library for OCaml

Sources

raven-1.0.0.alpha2.tbz
sha256=93abc49d075a1754442ccf495645bc4fdc83e4c66391ec8aca8fa15d2b4f44d2
sha512=5eb958c51f30ae46abded4c96f48d1825f79c7ce03f975f9a6237cdfed0d62c0b4a0774296694def391573d849d1f869919c49008acffca95946b818ad325f6f

Module Bert.Tokenizer

type t

BERT tokenizer instance

val create : ?vocab_file:string -> ?model_id:string -> unit -> t

Create a WordPiece tokenizer for BERT. Provide either a vocab_file path or a model_id to download from Hugging Face (defaults to bert-base-uncased).
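
Going by the signature above, a call would presumably look like `Tokenizer.create ~vocab_file:"vocab.txt" ()` or `Tokenizer.create ()`. The sketch below is not the library's loader; it only illustrates the usual BERT `vocab.txt` layout (one token per line, with the zero-based line number as the token ID) using the standard library alone. The names `load_vocab` and `toy_vocab` are hypothetical.

```ocaml
(* Hypothetical helper, not part of Kaun: read a BERT-style vocab file,
   one token per line, where the zero-based line number is the token ID. *)
let load_vocab (path : string) : (string, int) Hashtbl.t =
  let table = Hashtbl.create 1024 in
  let ic = open_in path in
  let rec loop id =
    match input_line ic with
    | line ->
        Hashtbl.replace table (String.trim line) id;
        loop (id + 1)
    | exception End_of_file -> ()
  in
  (* Close the channel even if reading raises. *)
  Fun.protect ~finally:(fun () -> close_in ic) (fun () -> loop 0);
  table

(* Write a toy vocabulary to a temporary file and load it back. *)
let toy_vocab : (string, int) Hashtbl.t =
  let path = Filename.temp_file "vocab" ".txt" in
  let oc = open_out path in
  List.iter
    (fun tok -> output_string oc (tok ^ "\n"))
    [ "[PAD]"; "[UNK]"; "[CLS]"; "[SEP]"; "hello"; "world" ];
  close_out oc;
  let table = load_vocab path in
  Sys.remove path;
  table
```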

val create_wordpiece : ?vocab_file:string -> ?model_id:string -> unit -> t

Alias for create

val encode_to_array : t -> string -> int array

Encode text to token IDs with CLS and SEP tokens
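
To make the encoding step concrete, here is a minimal sketch of WordPiece's greedy longest-match-first splitting with `[CLS]` and `[SEP]` added around the result. It is an illustration under simplifying assumptions (a toy vocabulary, lowercase ASCII, whitespace splitting only, no punctuation handling), not Kaun's implementation; `wordpiece` and `encode_to_array_sketch` are hypothetical names.

```ocaml
(* Toy vocabulary; a token's ID is its position in the list. *)
let vocab =
  [ "[PAD]"; "[UNK]"; "[CLS]"; "[SEP]"; "play"; "##ing";
    "un"; "##believ"; "##able"; "the" ]

let id_of : (string, int) Hashtbl.t =
  let t = Hashtbl.create 16 in
  List.iteri (fun i tok -> Hashtbl.replace t tok i) vocab;
  t

let unk_id = Hashtbl.find id_of "[UNK]"
let cls_id = Hashtbl.find id_of "[CLS]"
let sep_id = Hashtbl.find id_of "[SEP]"

(* Split one word into piece IDs, always taking the longest piece that is
   in the vocabulary. Non-initial pieces carry the "##" prefix. A word that
   cannot be fully covered maps to a single [UNK]. *)
let wordpiece (word : string) : int list =
  let n = String.length word in
  let rec from start acc =
    if start >= n then Some (List.rev acc)
    else
      (* Try the longest candidate first, shrinking from the right. *)
      let rec longest stop =
        if stop <= start then None
        else
          let piece = String.sub word start (stop - start) in
          let piece = if start > 0 then "##" ^ piece else piece in
          match Hashtbl.find_opt id_of piece with
          | Some id -> Some (id, stop)
          | None -> longest (stop - 1)
      in
      match longest n with
      | Some (id, stop) -> from stop (id :: acc)
      | None -> None
  in
  match from 0 [] with Some ids -> ids | None -> [ unk_id ]

(* Lowercase, split on spaces, tokenize each word, wrap in [CLS] ... [SEP]. *)
let encode_to_array_sketch (text : string) : int array =
  let words =
    String.split_on_char ' ' (String.lowercase_ascii text)
    |> List.filter (fun w -> w <> "")
  in
  Array.of_list ((cls_id :: List.concat_map wordpiece words) @ [ sep_id ])
```

With this toy vocabulary, "unbelievable" splits into `un`, `##believ`, `##able`, and a word with no matching pieces becomes `[UNK]`.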

val encode : t -> string -> inputs

Encode text directly to input tensors ready for the forward pass
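
The fields of `inputs` are not shown on this page. As a rough illustration only, BERT models conventionally take token IDs together with an attention mask and segment (token type) IDs; the record below is a hypothetical stand-in that uses plain `int array` values instead of tensors, and the token IDs are illustrative.

```ocaml
(* Hypothetical stand-in for the library's [inputs] type; the real field
   names and tensor types may differ. *)
type inputs_sketch = {
  input_ids : int array;       (* token IDs, including [CLS] and [SEP] *)
  attention_mask : int array;  (* 1 for real tokens, 0 for padding *)
  token_type_ids : int array;  (* all 0 for a single-segment input *)
}

(* Build the conventional inputs for one unpadded, single-segment sequence. *)
let inputs_of_ids (ids : int array) : inputs_sketch =
  let n = Array.length ids in
  {
    input_ids = ids;
    attention_mask = Array.make n 1;
    token_type_ids = Array.make n 0;
  }

(* Illustrative IDs only. *)
let example = inputs_of_ids [| 101; 7592; 2088; 102 |]
```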

val encode_batch : t -> ?max_length:int -> ?padding:bool -> string list -> (int32, Rune.int32_elt) Rune.t

Encode multiple texts with padding and special tokens
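
The following sketch shows one plausible reading of what `max_length` and `padding` do to a batch of already-encoded sequences: truncate long rows, then right-pad every row to a common width. It returns a list of `int array` rows rather than a Rune tensor. The name `pad_batch`, the pad ID of 0, padding defaulting to `true`, and keeping the final token on truncation are all assumptions, not documented behavior.

```ocaml
(* Assumption: [PAD] has ID 0, as in the standard BERT vocabularies. *)
let pad_id = 0

(* Truncate each row to [max_length] when given (keeping the row's last
   token, assumed to be [SEP]); a non-positive [max_length] is ignored.
   Then, if [padding] is true, right-pad every row to the longest row. *)
let pad_batch ?max_length ?(padding = true) (rows : int array list) :
    int array list =
  let truncate row =
    match max_length with
    | Some m when m > 0 && Array.length row > m ->
        let out = Array.sub row 0 m in
        out.(m - 1) <- row.(Array.length row - 1);
        out
    | _ -> row
  in
  let rows = List.map truncate rows in
  if not padding then rows
  else
    let width = List.fold_left (fun w r -> max w (Array.length r)) 0 rows in
    List.map
      (fun r ->
        let out = Array.make width pad_id in
        Array.blit r 0 out 0 (Array.length r);
        out)
      rows
```

After padding, all rows have the same length, which is what allows a batch to be stored as a single rectangular tensor.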

val decode : t -> int array -> string

Decode token IDs back to text
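
Decoding WordPiece output usually means looking up each ID and gluing `##` continuation pieces back onto the preceding piece. The sketch below does that over a toy vocabulary. Whether the library's `decode` drops `[CLS]`, `[SEP]`, and `[PAD]` is not stated here, so skipping them is an assumption, and `decode_sketch` is a hypothetical name.

```ocaml
(* Toy vocabulary; a token's ID is its array index. *)
let vocab = [| "[PAD]"; "[UNK]"; "[CLS]"; "[SEP]"; "play"; "##ing"; "the" |]

(* Assumption: these special tokens are dropped from decoded text. *)
let is_special tok = tok = "[PAD]" || tok = "[CLS]" || tok = "[SEP]"

let decode_sketch (ids : int array) : string =
  let buf = Buffer.create 64 in
  Array.iter
    (fun id ->
      (* Out-of-range IDs are rendered as [UNK]. *)
      let tok =
        if id >= 0 && id < Array.length vocab then vocab.(id) else "[UNK]"
      in
      if is_special tok then ()
      else if String.length tok > 2 && String.sub tok 0 2 = "##" then
        (* Continuation piece: append without a space, minus the prefix. *)
        Buffer.add_string buf (String.sub tok 2 (String.length tok - 2))
      else begin
        (* Word-initial piece: separate from the previous word. *)
        if Buffer.length buf > 0 then Buffer.add_char buf ' ';
        Buffer.add_string buf tok
      end)
    ids;
  Buffer.contents buf
```

Note that the round trip is lossy in general: lowercasing and `[UNK]` substitution during encoding cannot be undone by decoding.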