package kaun

Flax-inspired neural network library for OCaml

Sources

raven-1.0.0.alpha2.tbz
sha256=93abc49d075a1754442ccf495645bc4fdc83e4c66391ec8aca8fa15d2b4f44d2
sha512=5eb958c51f30ae46abded4c96f48d1825f79c7ce03f975f9a6237cdfed0d62c0b4a0774296694def391573d849d1f869919c49008acffca95946b818ad325f6f


Module GPT2.Tokenizer

type t

A GPT-2 tokenizer instance using byte-pair encoding (BPE)

val create : ?vocab_file:string -> ?merges_file:string -> ?model_id:string -> unit -> t

Create a BPE tokenizer for GPT-2. Either provide vocab_file and merges_file paths, or a model_id to download from HuggingFace (defaults to gpt2).
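For illustration, a minimal sketch of both construction paths (assuming the full module path Kaun_models.GPT2.Tokenizer as shown by this page's location; the local file names below are placeholders):

  (* Download the default "gpt2" vocabulary and merges from HuggingFace. *)
  let tokenizer = Kaun_models.GPT2.Tokenizer.create ~model_id:"gpt2" ()

  (* Or build from local files (illustrative paths). *)
  let local_tokenizer =
    Kaun_models.GPT2.Tokenizer.create
      ~vocab_file:"vocab.json" ~merges_file:"merges.txt" ()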

val encode_to_array : t -> string -> int array

Encode text to token IDs
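A short usage sketch, reusing the tokenizer value created above:

  (* "Hello, world!" becomes an array of GPT-2 token IDs. *)
  let ids : int array =
    Kaun_models.GPT2.Tokenizer.encode_to_array tokenizer "Hello, world!"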

val encode : t -> string -> inputs

Encode text directly to input tensors ready for forward pass
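A sketch of the one-step path from text to model inputs; the shape of the inputs type is defined elsewhere in the GPT2 module and is not spelled out here:

  (* Produces the inputs value the GPT-2 forward pass expects. *)
  let inputs = Kaun_models.GPT2.Tokenizer.encode tokenizer "Hello, world!"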

val encode_batch : t -> ?max_length:int -> ?padding:bool -> string list -> (int32, Rune.int32_elt) Rune.t

Encode multiple texts with optional padding
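A sketch of batch encoding, on the assumption that ~padding:true pads shorter sequences so the result forms a rectangular int32 tensor:

  let batch =
    Kaun_models.GPT2.Tokenizer.encode_batch tokenizer
      ~max_length:32 ~padding:true
      [ "first example"; "a somewhat longer second example" ]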

val decode : t -> int array -> string

Decode token IDs back to text
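A round-trip sketch combining encode_to_array and decode:

  let ids = Kaun_models.GPT2.Tokenizer.encode_to_array tokenizer "Hello from OCaml"
  (* Should recover the original string. *)
  let text = Kaun_models.GPT2.Tokenizer.decode tokenizer ids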

val get_bos_token_id : t -> int

Get beginning of sequence token ID

val get_eos_token_id : t -> int

Get end of sequence token ID

val get_pad_token_id : t -> int option

Get padding token ID, if the tokenizer defines one

val get_vocab_size : t -> int

Get vocabulary size
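The special-token accessors and vocabulary size can be queried as in this sketch (GPT-2 has no dedicated padding token by default, which is why the pad ID is optional):

  let bos = Kaun_models.GPT2.Tokenizer.get_bos_token_id tokenizer
  let eos = Kaun_models.GPT2.Tokenizer.get_eos_token_id tokenizer
  (* None when no padding token is configured. *)
  let pad = Kaun_models.GPT2.Tokenizer.get_pad_token_id tokenizer
  let vocab_size = Kaun_models.GPT2.Tokenizer.get_vocab_size tokenizer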