package kaun

Flax-inspired neural network library for OCaml



Module GPT2.Tokenizer

type t

GPT-2 tokenizer instance using byte-pair encoding (BPE)

val create : ?vocab_file:string -> ?merges_file:string -> ?model_id:string -> unit -> t

Create a BPE tokenizer for GPT-2. Provide either vocab_file and merges_file paths, or a model_id to download from Hugging Face (defaults to gpt2)
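As a minimal sketch (assuming the module path Kaun_models.GPT2.Tokenizer shown on this page; the local file names are placeholders), a tokenizer can be built either from the default model_id or from local vocab/merges files:

    (* Alias the tokenizer module; the path is taken from this page. *)
    module Tok = Kaun_models.GPT2.Tokenizer

    (* Download the default gpt2 vocabulary and merges from Hugging Face. *)
    let tokenizer = Tok.create ()

    (* Or load from local files; the file names here are placeholders. *)
    let local_tokenizer =
      Tok.create ~vocab_file:"vocab.json" ~merges_file:"merges.txt" ()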

val encode_to_array : t -> string -> int array

Encode text to token IDs
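For example, reusing the Tok alias and tokenizer value from the create sketch above:

    (* Tokenize a string and report how many token IDs it produced. *)
    let ids = Tok.encode_to_array tokenizer "Hello, world!"
    let () = Printf.printf "%d tokens\n" (Array.length ids)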

val encode : t -> string -> inputs

Encode text directly to input tensors ready for a forward pass
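A sketch, again reusing the tokenizer from the create example; the inputs type is defined in the enclosing GPT2 module, so its fields are not assumed here:

    (* Encode a prompt straight into model-ready input tensors. *)
    let inputs = Tok.encode tokenizer "The quick brown fox"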

val encode_batch : t -> ?max_length:int -> ?padding:bool -> string list -> (int32, Rune.int32_elt) Rune.t

Encode multiple texts with optional padding
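For instance, reusing the tokenizer from the create sketch (the max_length of 16 is an arbitrary choice for illustration):

    (* Batch-encode two strings with padding enabled and a maximum length
       of 16; the result is an int32 Rune tensor of token IDs. *)
    let batch =
      Tok.encode_batch tokenizer ~max_length:16 ~padding:true
        [ "first example"; "a slightly longer second example" ]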

val decode : t -> int array -> string

Decode token IDs back to text
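A round-trip sketch, reusing the tokenizer from the create example; decoding the IDs produced by encode_to_array should recover the original string:

    let ids = Tok.encode_to_array tokenizer "round trip"
    let text = Tok.decode tokenizer ids
    let () = print_endline text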

val get_bos_token_id : t -> int

Get the beginning-of-sequence token ID

val get_eos_token_id : t -> int

Get the end-of-sequence token ID

val get_pad_token_id : t -> int option

Get the padding token ID, if one is set

val get_vocab_size : t -> int

Get the vocabulary size
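The accessors above can be combined into a quick inspection of a loaded tokenizer (a sketch, reusing the tokenizer from the create example):

    (* Print the special-token IDs and vocabulary size. *)
    let () =
      Printf.printf "bos=%d eos=%d vocab=%d\n"
        (Tok.get_bos_token_id tokenizer)
        (Tok.get_eos_token_id tokenizer)
        (Tok.get_vocab_size tokenizer);
      match Tok.get_pad_token_id tokenizer with
      | Some id -> Printf.printf "pad=%d\n" id
      | None -> print_endline "no pad token set"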