Module Kaun_models.GPT2
GPT-2: Generative Pre-trained Transformer 2 for causal language modeling.
Radford et al., 2019: "Language Models are Unsupervised Multitask Learners"
A transformer-based autoregressive language model that uses causal self-attention for text generation and language understanding tasks.
Configuration
type config = {
  vocab_size : int;  (** Size of vocabulary *)
  n_positions : int;  (** Maximum sequence length *)
  n_embd : int;  (** Hidden dimension (d_model) *)
  n_layer : int;  (** Number of transformer decoder layers *)
  n_head : int;  (** Number of attention heads *)
  n_inner : int option;  (** FFN intermediate dimension (defaults to 4 * n_embd) *)
  activation_function : [ `gelu | `relu | `swish | `gelu_new ];  (** Activation function *)
  resid_pdrop : float;  (** Dropout probability for residual connections *)
  embd_pdrop : float;  (** Dropout probability for embeddings *)
  attn_pdrop : float;  (** Dropout for attention probabilities *)
  layer_norm_epsilon : float;  (** Layer normalization epsilon *)
  initializer_range : float;  (** Standard deviation for weight initialization *)
  scale_attn_weights : bool;  (** Whether to scale attention weights *)
  use_cache : bool;  (** Whether to cache key/values *)
  scale_attn_by_inverse_layer_idx : bool;  (** Scale attention by 1/sqrt(layer_idx) *)
  reorder_and_upcast_attn : bool;  (** Reorder and upcast attention *)
  bos_token_id : int option;  (** Beginning-of-sequence token ID *)
  eos_token_id : int option;  (** End-of-sequence token ID *)
  pad_token_id : int option;  (** Padding token ID *)
}
GPT-2 model configuration
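For illustration, here is what a complete config value might look like when filled in with the standard GPT-2 Small hyperparameters. The field values below come from the published GPT-2 Small setup; the module may already ship an equivalent default, so treat this literal as a sketch rather than an exported constant:

(* Sketch: a config populated with the standard GPT-2 Small hyperparameters. *)
let gpt2_small_config : Kaun_models.GPT2.config =
  {
    vocab_size = 50257;
    n_positions = 1024;
    n_embd = 768;
    n_layer = 12;
    n_head = 12;
    n_inner = None;  (* falls back to 4 * n_embd = 3072 *)
    activation_function = `gelu_new;
    resid_pdrop = 0.1;
    embd_pdrop = 0.1;
    attn_pdrop = 0.1;
    layer_norm_epsilon = 1e-5;
    initializer_range = 0.02;
    scale_attn_weights = true;
    use_cache = true;
    scale_attn_by_inverse_layer_idx = false;
    reorder_and_upcast_attn = false;
    bos_token_id = Some 50256;
    eos_token_id = Some 50256;
    pad_token_id = None;
  }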
Model Components
GPT-2 embeddings combining token and position embeddings
type 'a output = {
  attentions : (float, 'a) Rune.t list option;
  (** Attention weights from all layers if output_attentions=true *)
}
Model outputs
type 'a gpt2 = {
model : Kaun.module_;
params : 'a Kaun.params;
config : config;
dtype : (float, 'a) Rune.dtype;
}
Unified GPT-2 model type
type inputs = {
input_ids : (int32, Rune.int32_elt) Rune.t;
attention_mask : (int32, Rune.int32_elt) Rune.t option;
position_ids : (int32, Rune.int32_elt) Rune.t option;
}
Input tensors for GPT-2
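As a sketch of how the record might be filled in. The token ids below are arbitrary, and the constructors Rune.create and Rune.int32 are assumed to follow Rune's Nx-style API; adapt if the actual Rune interface differs:

let inputs : Kaun_models.GPT2.inputs =
  let token_ids = Array.map Int32.of_int [| 15496; 11; 995; 0; 50256 |] in
  {
    input_ids = Rune.create Rune.int32 [| 1; 5 |] token_ids;  (* batch = 1, seq_len = 5 *)
    attention_mask = None;  (* no explicit padding mask *)
    position_ids = None;    (* use default positions *)
  }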
Create a new GPT-2 model
create ?config () creates a new GPT-2 model.
val from_pretrained :
?model_id:string ->
?revision:Kaun_huggingface.revision ->
?cache_config:Kaun_huggingface.Config.t ->
dtype:(float, 'a) Rune.dtype ->
unit ->
'a gpt2
Load pretrained GPT-2 from HuggingFace
from_pretrained ?model_id ~dtype () loads pretrained GPT-2.
Default model_id is "gpt2"; dtype is typically Float32. Returns a unified gpt2 record with model, params, and config.
Example:
let gpt2 = GPT2.from_pretrained () in
(* Or with options: *)
let gpt2 = GPT2.from_pretrained ~model_id:"gpt2-medium" ()
val forward :
'a gpt2 ->
inputs ->
?training:bool ->
?output_hidden_states:bool ->
?output_attentions:bool ->
unit ->
'a output
Forward pass through GPT-2
forward gpt2 inputs () performs a forward pass through the model.
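Putting the pieces together, a minimal end-to-end pass might look like the following sketch. It assumes the float32 dtype value is named Rune.float32 and that Rune.create follows the Nx-style constructor; the token ids are arbitrary:

let () =
  let open Kaun_models.GPT2 in
  let gpt2 = from_pretrained ~dtype:Rune.float32 () in
  let inputs =
    {
      input_ids =
        Rune.create Rune.int32 [| 1; 5 |]
          (Array.map Int32.of_int [| 15496; 11; 995; 0; 50256 |]);
      attention_mask = None;
      position_ids = None;
    }
  in
  let output = forward gpt2 inputs ~output_attentions:true () in
  match output.attentions with
  | Some layers ->
      Printf.printf "attention maps collected from %d layers\n" (List.length layers)
  | None -> print_endline "attentions were not returned"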
Task-Specific Heads
GPT-2 for causal language modeling
Tokenization
Utilities
Count total parameters in the model
Get human-readable parameter statistics
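For illustration only, a hypothetical use of these utilities; this page does not show the actual value names, so count_parameters and parameter_stats below are assumed names and may differ from the real interface:

(* Hypothetical names: check the module signature for the real utilities. *)
let () =
  let gpt2 = Kaun_models.GPT2.from_pretrained ~dtype:Rune.float32 () in
  Printf.printf "total parameters: %d\n"
    (Kaun_models.GPT2.count_parameters gpt2.params);
  print_endline (Kaun_models.GPT2.parameter_stats gpt2.params)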
GPT-2 Configuration Parsing
Parse GPT-2 configuration from HuggingFace JSON format
Common Model Configurations
Load GPT-2 Small (124M parameters)
Load GPT-2 Medium (355M parameters)
Load GPT-2 Large (774M parameters)
Load GPT-2 XL (1.5B parameters)
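The larger checkpoints correspond to the standard HuggingFace model ids, so they can also be fetched through from_pretrained directly. A sketch, assuming the float32 dtype value is Rune.float32:

let gpt2_medium =
  Kaun_models.GPT2.from_pretrained ~model_id:"gpt2-medium" ~dtype:Rune.float32 ()  (* 355M *)
let gpt2_large =
  Kaun_models.GPT2.from_pretrained ~model_id:"gpt2-large" ~dtype:Rune.float32 ()  (* 774M *)
let gpt2_xl =
  Kaun_models.GPT2.from_pretrained ~model_id:"gpt2-xl" ~dtype:Rune.float32 ()  (* 1.5B *)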