Module Kaun_models.GPT2
GPT-2: Generative Pre-trained Transformer 2 for causal language modeling.
Radford et al., 2019: "Language Models are Unsupervised Multitask Learners"
A transformer-based autoregressive language model that uses causal self-attention for text generation and language understanding tasks.
Configuration
type config = {
  vocab_size : int;                        (* Size of vocabulary *)
  n_positions : int;                       (* Maximum sequence length *)
  n_embd : int;                            (* Hidden dimension (d_model) *)
  n_layer : int;                           (* Number of transformer decoder layers *)
  n_head : int;                            (* Number of attention heads *)
  n_inner : int option;                    (* FFN intermediate dimension (defaults to 4 * n_embd) *)
  activation_function : [ `gelu | `relu | `swish | `gelu_new ];  (* Activation function *)
  resid_pdrop : float;                     (* Dropout probability for residual connections *)
  embd_pdrop : float;                      (* Dropout probability for embeddings *)
  attn_pdrop : float;                      (* Dropout for attention probabilities *)
  layer_norm_epsilon : float;              (* Layer normalization epsilon *)
  initializer_range : float;               (* Standard deviation for weight initialization *)
  scale_attn_weights : bool;               (* Whether to scale attention weights *)
  use_cache : bool;                        (* Whether to cache key/values *)
  scale_attn_by_inverse_layer_idx : bool;  (* Scale attention by 1/sqrt(layer_idx) *)
  reorder_and_upcast_attn : bool;          (* Reorder and upcast attention *)
  bos_token_id : int option;               (* Beginning-of-sequence token ID *)
  eos_token_id : int option;               (* End-of-sequence token ID *)
  pad_token_id : int option;               (* Padding token ID *)
}

GPT-2 model configuration
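For orientation, here is a configuration sketch filled in with the standard GPT-2 Small (124M) hyperparameters. The record shape follows the type above; the concrete values (50257-token vocabulary, 1024 positions, 768 hidden units, 12 layers, 12 heads, 0.1 dropout, gelu_new activation, token ID 50256 for BOS/EOS) are the usual published GPT-2 Small settings, not values stated on this page, so treat them as assumptions:

(* Assumed GPT-2 Small hyperparameters; adjust if the library's defaults differ. *)
let gpt2_small_config : GPT2.config = {
  vocab_size = 50257;
  n_positions = 1024;
  n_embd = 768;
  n_layer = 12;
  n_head = 12;
  n_inner = None;                    (* None: defaults to 4 * n_embd = 3072 *)
  activation_function = `gelu_new;
  resid_pdrop = 0.1;
  embd_pdrop = 0.1;
  attn_pdrop = 0.1;
  layer_norm_epsilon = 1e-5;
  initializer_range = 0.02;
  scale_attn_weights = true;
  use_cache = true;
  scale_attn_by_inverse_layer_idx = false;
  reorder_and_upcast_attn = false;
  bos_token_id = Some 50256;
  eos_token_id = Some 50256;
  pad_token_id = None;
}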
Model Components
GPT-2 embeddings combining token and position embeddings
type 'a output = {
  attentions : (float, 'a) Rune.t list option;  (* Attention weights from all layers if output_attentions = true *)
}

Model outputs
type 'a gpt2 = {
  model : Kaun.module_;
  params : 'a Kaun.params;
  config : config;
  dtype : (float, 'a) Rune.dtype;
}

Unified GPT-2 model type
type inputs = {
  input_ids : (int32, Rune.int32_elt) Rune.t;
  attention_mask : (int32, Rune.int32_elt) Rune.t option;
  position_ids : (int32, Rune.int32_elt) Rune.t option;
}

Input tensors for GPT-2
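As a sketch, an inputs value might be assembled like this. Here input_ids stands for an already-built int32 Rune tensor of shape [batch; seq_len] (this page does not show the tensor-construction or tokenizer API), and the behaviour of None for the optional fields is an assumption:

(* Hypothetical assembly of GPT-2 inputs from an existing tensor. *)
let make_inputs input_ids : GPT2.inputs = {
  input_ids;                 (* assumed: (int32, Rune.int32_elt) Rune.t, shape [batch; seq_len] *)
  attention_mask = None;     (* assumption: None means every position is attended *)
  position_ids = None;       (* assumption: None means positions 0 .. seq_len - 1 *)
}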
Create a new GPT-2 model
create ?config () creates a new GPT-2 model.
val from_pretrained :
  ?model_id:string ->
  ?revision:Kaun_huggingface.revision ->
  ?cache_config:Kaun_huggingface.Config.t ->
  dtype:(float, 'a) Rune.dtype ->
  unit ->
  'a gpt2

Load pretrained GPT-2 from HuggingFace
from_pretrained ?model_id ?dtype () loads pretrained GPT-2.
Defaults: model_id is "gpt2", the device is CPU, and dtype is Float32. Returns a unified gpt2 record with model, params, and config.
Example:
let gpt2 = GPT2.from_pretrained () in
(* Or with options: *)
let gpt2 = GPT2.from_pretrained ~model_id:"gpt2-medium" ()

val forward :
  'a gpt2 ->
  inputs ->
  ?training:bool ->
  ?output_hidden_states:bool ->
  ?output_attentions:bool ->
  unit ->
  'a output

Forward pass through GPT-2
forward gpt2 inputs ... () performs a forward pass through the model.
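Putting the pieces together, here is a hedged end-to-end sketch built only from the values documented on this page (from_pretrained, inputs, forward, output). How input_ids is produced is left abstract, Rune.float32 is assumed to name the float32 dtype value, and only the attentions field of the output record is inspected because the other output fields are not shown here:

(* Sketch only. [input_ids] stands for an already-tokenized prompt:
   an (int32, Rune.int32_elt) Rune.t of shape [batch; seq_len]. *)
let run_gpt2 input_ids =
  let gpt2 = GPT2.from_pretrained ~dtype:Rune.float32 () in
  let inputs = { GPT2.input_ids; attention_mask = None; position_ids = None } in
  let output = GPT2.forward gpt2 inputs ~output_attentions:true () in
  match output.GPT2.attentions with
  | Some attns -> Printf.printf "collected %d attention layers\n" (List.length attns)
  | None -> print_endline "attentions not returned"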
Task-Specific Heads
GPT-2 for causal language modeling
Tokenization
Utilities
Count total parameters in the model
Get human-readable parameter statistics
GPT-2 Configuration Parsing
Parse GPT-2 configuration from HuggingFace JSON format
Common Model Configurations
Load GPT-2 Small (124M parameters)
Load GPT-2 Medium (355M parameters)
Load GPT-2 Large (774M parameters)
Load GPT-2 XL (1.5B parameters)
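These presets presumably map to the standard HuggingFace checkpoints; the same variants can also be loaded explicitly through from_pretrained by model ID. The IDs below are the usual HuggingFace names for the GPT-2 family, and Rune.float32 is again assumed to be the float32 dtype value:

(* Sketch: loading the larger checkpoints by HuggingFace model ID. *)
let gpt2_medium = GPT2.from_pretrained ~model_id:"gpt2-medium" ~dtype:Rune.float32 ()
let gpt2_large = GPT2.from_pretrained ~model_id:"gpt2-large" ~dtype:Rune.float32 ()
let gpt2_xl = GPT2.from_pretrained ~model_id:"gpt2-xl" ~dtype:Rune.float32 ()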