Kaun.Layer

Neural network layer constructors.

This module provides functional layer constructors for building neural networks. Each constructor returns a module_, which encapsulates parameter initialization and forward computation. Layers can be composed using sequential to build complex architectures.
All layers follow a consistent pattern: they take architecture parameters (dimensions, hyperparameters) and optional initialization strategies, returning a module that can be initialized with random number generators and applied to input tensors.
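The init/apply pattern can be sketched library-independently. In this toy version, plain float arrays stand in for Rune tensors and parameter trees; the names (toy_module, scale_layer) are illustrative only, not Kaun's API:

```ocaml
(* Toy analogue of a Kaun module: [init] builds the parameters,
   [apply] runs the forward pass with them. *)
type toy_module = {
  init : seed:int -> float array;
  apply : float array -> training:bool -> float array -> float array;
}

(* A "scale" layer: one learnable scalar, multiplied elementwise. *)
let scale_layer : toy_module = {
  init = (fun ~seed -> [| float_of_int seed |]);
  apply = (fun params ~training:_ x -> Array.map (fun v -> v *. params.(0)) x);
}

let () =
  let params = scale_layer.init ~seed:2 in
  let y = scale_layer.apply params ~training:false [| 1.0; 3.0 |] in
  Printf.printf "%g %g\n" y.(0) y.(1)
```

The real module_ type below is polymorphic over tensor layout and threads an RNG key, but the division of labor is the same: construction is separate from parameter initialization and from the forward pass.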
Create layers by calling constructor functions:
let dense = Layer.linear ~in_features:784 ~out_features:128 () in
let activation = Layer.relu () in

Compose layers into networks:
let network = Layer.sequential [
Layer.linear ~in_features:784 ~out_features:128 ();
Layer.relu ();
Layer.dropout ~rate:0.2 ();
Layer.linear ~in_features:128 ~out_features:10 ();
] in

Initialize and apply:
let params = Kaun.init network ~rngs ~dtype in
let output = Kaun.apply network params ~training:true input in

type module_ = {
  init :
    'layout. rngs:Rune.Rng.key -> dtype:(float, 'layout) Rune.dtype -> Ptree.t;
  apply :
    'layout. Ptree.t -> training:bool -> ?rngs:Rune.Rng.key ->
    (float, 'layout) Rune.t -> (float, 'layout) Rune.t;
}

init ~rngs ~dtype initializes module parameters.

Creates a parameter tree containing all trainable parameters for this module. The function is polymorphic over layout and device to support different tensor backends and memory layouts.

The RNG key should be split appropriately for modules with multiple parameters to ensure independent initialization.

apply params ~training ?rngs input performs forward computation.

Executes the module's forward pass using the provided parameters and input tensor.

The training flag enables different behaviors:
- training=true: stochastic operations (such as dropout) are active; an RNG is required, and operations needing randomness will fail if rngs is None.
- training=false: the module runs deterministically for evaluation.

val conv1d :
in_channels:int ->
out_channels:int ->
?kernel_size:int ->
?stride:int ->
?dilation:int ->
?padding:[ `Same | `Valid | `Causal ] ->
unit ->
module_

conv1d ~in_channels ~out_channels ?kernel_size ?stride ?dilation ?padding () creates a 1D convolutional layer over inputs of shape batch; in_channels; length. Supports `Same, `Valid, and `Causal padding. Defaults: kernel_size=3, stride=1, dilation=1, padding=`Same.
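The padding mode determines the output length. The following is a library-independent sketch of that arithmetic, assuming the usual conventions (`Same and `Causal preserve ceil(length / stride); `Valid keeps only positions where the full dilated window fits):

```ocaml
(* Output length of a 1D convolution for each padding mode. *)
type padding = Same | Valid | Causal

let conv1d_out_len ~length ~kernel_size ~stride ~dilation = function
  | Same | Causal -> (length + stride - 1) / stride   (* ceil(length / stride) *)
  | Valid ->
      (* A kernel of size k with dilation d spans d*(k-1)+1 positions. *)
      let effective_k = dilation * (kernel_size - 1) + 1 in
      (length - effective_k) / stride + 1

let () =
  (* length 10 with the defaults: kernel_size=3, stride=1, dilation=1 *)
  Printf.printf "same=%d valid=%d\n"
    (conv1d_out_len ~length:10 ~kernel_size:3 ~stride:1 ~dilation:1 Same)
    (conv1d_out_len ~length:10 ~kernel_size:3 ~stride:1 ~dilation:1 Valid)
```

`Causal pads only on the left (so each output depends on current and past inputs), which leaves the output length identical to `Same.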
val conv2d :
in_channels:int ->
out_channels:int ->
?kernel_size:(int * int) ->
unit ->
module_

conv2d ~in_channels ~out_channels ?kernel_size () creates a 2D convolutional layer.
Performs 2D convolution over 4D input tensors of shape batch_size, in_channels, height, width. The layer maintains learnable weight and bias parameters.
The weight tensor has shape out_channels, in_channels, kernel_height, kernel_width and is initialized using Glorot uniform initialization. The bias tensor has shape out_channels and is zero-initialized.
Example
let conv = Layer.conv2d ~in_channels:3 ~out_channels:64 ~kernel_size:(5, 5) () in
(* Processes RGB images (3 channels) to produce 64 feature maps with 5x5 filters *)

val linear :
in_features:int ->
out_features:int ->
?weight_init:Initializers.t ->
?bias_init:Initializers.t ->
unit ->
module_

linear ~in_features ~out_features ?weight_init ?bias_init () creates a fully connected layer.
Applies linear transformation y = xW^T + b where x is input, W is weight matrix, and b is bias vector. Accepts inputs of any shape with last dimension matching in_features.
The weight tensor has shape out_features, in_features and bias has shape out_features.
Examples
let classifier = Layer.linear ~in_features:512 ~out_features:10 () in
(* Maps 512-dimensional features to 10 class logits *)
let custom_init = Layer.linear
~in_features:256 ~out_features:128
~weight_init:(Initializers.he_normal ())
~bias_init:(Initializers.constant 0.1) () in

dropout ~rate () creates a dropout layer for regularization.
During training, randomly sets elements to zero with probability rate and scales remaining elements by 1 / (1 - rate) to maintain expected values. During evaluation, applies identity transformation.
Requires random number generator during training. No learnable parameters.
Example
let drop = Layer.dropout ~rate:0.5 () in
(* Randomly zeros 50% of activations during training *)

batch_norm ~num_features () creates a batch normalization layer.
Normalizes inputs across the batch dimension, learning scale and shift parameters. Applies transformation y = γ((x - μ) / σ) + β where μ and σ are batch statistics, and γ, β are learnable parameters.
Maintains running statistics for evaluation mode. Parameters include scale (γ), bias (β), running mean, and running variance.
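The transformation y = γ((x - μ) / σ) + β can be checked numerically on a single feature across a small batch. A minimal sketch with γ=1 and β=0 (the usual initial values of the learnable parameters):

```ocaml
(* Batch normalization of one feature over a batch, as plain floats. *)
let batch_norm ?(gamma = 1.0) ?(beta = 0.0) ?(eps = 1e-5) x =
  let n = float_of_int (Array.length x) in
  let mu = Array.fold_left ( +. ) 0.0 x /. n in
  let var = Array.fold_left (fun a v -> a +. ((v -. mu) ** 2.0)) 0.0 x /. n in
  let sigma = sqrt (var +. eps) in
  Array.map (fun v -> (gamma *. ((v -. mu) /. sigma)) +. beta) x

let () =
  let y = batch_norm [| 1.0; 2.0; 3.0; 4.0 |] in
  Array.iter (Printf.printf "%.3f ") y;
  print_newline ()
```

With γ=1 and β=0 the output has (approximately) zero mean and unit variance; the learnable γ and β then let the layer undo the normalization where that helps.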
max_pool2d ~kernel_size ?stride () creates a 2D max pooling layer.
Applies maximum operation over spatial windows, reducing spatial dimensions while preserving channel dimension.
No learnable parameters.
avg_pool2d ~kernel_size ?stride () creates a 2D average pooling layer.
Applies average operation over spatial windows, providing smoother downsampling compared to max pooling.
No learnable parameters.
flatten () creates a flatten layer that reshapes multidimensional inputs to 2D.
Preserves batch dimension while flattening all other dimensions. Transforms shape batch_size, d1, d2, ..., dn to batch_size, d1 * d2 * ... * dn.
Commonly used before dense layers in CNN architectures. No learnable parameters.
relu () creates a ReLU activation layer applying max(0, x) elementwise.
Most common activation for hidden layers. Computationally efficient with good gradient flow for positive inputs. No learnable parameters.
sigmoid () creates a sigmoid activation layer applying 1 / (1 + exp(-x)) elementwise.
Maps inputs to range (0, 1). Commonly used for binary classification and gating mechanisms. No learnable parameters.
tanh () creates a hyperbolic tangent activation layer applying tanh(x) elementwise.
Maps inputs to range (-1, 1). Provides stronger gradients than sigmoid but can suffer from vanishing gradients. No learnable parameters.
gelu () creates a GELU activation layer.
Applies Gaussian Error Linear Unit activation, popular in transformer architectures. Smoother alternative to ReLU with better gradient properties. No learnable parameters.
swish () creates a Swish activation layer applying x * sigmoid(x) elementwise.
Self-gated activation function that can outperform ReLU in deep networks. No learnable parameters.
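The activations above are pure elementwise functions, so they are easy to state directly. A sketch in plain OCaml (tanh is OCaml's stdlib tanh; gelu uses the common tanh approximation rather than the exact erf form, which is an assumption about the implementation):

```ocaml
let relu x = Float.max 0.0 x
let sigmoid x = 1.0 /. (1.0 +. exp (-.x))
let swish x = x *. sigmoid x         (* self-gated: x * sigmoid(x) *)

(* GELU, tanh approximation:
   0.5 * x * (1 + tanh (sqrt(2/pi) * (x + 0.044715 * x^3))) *)
let gelu x =
  let c = sqrt (2.0 /. Float.pi) in
  0.5 *. x *. (1.0 +. tanh (c *. (x +. (0.044715 *. (x ** 3.0)))))

let () =
  Printf.printf "relu(-1)=%g sigmoid(0)=%g swish(0)=%g gelu(0)=%g\n"
    (relu (-1.0)) (sigmoid 0.0) (swish 0.0) (gelu 0.0)
```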
sequential layers creates a sequential composition of layers.
Applies layers in order, threading output of each layer as input to the next. The resulting module's parameters are the union of all component layer parameters.
Example
let mlp = Layer.sequential [
Layer.linear ~in_features:784 ~out_features:256 ();
Layer.relu ();
Layer.dropout ~rate:0.3 ();
Layer.linear ~in_features:256 ~out_features:10 ();
] in

val einsum :
einsum_str:string ->
shape:int array ->
?kernel_init:Initializers.t ->
unit ->
module_

einsum ~einsum_str ~shape ?kernel_init () creates a parameterized Einstein summation layer.
Implements learnable tensor contractions specified by Einstein notation. Useful for implementing custom linear transformations and attention mechanisms.
rms_norm ~dim ?eps ?scale_init () creates a Root Mean Square normalization layer.
Applies RMS normalization with learnable scaling. Normalizes by the RMS of activations rather than full statistics like batch normalization.
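Concretely, RMS normalization divides each feature by the root-mean-square of the feature vector and applies the learnable scale; unlike batch or layer normalization, no mean is subtracted. A sketch over a plain float array:

```ocaml
(* RMS normalization: y_i = scale * x_i / sqrt(mean(x^2) + eps). *)
let rms_norm ?(eps = 1e-6) ?(scale = 1.0) x =
  let n = float_of_int (Array.length x) in
  let ms = Array.fold_left (fun a v -> a +. (v *. v)) 0.0 x /. n in
  let rms = sqrt (ms +. eps) in
  Array.map (fun v -> scale *. v /. rms) x

let () =
  let y = rms_norm [| 3.0; 4.0 |] in
  Printf.printf "%.4f %.4f\n" y.(0) y.(1)
```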
layer_norm ~dim ?eps ?elementwise_affine () creates a layer normalization layer.
Normalizes activations across the feature dimension within each sample. Popular in transformer architectures for stable training.
val embedding :
vocab_size:int ->
embed_dim:int ->
?scale:bool ->
?embedding_init:Initializers.t ->
unit ->
module_

embedding ~vocab_size ~embed_dim ?scale ?embedding_init () creates an embedding lookup layer.
Maps discrete tokens (integers) to dense vectors. Commonly used as the first layer in NLP models to convert token IDs to continuous representations.
The embedding matrix has shape vocab_size, embed_dim.
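The lookup itself is just row indexing into that matrix: token id i selects row i. A sketch with a nested array standing in for the learnable vocab_size x embed_dim tensor:

```ocaml
(* Embedding lookup: map each token id to its row of the table. *)
let embed table ids = Array.map (fun id -> table.(id)) ids

let () =
  (* vocab_size = 3, embed_dim = 2 *)
  let table = [| [| 0.1; 0.2 |]; [| 0.3; 0.4 |]; [| 0.5; 0.6 |] |] in
  let vecs = embed table [| 2; 0 |] in
  Printf.printf "%g %g\n" vecs.(0).(0) vecs.(1).(1)
```

During training, only the rows selected by the batch receive gradient updates, which is what makes embedding layers practical for large vocabularies.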
val mlp :
in_features:int ->
hidden_features:int ->
out_features:int ->
?activation:[ `relu | `gelu | `swish ] ->
?dropout:float ->
unit ->
module_

mlp ~in_features ~hidden_features ~out_features ... creates a multi-layer perceptron (feed-forward network).

Standard MLP architecture: Linear -> Activation -> Dropout -> Linear -> Dropout. Commonly used in transformers and other architectures.
val rnn :
input_size:int ->
hidden_size:int ->
?return_sequences:bool ->
?learned_init:bool ->
unit ->
module_

Simple tanh RNN over a sequence. Input batch; seq; input_size, output batch; hidden_size (the last hidden state).
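The recurrence is h_t = tanh(W_x x_t + W_h h_{t-1} + b), applied left to right over the sequence. A scalar sketch (scalar weights in place of the layer's matrices, zero initial state as when learned_init is off):

```ocaml
(* Scalar tanh RNN: fold the recurrence over the sequence and
   return the last hidden state. *)
let rnn_last ~w_x ~w_h ~b xs =
  Array.fold_left (fun h x -> tanh ((w_x *. x) +. (w_h *. h) +. b)) 0.0 xs

let () =
  let h = rnn_last ~w_x:1.0 ~w_h:0.5 ~b:0.0 [| 0.0; 1.0 |] in
  Printf.printf "%.4f\n" h
```

With return_sequences one would collect every intermediate h_t instead of only the last; the fold above makes it clear why the hidden state always stays in (-1, 1).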
val gru :
input_size:int ->
hidden_size:int ->
?return_sequences:bool ->
?learned_init:bool ->
unit ->
module_

GRU over a sequence. Input/output like rnn.
val lstm :
input_size:int ->
hidden_size:int ->
?return_sequences:bool ->
?learned_init:bool ->
unit ->
module_

LSTM over a sequence. Input/output like rnn.
Adds learned positional embeddings to input batch; seq; embed_dim.
val positional_encoding_sinusoidal_table :
max_len:int ->
embed_dim:int ->
dtype:(float, 'layout) Rune.dtype ->
(float, 'layout) Rune.t

Create a max_len; embed_dim sinusoidal positional encoding table (not trainable). Can be added to token embeddings.
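The standard sinusoidal table (Vaswani et al.) pairs a sine and a cosine per frequency: entry (pos, 2i) is sin(pos / 10000^(2i/embed_dim)) and entry (pos, 2i+1) is the matching cosine. A sketch of that table as nested arrays; the interleaved sin/cos layout is an assumption about this function, not something the signature guarantees:

```ocaml
(* Sinusoidal positional encodings: one sin/cos pair per frequency. *)
let sinusoidal_table ~max_len ~embed_dim =
  Array.init max_len (fun pos ->
      Array.init embed_dim (fun i ->
          let p = float_of_int pos in
          (* i/2*2 rounds down to the even index, so the pair at
             columns 2i and 2i+1 shares one frequency. *)
          let freq =
            10000.0 ** (float_of_int (i / 2 * 2) /. float_of_int embed_dim)
          in
          if i mod 2 = 0 then sin (p /. freq) else cos (p /. freq)))

let () =
  let t = sinusoidal_table ~max_len:4 ~embed_dim:4 in
  Printf.printf "%g %g %g\n" t.(0).(0) t.(0).(1) t.(1).(0)
```

Because the table depends only on position, it is computed once and broadcast-added to the token embeddings, with no trainable parameters.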