package fehu


Module Fehu.Buffer

Experience replay buffers for trajectory storage.

Buffers accumulate trajectories for batch training or off-policy learning.

This module provides two buffer types for storing agent-environment interactions: replay buffers for off-policy algorithms and rollout buffers for on-policy algorithms. Both support efficient batch sampling and storage management.

Buffer Types

Replay buffers store transitions with complete state information, supporting off-policy algorithms like DQN, SAC, and TD3. They maintain a fixed-capacity circular buffer that overwrites oldest experiences when full.
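The overwrite-oldest behaviour can be sketched with a plain array and a write cursor. This is a minimal illustration of the circular-buffer idea, not the library's implementation; the names `ring`, `create`, and `add` are made up for the sketch.

```ocaml
(* Sketch of a fixed-capacity circular buffer: writes wrap around and
   overwrite the oldest entry once the buffer is full. *)
type 'a ring = {
  data : 'a option array;  (* storage slots, [None] until first written *)
  mutable pos : int;       (* next write position *)
  mutable size : int;      (* number of valid entries, capped at capacity *)
}

let create ~capacity = { data = Array.make capacity None; pos = 0; size = 0 }

let add ring x =
  ring.data.(ring.pos) <- Some x;  (* overwrites the oldest slot when full *)
  ring.pos <- (ring.pos + 1) mod Array.length ring.data;
  ring.size <- min (ring.size + 1) (Array.length ring.data)
```

With `~capacity:2`, adding three items leaves only the two most recent: the third write wraps to slot 0 and replaces the first.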

Rollout buffers store sequential steps with optional value estimates and log probabilities, supporting on-policy algorithms like PPO and A2C. They compute advantages using Generalized Advantage Estimation (GAE) before returning batches.
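GAE can be computed in a single backward pass over the rollout: the TD error is delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), and advantages satisfy A_t = delta_t + gamma * lambda * A_{t+1}, with both recursions cut at episode boundaries. The sketch below illustrates this under the assumption that rewards, values, and done flags are stored as parallel arrays; `compute_gae` is an illustrative name, not the library's API.

```ocaml
(* One backward pass of Generalized Advantage Estimation.
   [last_value] is the critic's estimate for the state after the final step. *)
let compute_gae ~rewards ~values ~dones ~last_value ~gamma ~gae_lambda =
  let n = Array.length rewards in
  let advantages = Array.make n 0.0 in
  let next_adv = ref 0.0 in
  for t = n - 1 downto 0 do
    let next_value = if t = n - 1 then last_value else values.(t + 1) in
    let mask = if dones.(t) then 0.0 else 1.0 in
    (* TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) *)
    let delta = rewards.(t) +. (gamma *. next_value *. mask) -. values.(t) in
    (* A_t = delta_t + gamma * lambda * A_{t+1}, cut at episode ends *)
    advantages.(t) <- delta +. (gamma *. gae_lambda *. mask *. !next_adv);
    next_adv := advantages.(t)
  done;
  (* Critic regression targets: R_t = A_t + V(s_t) *)
  let returns = Array.mapi (fun t a -> a +. values.(t)) advantages in
  (advantages, returns)
```

For a one-step rollout with reward 1.0, value estimate 0.5, and no bootstrap value, the advantage is simply the TD error 0.5 and the return is 1.0.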

Usage

Create a replay buffer and add transitions:

  let buffer = Buffer.Replay.create ~capacity:10000 in
  let transition =
    { observation; action; reward; next_observation; terminated; truncated }
  in
  Buffer.Replay.add buffer transition

Sample a batch for training:

  let batch = Buffer.Replay.sample buffer ~rng ~batch_size:32 in
  Array.iter (fun t -> ignore t (* train on each sampled transition *)) batch

Use rollout buffers for on-policy data:

  let buffer = Buffer.Rollout.create ~capacity:2048 in
  Buffer.Rollout.add buffer
    { observation; action; reward; terminated; truncated; value; log_prob };
  Buffer.Rollout.compute_advantages buffer ~last_value ~last_done
    ~gamma:0.99 ~gae_lambda:0.95;
  let steps, advantages, returns = Buffer.Rollout.get buffer
type ('obs, 'act) transition = {
  observation : 'obs;  (** Current state observation *)
  action : 'act;  (** Action taken in the current state *)
  reward : float;  (** Immediate reward received *)
  next_observation : 'obs;  (** Resulting next state observation *)
  terminated : bool;  (** Whether the episode ended naturally *)
  truncated : bool;  (** Whether the episode was artificially truncated *)
}

Basic transition for off-policy algorithms.

Represents a complete state transition containing both the current and next observations. Used by replay buffers for algorithms that learn from arbitrary past experiences.

type ('obs, 'act) step = {
  observation : 'obs;  (** State observation at this step *)
  action : 'act;  (** Action taken at this step *)
  reward : float;  (** Immediate reward received *)
  terminated : bool;  (** Whether the episode ended at this step *)
  truncated : bool;  (** Whether the episode was truncated at this step *)
  value : float option;  (** Value estimate V(s) from the critic, if available *)
  log_prob : float option;  (** Log probability log π(a|s) from the policy, if available *)
}

Rollout step for on-policy algorithms.

Represents a single timestep with optional policy information. Unlike transitions, steps do not store next observations since on-policy data is processed sequentially. Value estimates and log probabilities support policy gradient methods.

Replay Buffer (Off-Policy: DQN, SAC, TD3)

module Replay : sig ... end

Replay buffer for off-policy algorithms.

Rollout Buffer (On-Policy: PPO, A2C)

module Rollout : sig ... end

Rollout buffer for on-policy algorithms.