Experience collection buffers for reinforcement learning algorithms.

Buffers accumulate trajectories for batch training or off-policy learning.
This module provides two buffer types for storing agent-environment interactions: replay buffers for off-policy algorithms and rollout buffers for on-policy algorithms. Both support efficient batch sampling and storage management.
Buffer Types
Replay buffers store transitions with complete state information, supporting off-policy algorithms like DQN, SAC, and TD3. They maintain a fixed-capacity circular buffer that overwrites oldest experiences when full.
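The overwrite-when-full behavior can be pictured as a ring over a fixed array. The sketch below is illustrative only; the type and function names are hypothetical, not the library's internals.

```ocaml
(* Minimal circular-buffer sketch: a fixed array, a write cursor, and a
   size counter capped at capacity. Names are illustrative. *)
type 'a ring = {
  data : 'a option array;  (* fixed-capacity storage *)
  mutable pos : int;       (* next write index *)
  mutable size : int;      (* number of stored items, <= capacity *)
}

let ring_create capacity = { data = Array.make capacity None; pos = 0; size = 0 }

let ring_add ring x =
  let capacity = Array.length ring.data in
  ring.data.(ring.pos) <- Some x;           (* overwrites the oldest slot when full *)
  ring.pos <- (ring.pos + 1) mod capacity;
  ring.size <- min (ring.size + 1) capacity
```

Once `size` reaches `capacity`, each new write lands on the oldest remaining experience, so memory stays bounded regardless of how long training runs.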
Rollout buffers store sequential steps with optional value estimates and log probabilities, supporting on-policy algorithms like PPO and A2C. They compute advantages using Generalized Advantage Estimation (GAE) before returning batches.
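The GAE pass can be sketched as a single backward sweep over the rollout. This is a standalone illustration over plain arrays, assuming the standard recurrence (delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), accumulated with gamma * lambda), not the buffer's actual implementation.

```ocaml
(* Generalized Advantage Estimation over a finished rollout.
   rewards, values, dones are per-step arrays; last_value bootstraps the
   step after the rollout ends. *)
let gae ~rewards ~values ~dones ~last_value ~gamma ~gae_lambda =
  let n = Array.length rewards in
  let advantages = Array.make n 0.0 in
  let next_adv = ref 0.0 in
  for t = n - 1 downto 0 do
    let next_value = if t = n - 1 then last_value else values.(t + 1) in
    let mask = if dones.(t) then 0.0 else 1.0 in
    (* TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) *)
    let delta = rewards.(t) +. (gamma *. next_value *. mask) -. values.(t) in
    next_adv := delta +. (gamma *. gae_lambda *. mask *. !next_adv);
    advantages.(t) <- !next_adv
  done;
  (* value targets: R_t = A_t + V(s_t) *)
  let returns = Array.mapi (fun t a -> a +. values.(t)) advantages in
  (advantages, returns)
```

With `gamma = 1.0` and `gae_lambda = 1.0` this reduces to Monte Carlo advantages; with `gae_lambda = 0.0` it reduces to one-step TD errors.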
Usage
Create a replay buffer and add transitions:
let buffer = Buffer.Replay.create ~capacity:10000 in
let transition =
{ observation; action; reward; next_observation; terminated; truncated }
in
Buffer.Replay.add buffer transition
Sample a batch for training:
let batch = Buffer.Replay.sample buffer ~rng ~batch_size:32 in
Array.iter (fun t -> ignore t (* train on transition t *)) batch
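One plausible way such sampling works is uniform draws with replacement over the filled portion of the buffer. The helper below is a hypothetical sketch, not the library's `sample`:

```ocaml
(* Draw batch_size uniform indices into the first `size` filled slots,
   with replacement. A batch is then gathered from these indices. *)
let sample_indices rng ~size ~batch_size =
  Array.init batch_size (fun _ -> Random.State.int rng size)
```

Sampling with replacement keeps the draw O(batch_size) and is the common choice for replay buffers, since batches are small relative to capacity.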
Use rollout buffers for on-policy data:
let buffer = Buffer.Rollout.create ~capacity:2048 in
Buffer.Rollout.add buffer
{ observation; action; reward; terminated; truncated; value; log_prob };
Buffer.Rollout.compute_advantages buffer ~last_value ~last_done
  ~gamma:0.99 ~gae_lambda:0.95;
let steps, advantages, returns = Buffer.Rollout.get buffer
Represents a complete state transition containing both the current and next observations. Used by replay buffers for algorithms that learn from arbitrary past experiences.
Log probability log π(a|s) from the policy, if available.
*)
}
Rollout step for on-policy algorithms.
Represents a single timestep with optional policy information. Unlike transitions, steps do not store next observations since on-policy data is processed sequentially. Value estimates and log probabilities support policy gradient methods.
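The contrast between the two records can be sketched as below. Field names follow the usage examples above; the type parameters and `option` choices are assumptions for illustration, not the library's exact declarations.

```ocaml
(* Off-policy transition: stores next_observation so a learner can
   bootstrap from arbitrary past experiences. *)
type ('obs, 'act) transition = {
  observation : 'obs;
  action : 'act;
  reward : float;
  next_observation : 'obs;  (* needed for off-policy bootstrapping *)
  terminated : bool;
  truncated : bool;
}

(* On-policy step: no next_observation (data is consumed sequentially),
   but optional policy-side quantities for policy-gradient updates. *)
type ('obs, 'act) step = {
  observation : 'obs;
  action : 'act;
  reward : float;
  terminated : bool;
  truncated : bool;
  value : float option;     (* critic estimate V(s), if available *)
  log_prob : float option;  (* log π(a|s) from the policy, if available *)
}
```

Dropping `next_observation` from steps avoids storing each observation twice: in sequential on-policy data, the next observation is simply the following step's `observation`.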