package fehu

Reinforcement learning framework for OCaml

Module Fehu_algorithms

Reinforcement learning algorithms for Fehu.

This library provides production-ready implementations of standard RL algorithms. Each algorithm follows a consistent interface: create an agent with a policy network and configuration, train with learn, and use the trained policy with predict.

Available Algorithms

Policy Gradient Methods

  • Reinforce: Monte Carlo Policy Gradient (REINFORCE)

Value-Based Methods

  • Dqn: Deep Q-Network (DQN)

Usage Pattern

All algorithms follow this pattern, where Algorithm stands for a concrete module such as Reinforce or Dqn:

  open Fehu

  (* 1. Create policy network *)
  let policy_net = Kaun.Layer.sequential [...] in

  (* 2. Initialize algorithm *)
  let agent = Algorithm.create
    ~policy_network:policy_net
    ~n_actions:n
    ~rng:(Rune.Rng.key 42)
    Algorithm.default_config
  in

  (* 3. Train *)
  let agent = Algorithm.learn agent ~env ~total_timesteps:100_000 () in

  (* 4. Use trained policy *)
  let action = Algorithm.predict agent obs ~training:false |> fst
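
The ~training flag on predict distinguishes acting during training from acting with the final policy; in value-based methods such a flag typically toggles exploration. As a rough illustration (not Fehu's internals), epsilon-greedy selection over an array of action values looks like this, where epsilon and q are hypothetical stand-ins:

  (* Illustration only: with probability epsilon pick a random action
     (explore), otherwise the argmax (exploit). Assumes q is non-empty. *)
  let epsilon_greedy ~epsilon q =
    if Random.float 1.0 < epsilon then Random.int (Array.length q)
    else (
      let best = ref 0 in
      Array.iteri (fun i v -> if v > q.(!best) then best := i) q;
      !best)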

Choosing an Algorithm

  • REINFORCE: Simple Monte Carlo policy gradient. Works for small discrete action spaces, but must wait for complete episodes before updating and is sample inefficient; a good first algorithm for learning the interface (the return computation is sketched after this list).
  • DQN: Off-policy value-based method with experience replay. Suited to discrete actions and more sample efficient than REINFORCE (a minimal replay buffer is sketched at the end of this page).
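
To make the REINFORCE trade-off concrete: no update can happen until an episode terminates, because each update needs the discounted return G_t = r_t + gamma * G_(t+1), computed backwards from the last reward. A minimal sketch in plain OCaml (illustrative, not Fehu's implementation):

  (* Discounted Monte Carlo returns over one complete episode.
     returns ~gamma:0.99 [1.; 1.; 1.] = [2.9701; 1.99; 1.] *)
  let returns ~gamma rewards =
    List.fold_right
      (fun r acc ->
        match acc with
        | [] -> [ r ]
        | g :: _ -> (r +. (gamma *. g)) :: acc)
      rewards []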

Future algorithms:

  • PPO: More sample efficient, supports continuous actions, industry standard
  • SAC: Off-policy actor-critic, excellent for continuous control
module Reinforce : sig ... end

REINFORCE algorithm implementation.

module Dqn : sig ... end

DQN algorithm implementation.
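
DQN's sample efficiency comes from experience replay: transitions are stored in a fixed-capacity buffer and sampled uniformly for updates, so each environment step can be reused many times and consecutive, correlated samples are broken up. A minimal ring-buffer sketch in plain OCaml (illustrative; Fehu's actual buffer may differ):

  (* Fixed-capacity ring buffer. Once full, new transitions overwrite
     the oldest; sampling is uniform over the stored entries. *)
  type 'a replay_buffer = {
    data : 'a option array;
    mutable size : int;
    mutable next : int;
  }

  let create capacity = { data = Array.make capacity None; size = 0; next = 0 }

  let add buf x =
    buf.data.(buf.next) <- Some x;
    buf.next <- (buf.next + 1) mod Array.length buf.data;
    buf.size <- min (buf.size + 1) (Array.length buf.data)

  (* Sample n transitions uniformly; assumes the buffer is non-empty. *)
  let sample buf n =
    List.init n (fun _ ->
        match buf.data.(Random.int buf.size) with
        | Some x -> x
        | None -> assert false)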