package fehu
Reinforcement learning framework for OCaml
Sources
raven-1.0.0.alpha1.tbz
sha256=8e277ed56615d388bc69c4333e43d1acd112b5f2d5d352e2453aef223ff59867
sha512=369eda6df6b84b08f92c8957954d107058fb8d3d8374082e074b56f3a139351b3ae6e3a99f2d4a4a2930dd950fd609593467e502368a13ad6217b571382da28c
Module Fehu_envs.Cartpole
Classic cart-pole balancing environment.
ID: CartPole-v1
Observation Space: Fehu.Space.Box
with shape [4]
in range:
- Position: [-4.8, 4.8]
- Velocity: [-∞, ∞]
- Angle: [~-24°, ~24°]
- Angular velocity: [-∞, ∞]
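The bounds above can be written down as a small membership check. This is a minimal sketch on a plain `float array`, not on the `Rune` tensor that the environment actually returns. It assumes angles are expressed in radians and treats 24° as the exact bound, although the page only gives it approximately. The names `deg_to_rad`, `obs_low`, `obs_high`, and `in_observation_space` are illustrative and not part of Fehu.

```ocaml
(* Convert degrees to radians. *)
let deg_to_rad d = d *. Float.pi /. 180.0

(* Per-component bounds, in the order: position, velocity, angle,
   angular velocity. The 24-degree figure is assumed exact here. *)
let obs_low =
  [| -4.8; Float.neg_infinity; -. deg_to_rad 24.0; Float.neg_infinity |]

let obs_high =
  [| 4.8; Float.infinity; deg_to_rad 24.0; Float.infinity |]

(* True when [obs] has 4 components and each lies within its bounds. *)
let in_observation_space (obs : float array) =
  Array.length obs = 4
  && Array.for_all2 (fun lo x -> lo <= x) obs_low obs
  && Array.for_all2 (fun x hi -> x <= hi) obs obs_high
```

The velocity components are unbounded, so only position and angle can ever fall outside the box.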
Action Space: Fehu.Space.Discrete
with 2 choices:
- 0: Push cart to the left
- 1: Push cart to the right
Rewards: +1.0 for each step the pole remains upright
Episode Termination:
- Terminated: Pole angle exceeds ±12° or cart position exceeds ±2.4
- Truncated: Episode reaches 500 steps (considered solved if average reward ≥ 475 over 100 consecutive episodes)
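The two end-of-episode conditions can be sketched as a single classifier. This is an illustration of the thresholds listed above, not Fehu's implementation: it assumes the angle is in radians and, as a simplification, reports termination in preference to truncation when both conditions hold on the same step. The `outcome` type and `classify` function are hypothetical names.

```ocaml
let max_steps = 500
let x_threshold = 2.4
let theta_threshold = 12.0 *. Float.pi /. 180.0 (* 12 degrees in radians *)

type outcome = Running | Terminated | Truncated

(* Classify the episode status from cart position, pole angle, and the
   number of steps taken so far. *)
let classify ~x ~theta ~steps =
  if Float.abs x > x_threshold || Float.abs theta > theta_threshold then
    Terminated
  else if steps >= max_steps then Truncated
  else Running
```

Keeping the two outcomes distinct matters for value-based methods: a truncated episode did not fail, so bootstrapping from the final state is still appropriate.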
Rendering: Text output showing cart position, velocity, pole angle, and angular velocity
Example
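Run one episode with an agent's policy and print the total reward:
(* select_action stands in for the agent's decision rule (DQN, policy
   gradient, ...). It is not provided by Fehu and must be supplied.
   The observation field of the transition is assumed to be named
   observation. *)
let rng = Rune.Rng.create () in
let env = Fehu_envs.Cartpole.make ~rng () in
let obs, _ = Fehu.Env.reset env () in
let rec run_episode obs total_reward =
  let action = select_action obs in
  let t = Fehu.Env.step env action in
  let new_total = total_reward +. t.reward in
  if t.terminated || t.truncated then
    Printf.printf "Episode reward: %.0f\n" new_total
  else
    run_episode t.observation new_total
in
run_episode obs 0.0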
Tips
- One of the most popular RL benchmarks, considered solved at an average episode reward of 475 out of a maximum of 500
- Good for testing DQN, REINFORCE, A2C, and PPO algorithms
- Requires learning to balance competing objectives (position and angle)
- Observation space is continuous, making it ideal for neural network policies
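The "solved" criterion (average reward of at least 475 over 100 consecutive episodes) can be checked with a short helper. This is a sketch that assumes episode returns are collected in a list with the most recent episode first. `mean_of_first` and `is_solved` are illustrative names, not Fehu functions.

```ocaml
(* Mean of the first [n] elements of [xs], or None if [xs] has fewer
   than [n] elements. *)
let mean_of_first n xs =
  let rec go k acc = function
    | _ when k = n -> Some (acc /. float_of_int n)
    | [] -> None
    | x :: rest -> go (k + 1) (acc +. x) rest
  in
  go 0 0.0 xs

(* [recent_first] holds episode returns, most recent episode first.
   Returns false until at least [window] episodes have been recorded. *)
let is_solved ?(window = 100) ?(threshold = 475.0) recent_first =
  match mean_of_first window recent_first with
  | Some avg -> avg >= threshold
  | None -> false
```

A training loop could prepend each episode's return to the list and stop as soon as `is_solved` holds.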
type state = {
  mutable x : float;
  mutable x_dot : float;
  mutable theta : float;
  mutable theta_dot : float;
  mutable steps : int;
  rng : Rune.Rng.key ref;
}
val reset :
  'a ->
  ?options:'b ->
  unit ->
  state ->
  (float, Rune.float32_elt) Rune.t * Fehu.Info.t
val step :
  'a ->
  (Int32.t, 'b) Rune.t ->
  state ->
  ((float, Rune.float32_elt) Rune.t, 'c, 'd) Fehu.Env.transition
val make :
  rng:Rune.Rng.key ->
  unit ->
  (Fehu.Space.Box.element, Fehu.Space.Discrete.element, string) Fehu.Env.t
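To give a sense of what a step does to the mutable `state` fields, here is a sketch of the classic cart-pole dynamics with explicit Euler integration. It is not taken from the Fehu source: the physical constants (gravity, masses, pole half-length, force magnitude, time step) are the values commonly used for this benchmark and are assumptions here, the action is a plain `int` rather than a `Rune` tensor, and the record is a local copy without the `rng` field so that the block compiles on its own. `force_of_action` and `euler_step` are hypothetical names.

```ocaml
(* Local copy of the state record, without the RNG field. *)
type state = {
  mutable x : float;
  mutable x_dot : float;
  mutable theta : float;
  mutable theta_dot : float;
  mutable steps : int;
}

(* Constants of the classic formulation (assumed, not confirmed for Fehu). *)
let gravity = 9.8
let mass_cart = 1.0
let mass_pole = 0.1
let total_mass = mass_cart +. mass_pole
let half_length = 0.5
let pole_mass_length = mass_pole *. half_length
let force_mag = 10.0
let tau = 0.02 (* seconds per step *)

(* Action 0 pushes the cart left, action 1 pushes it right. *)
let force_of_action action = if action = 1 then force_mag else -. force_mag

(* Advance the state in place by one explicit Euler step. *)
let euler_step s action =
  let force = force_of_action action in
  let cos_t = cos s.theta and sin_t = sin s.theta in
  let temp =
    (force +. pole_mass_length *. s.theta_dot *. s.theta_dot *. sin_t)
    /. total_mass
  in
  let theta_acc =
    (gravity *. sin_t -. cos_t *. temp)
    /. (half_length
        *. (4.0 /. 3.0 -. mass_pole *. cos_t *. cos_t /. total_mass))
  in
  let x_acc = temp -. pole_mass_length *. theta_acc *. cos_t /. total_mass in
  (* Positions use the old velocities; velocities use the accelerations. *)
  s.x <- s.x +. tau *. s.x_dot;
  s.x_dot <- s.x_dot +. tau *. x_acc;
  s.theta <- s.theta +. tau *. s.theta_dot;
  s.theta_dot <- s.theta_dot +. tau *. theta_acc;
  s.steps <- s.steps + 1
```

Starting from rest, pushing right accelerates the cart to the right and swings the pole backwards, which is why a policy has to trade off cart position against pole angle.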