Rewards: +1.0 for each step the pole remains upright
Episode Termination:
Terminated: Pole angle exceeds ±12° or cart position exceeds ±2.4 units from the center
Truncated: Episode reaches 500 steps (considered solved if average reward ≥ 475 over 100 consecutive episodes)
Rendering: Text output showing cart position, velocity, pole angle, and angular velocity
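Both termination conditions can be stated directly as a predicate on the observation. A minimal sketch, assuming the angle is reported in radians (is_terminated is an illustrative helper, not part of Fehu):

let is_terminated ~position ~angle =
  (* Thresholds from the spec: |angle| > 12 degrees or |position| > 2.4. *)
  let angle_limit = 12.0 *. Float.pi /. 180.0 in
  Float.abs position > 2.4 || Float.abs angle > angle_limit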
Example
Train an agent to balance the pole (select_action below is a stand-in for your agent's policy, not part of Fehu):
let rng = Rune.Rng.create () in
let env = Fehu_envs.Cartpole.make ~rng () in
let obs, _ = Fehu.Env.reset env () in
(* select_action stands in for your agent's policy (DQN, REINFORCE, ...);
   it maps the current observation to an action. *)
let rec run_episode obs total_reward =
  let action = select_action obs in
  let t = Fehu.Env.step env action in
  let new_total = total_reward +. t.reward in
  if t.terminated || t.truncated then
    Printf.printf "Episode reward: %.0f\n" new_total
  else
    run_episode t.observation new_total
in
run_episode obs 0.0
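To test the solved criterion from above (average reward ≥ 475 over 100 consecutive episodes), wrap the episode loop in an evaluation pass. A minimal sketch, assuming run_episode is adapted to return its total reward instead of printing it:

let evaluate n_episodes =
  let total = ref 0.0 in
  for _ = 1 to n_episodes do
    (* Reset between episodes and accumulate each episode's return. *)
    let obs, _ = Fehu.Env.reset env () in
    total := !total +. run_episode obs 0.0
  done;
  !total /. float_of_int n_episodes
in
let mean = evaluate 100 in
Printf.printf "Mean reward over 100 episodes: %.1f (solved if >= 475)\n" mean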
Tips
One of the most popular RL benchmarks, considered solved at an average reward of 475 (out of a maximum 500) over 100 consecutive episodes
Good for testing DQN, REINFORCE, A2C, and PPO algorithms (a simple action-selection sketch follows this list)
Requires managing two coupled constraints: pushing the cart to correct the pole angle also moves the cart toward the position limit
Observation space is continuous, making it ideal for neural network policies
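For the DQN case, action selection over CartPole's two discrete actions is typically epsilon-greedy on the network's Q-values. A library-free sketch (epsilon_greedy and the q_values array are illustrative; the Q-values are assumed to come from your own network):

let epsilon_greedy ~epsilon q_values =
  (* With probability epsilon, explore with a uniformly random action;
     otherwise exploit the action with the highest Q-value. *)
  if Random.float 1.0 < epsilon then Random.int (Array.length q_values)
  else (
    let best = ref 0 in
    Array.iteri (fun i q -> if q > q_values.(!best) then best := i) q_values;
    !best)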