Module Fehu_envs.Cartpole

Classic cart-pole balancing environment.

ID: CartPole-v1

Observation Space: Fehu.Space.Box with shape [4] in range:

  • Position: [-4.8, 4.8]
  • Velocity: [-∞, ∞]
  • Angle: [~-24°, ~24°]
  • Angular velocity: [-∞, ∞]
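
The bounds above can be sketched as a simple range check. This is an illustration only: it uses a plain `float array` rather than the float32 Rune tensor the environment actually returns, and the names `bounds` and `in_bounds` are hypothetical, not part of Fehu.

```ocaml
(* Illustrative sketch, not Fehu's implementation. Bounds are copied from
   the list above, with the angle converted to radians. *)
let deg_to_rad d = d *. Float.pi /. 180.0

(* (low, high) for position, velocity, angle, angular velocity. *)
let bounds =
  [| (-4.8, 4.8);
     (neg_infinity, infinity);
     (-. deg_to_rad 24.0, deg_to_rad 24.0);
     (neg_infinity, infinity) |]

(* [in_bounds obs] is true when [obs] has exactly 4 components and each
   one lies inside its (low, high) range. *)
let in_bounds (obs : float array) =
  Array.length obs = Array.length bounds
  && Array.for_all2 (fun v (lo, hi) -> lo <= v && v <= hi) obs bounds
```

Note that the observation range (about ±24° and ±4.8) is twice the termination thresholds listed under Episode Termination, so an episode ends well before an observation leaves the box.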

Action Space: Fehu.Space.Discrete with 2 choices:

  • 0: Push cart to the left
  • 1: Push cart to the right

Rewards: +1.0 for each step the pole remains upright

Episode Termination:

  • Terminated: Pole angle exceeds ±12° or cart position exceeds ±2.4
  • Truncated: Episode reaches 500 steps. The task is considered solved when the average episode reward is ≥ 475 over 100 consecutive episodes.
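
The two end conditions can be expressed as a small classifier. This is a minimal sketch under stated assumptions: `outcome`, `classify`, and `max_steps` are hypothetical names, and checking termination before truncation is an assumption about ordering, not something this page specifies.

```ocaml
(* Thresholds taken from the list above. *)
let x_threshold = 2.4
let theta_threshold_radians = 12.0 *. Float.pi /. 180.0

(* Hypothetical name for the 500-step limit. *)
let max_steps = 500

type outcome = Running | Terminated | Truncated

(* Classify the episode status from the cart position [x], the pole angle
   [theta] (radians), and the number of [steps] taken so far.
   Assumption: failure (Terminated) takes precedence over the step limit. *)
let classify ~x ~theta ~steps =
  if Float.abs x > x_threshold || Float.abs theta > theta_threshold_radians
  then Terminated
  else if steps >= max_steps then Truncated
  else Running
```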

Rendering: Text output showing cart position, velocity, pole angle, and angular velocity
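
As a rough idea of what such a text rendering might look like, the sketch below formats the four quantities into one line. The function name `render_text` and the exact output format are hypothetical; the string produced by Fehu's own `render` may differ.

```ocaml
(* Hypothetical text rendering of the four state variables.
   Fehu's actual output format is not documented on this page. *)
let render_text ~x ~x_dot ~theta ~theta_dot =
  Printf.sprintf "x=%.3f x_dot=%.3f theta=%.3f theta_dot=%.3f"
    x x_dot theta theta_dot
```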

Example

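Run one episode, choosing each action with a caller-supplied policy:

  (* [select_action] is your agent's policy (e.g. DQN or policy gradient):
     it maps the current observation to an action tensor. *)
  let run_episode select_action =
    let rng = Rune.Rng.create () in
    let env = Fehu_envs.Cartpole.make ~rng () in
    let obs, _info = Fehu.Env.reset env () in
    let rec loop obs total_reward =
      let action = select_action obs in
      let t = Fehu.Env.step env action in
      let new_total = total_reward +. t.reward in
      if t.terminated || t.truncated then
        Printf.printf "Episode reward: %.0f\n" new_total
      else
        (* Feed the new observation to the policy on the next step. *)
        loop t.observation new_total
    in
    loop obs 0.0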

Tips

  • One of the most popular RL benchmarks; considered solved at an average episode reward of 475 out of 500
  • Good for testing DQN, REINFORCE, A2C, and PPO algorithms
  • Requires learning to balance competing objectives (position and angle)
  • Observation space is continuous, making it well suited to neural network policies
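
The "solved" criterion mentioned above can be checked with a few lines of code. This is a minimal sketch: `is_solved` and its optional parameters are hypothetical names, and it assumes episode rewards are recorded in a list with the most recent episode last.

```ocaml
(* [is_solved rewards] is true when at least [window] episodes have been
   recorded and the mean reward of the last [window] episodes is at least
   [threshold]. [rewards] is ordered oldest first. Assumes [window] > 0. *)
let is_solved ?(window = 100) ?(threshold = 475.0) (rewards : float list) =
  let n = List.length rewards in
  if n < window then false
  else
    (* Keep only the most recent [window] episodes. *)
    let recent = List.filteri (fun i _ -> i >= n - window) rewards in
    let mean = List.fold_left ( +. ) 0.0 recent /. float_of_int window in
    mean >= threshold
```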

type observation = (float, Rune.float32_elt) Rune.t
type action = (int32, Rune.int32_elt) Rune.t
type render = string
type state = {
  mutable x : float;
  mutable x_dot : float;
  mutable theta : float;
  mutable theta_dot : float;
  mutable steps : int;
  rng : Rune.Rng.key ref;
}
val gravity : float
val masscart : float
val masspole : float
val total_mass : float
val length : float
val polemass_length : float
val force_mag : float
val tau : float
val theta_threshold_radians : float
val x_threshold : float
val observation_space : Fehu.Space.Box.element Fehu__Space.t
val action_space : Fehu.Space.Discrete.element Fehu__Space.t
val metadata : Fehu.Metadata.t
val reset : 'a -> ?options:'b -> unit -> state -> (float, Rune.float32_elt) Rune.t * Fehu.Info.t
val step : 'a -> (Int32.t, 'b) Rune.t -> state -> ((float, Rune.float32_elt) Rune.t, 'c, 'd) Fehu.Env.transition
val render : state -> string
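
To show how the constants and the `state` record plausibly fit together, here is a sketch of one physics update using the classic cart-pole equations with explicit Euler integration. Several things here are assumptions, not taken from this page: the numeric values of the constants (borrowed from the widely used Gymnasium CartPole-v1), the choice of integrator, and the function name `euler_step`. The `rng` field is omitted and the action is a plain `int` rather than a Rune tensor, so that the example stays self-contained.

```ocaml
(* Assumed constants: values from Gymnasium's CartPole-v1. This page lists
   the names but not Fehu's actual values. *)
let gravity = 9.8
let masscart = 1.0
let masspole = 0.1
let total_mass = masscart +. masspole
let length = 0.5 (* half the pole length *)
let polemass_length = masspole *. length
let force_mag = 10.0
let tau = 0.02 (* seconds between state updates *)

(* Same fields as the [state] record above, minus [rng]. *)
type state = {
  mutable x : float;
  mutable x_dot : float;
  mutable theta : float;
  mutable theta_dot : float;
  mutable steps : int;
}

(* Advance [s] in place by one explicit-Euler step.
   [action] is 0 (push left) or 1 (push right). *)
let euler_step s action =
  let force = if action = 1 then force_mag else -.force_mag in
  let cos_t = cos s.theta and sin_t = sin s.theta in
  let temp =
    (force +. polemass_length *. s.theta_dot *. s.theta_dot *. sin_t)
    /. total_mass
  in
  let theta_acc =
    (gravity *. sin_t -. cos_t *. temp)
    /. (length *. (4.0 /. 3.0 -. masspole *. cos_t *. cos_t /. total_mass))
  in
  let x_acc = temp -. polemass_length *. theta_acc *. cos_t /. total_mass in
  (* Positions are updated with the old velocities, then velocities with
     the accelerations computed above. *)
  s.x <- s.x +. tau *. s.x_dot;
  s.x_dot <- s.x_dot +. tau *. x_acc;
  s.theta <- s.theta +. tau *. s.theta_dot;
  s.theta_dot <- s.theta_dot +. tau *. theta_acc;
  s.steps <- s.steps + 1
```

Starting from rest with the pole upright, pushing right accelerates the cart to the right and tips the pole to the left, which is why a policy has to trade off position against angle.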