package fehu
Reinforcement learning framework for OCaml
Sources
raven-1.0.0.alpha1.tbz
Module Fehu_envs.Mountain_car
Mountain car environment - drive up a steep hill using momentum.
ID: MountainCar-v0
Observation Space: Fehu.Space.Box with shape [2]:
- Position: [-1.2, 0.6] (goal at 0.5)
- Velocity: [-0.07, 0.07]
Action Space: Fehu.Space.Discrete with 3 choices:
- 0: Push left (accelerate to the left)
- 1: No push (coast)
- 2: Push right (accelerate to the right)
Rewards: -1.0 for each step until the goal is reached
Episode Termination:
- Terminated: Car reaches position ≥ 0.5 (goal at top of right hill)
- Truncated: Episode exceeds 200 steps
Initial State: Random position in [-0.6, -0.4] with velocity 0.0
Rendering: ASCII visualization showing car position ('C') and goal ('G') on a track
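The action encoding and termination rules above can be sketched in plain OCaml. This is an illustrative sketch, not Fehu's implementation: the `action` and `status` types and the `action_of_int` and `episode_status` functions are hypothetical names. It also assumes that truncation applies once 200 steps have been taken and that reaching the goal takes priority over truncation.

```ocaml
(* Hypothetical helper types; these names are not part of Fehu. *)
type action = Push_left | Coast | Push_right

(* Decode a documented action index; None for anything outside 0..2. *)
let action_of_int = function
  | 0 -> Some Push_left
  | 1 -> Some Coast
  | 2 -> Some Push_right
  | _ -> None

type status = Running | Terminated | Truncated

(* Values taken from the specification above. *)
let goal_position = 0.5
let max_steps = 200

(* Assumption: truncation applies once [max_steps] steps have been taken,
   and reaching the goal takes priority over truncation. *)
let episode_status ~position ~steps =
  if position >= goal_position then Terminated
  else if steps >= max_steps then Truncated
  else Running
```

For example, `episode_status ~position:(-0.5) ~steps:200` evaluates to `Truncated` under these assumptions, while any position of 0.5 or more yields `Terminated` regardless of the step count.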
Example
Run a single episode and report whether the car reached the goal:
let rng = Rune.Rng.create () in
let env = Fehu_envs.Mountain_car.make ~rng () in
let _obs, _info = Fehu.Env.reset env () in
let rec run_episode steps =
  let action = (* policy decision based on position and velocity *) in
  let t = Fehu.Env.step env action in
  if t.terminated then
    (* steps counts completed steps, so include the one just taken *)
    Printf.printf "Goal reached in %d steps!\n" (steps + 1)
  else if t.truncated then
    Printf.printf "Failed to reach goal in 200 steps\n"
  else
    run_episode (steps + 1)
in
run_episode 0
Tips
- Classic exploration challenge: car engine is too weak to drive directly up the hill
- Agent must learn to build momentum by driving back and forth
- The constant -1.0 reward gives no signal about progress until the goal is reached, which makes this difficult for naive value-based methods
- Consider reward shaping (e.g., bonus for reaching higher positions) or policy gradient methods
- Good for testing exploration strategies and delayed reward learning
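Two of the tips above, building momentum and shaping the reward, can be sketched as pure functions. These are hypothetical helpers, not part of Fehu: `momentum_policy` and `shaped_reward` are names chosen for illustration, and the shaping term (a bonus proportional to how far right the car is within the documented position range) is one simple choice among many. The helpers work on plain OCaml floats and ints, so converting to and from the environment's tensor observations and actions is left out.

```ocaml
(* Hypothetical helpers; none of these names come from Fehu. *)

(* Action indices as documented for the Discrete action space. *)
let push_left = 0
let push_right = 2

(* Momentum heuristic: push in the direction the car is already moving,
   so each swing adds energy. At rest (velocity 0.0) this pushes right. *)
let momentum_policy ~velocity =
  if velocity < 0.0 then push_left else push_right

(* Position bounds from the observation space. *)
let min_position = -1.2
let max_position = 0.6

(* Shaped reward: the documented -1.0 per step plus a bonus that grows
   linearly with position, normalized to [0, scale] over the track.
   Assumption: rightward progress is used as a stand-in for height. *)
let shaped_reward ?(scale = 1.0) ~position () =
  let progress = (position -. min_position) /. (max_position -. min_position) in
  -1.0 +. (scale *. progress)
```

Shaping changes the optimization target, so a small `scale` is the cautious choice if the goal is still to minimize the number of steps.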
type state = {
  mutable position : float;
  mutable velocity : float;
  mutable steps : int;
  rng : Rune.Rng.key ref;
}
val reset :
  'a ->
  ?options:'b ->
  unit ->
  state ->
  (float, Rune.float32_elt) Rune.t * Fehu.Info.t
val step :
  'a ->
  (Int32.t, 'b) Rune.t ->
  state ->
  ((float, Rune.float32_elt) Rune.t, 'c, 'd) Fehu.Env.transition
val make :
  rng:Rune.Rng.key ->
  unit ->
  (Fehu.Space.Box.element, Fehu.Space.Discrete.element, string) Fehu.Env.t