Module Fehu_envs.Random_walk
One-dimensional random walk environment.
ID: RandomWalk-v0
Observation Space: Fehu.Space.Box with shape [1] in range [-10.0, 10.0]. Represents the agent's continuous position on a line.
Action Space: Fehu.Space.Discrete with 2 choices:
- 0: Move left (position -= 1.0)
- 1: Move right (position += 1.0)
Rewards: Negative absolute position (-|position|), encouraging the agent to stay near the origin. Terminal states at the boundaries yield reward -10.0.
Episode Termination:
- Terminated: Agent reaches position -10.0 or +10.0 (boundaries)
- Truncated: Episode exceeds 200 steps
Rendering: ASCII visualization showing agent position ('o') on a line.
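To make the reward rule concrete, here is a minimal sketch of the per-step reward described above; reward_of_position is a hypothetical helper written for illustration, not part of the module's API:

let reward_of_position pos =
  (* Terminal boundary states yield a fixed -10.0 penalty. *)
  if Float.abs pos >= 10.0 then -10.0
  else
    (* Otherwise: negative absolute position, maximal (0.0) at the origin. *)
    -. (Float.abs pos)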
Example
Run a simple policy that keeps the agent near the origin:
let rng = Rune.Rng.create () in
let env = Fehu_envs.Random_walk.make ~rng () in
let obs, _info = Fehu.Env.reset env () in
let pos = ref (Rune.to_array obs).(0) in
for _ = 1 to 100 do
  (* Simple policy: move left (0) when right of the origin, otherwise move
     right (1). Constructing the int32 action assumes Rune.scalar follows
     the Nx-style API. *)
  let action = Rune.scalar Rune.int32 (if !pos > 0.0 then 0l else 1l) in
  let t = Fehu.Env.step env action in
  pos := (Rune.to_array t.observation).(0);
  Printf.printf "Position: %.2f, Reward: %.2f\n" !pos t.reward
done
Tips
- The environment is deterministic given the action sequence
- Optimal policy alternates actions to minimize distance from origin
- Good for testing value function approximation with continuous states (a full-episode evaluation sketch follows this list)
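As a complement to the example above, here is a minimal full-episode evaluation sketch of the toward-the-origin policy. It assumes the transition record exposes terminated and truncated flags (named after the Episode Termination description) and that Rune.scalar builds a 0-d int32 tensor; both are assumptions about the API rather than confirmed signatures:

let rng = Rune.Rng.create () in
let env = Fehu_envs.Random_walk.make ~rng () in
let obs, _ = Fehu.Env.reset env () in
let pos = ref (Rune.to_array obs).(0) in
let return_ = ref 0.0 in
let finished = ref false in
while not !finished do
  (* Move toward the origin; this alternates directions once it is reached. *)
  let a = if !pos > 0.0 then 0l else 1l in
  let t = Fehu.Env.step env (Rune.scalar Rune.int32 a) in
  pos := (Rune.to_array t.observation).(0);
  return_ := !return_ +. t.reward;
  (* Field names terminated/truncated are assumed from the Episode
     Termination description above. *)
  finished := t.terminated || t.truncated
done;
Printf.printf "Episode return: %.2f\n" !return_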
val reset :
'a ->
?options:'b ->
unit ->
state ->
(float, Rune.float32_elt) Rune.t * Fehu.Info.t
val step :
'a ->
(Int32.t, 'b) Rune.t ->
state ->
((float, Rune.float32_elt) Rune.t, 'c, 'd) Fehu.Env.transition
val make :
rng:Rune.Rng.key ->
unit ->
(Fehu.Space.Box.element, Fehu.Space.Discrete.element, string) Fehu.Env.t