package fehu
Reinforcement learning framework for OCaml
Module Fehu_envs.Random_walk
One-dimensional random walk environment.
ID: RandomWalk-v0
Observation Space: Fehu.Space.Box with shape [1] in range [-10.0, 10.0]. Represents the agent's continuous position on a line.
Action Space: Fehu.Space.Discrete with 2 choices:
- 0: Move left (position -= 1.0)
- 1: Move right (position += 1.0)
Rewards: Negative absolute position (-|position|), encouraging the agent to stay near the origin. Terminal states at the boundaries yield reward -10.0; these reward and termination rules are sketched as code below.
Episode Termination:
- Terminated: Agent reaches position -10.0 or +10.0 (boundaries)
- Truncated: Episode exceeds 200 steps
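The reward and termination rules above, written as a small sketch over the scalar position and step count. The function name and its arguments are illustrative, not part of the Fehu API:
let outcome ~position ~step_count =
  (* Boundary states are terminal; episodes past 200 steps are truncated. *)
  let terminated = Float.abs position >= 10.0 in
  let truncated = step_count > 200 in
  (* Terminal states yield -10.0; otherwise the reward is -|position|. *)
  let reward = if terminated then -10.0 else -. (Float.abs position) in
  (reward, terminated, truncated)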
Rendering: ASCII visualization showing agent position ('o') on a line.
Example
Run a simple policy that keeps the agent near the origin. The policy function in the snippet is a placeholder for your own action-selection logic:
let rng = Rune.Rng.create () in
let env = Fehu_envs.Random_walk.make ~rng () in
let obs, _ = Fehu.Env.reset env () in
let obs = ref obs in
for _ = 1 to 100 do
  (* [policy] is a placeholder: it maps the current observation
     to action 0 (move left) or 1 (move right). *)
  let action = policy !obs in
  let t = Fehu.Env.step env action in
  obs := t.observation;
  Printf.printf "Position: %.2f, Reward: %.2f\n"
    (Rune.to_array t.observation).(0) t.reward
done
Tips
- The environment is deterministic given the action sequence
- Optimal policy alternates actions to minimize distance from origin
- Good for testing value function approximation with continuous states (see the TD(0) sketch below)
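The last tip can be made concrete with a framework-agnostic sketch: a linear value function over hand-picked features of the position, updated by TD(0). The feature choice (1, |x|) and every name below are illustrative assumptions, not Fehu API:
let features x = [| 1.0; Float.abs x |]

let value w x =
  let phi = features x in
  (w.(0) *. phi.(0)) +. (w.(1) *. phi.(1))

(* One TD(0) step: move the weights toward the bootstrapped target. *)
let td0_update ~alpha ~gamma w ~pos ~reward ~next_pos ~terminated =
  let target =
    if terminated then reward else reward +. (gamma *. value w next_pos)
  in
  let delta = target -. value w pos in
  let phi = features pos in
  Array.mapi (fun i wi -> wi +. (alpha *. delta *. phi.(i))) w
Using |position| as a feature lets the approximator capture the symmetry of the reward around the origin, which a plain linear function of the raw position cannot.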
val reset :
'a ->
?options:'b ->
unit ->
state ->
(float, Rune.float32_elt) Rune.t * Fehu.Info.t
val step :
'a ->
(Int32.t, 'b) Rune.t ->
state ->
((float, Rune.float32_elt) Rune.t, 'c, 'd) Fehu.Env.transition
val make :
rng:Rune.Rng.key ->
unit ->
(Fehu.Space.Box.element, Fehu.Space.Discrete.element, string) Fehu.Env.t