package fehu

Reinforcement learning framework for OCaml

Sources

raven-1.0.0.alpha1.tbz
sha256=8e277ed56615d388bc69c4333e43d1acd112b5f2d5d352e2453aef223ff59867
sha512=369eda6df6b84b08f92c8957954d107058fb8d3d8374082e074b56f3a139351b3ae6e3a99f2d4a4a2930dd950fd609593467e502368a13ad6217b571382da28c

Module Fehu_envs.Random_walk

One-dimensional random walk environment.

ID: RandomWalk-v0

Observation Space: Fehu.Space.Box with shape [1] in range [-10.0, 10.0]. Represents the agent's continuous position on a line.

Action Space: Fehu.Space.Discrete with 2 choices:

  • 0: Move left (position -= 1.0)
  • 1: Move right (position += 1.0)

Rewards: Negative absolute position (-|position|), encouraging the agent to stay near the origin. Terminal states at boundaries yield reward -10.0.

Episode Termination:

  • Terminated: Agent reaches position -10.0 or +10.0 (boundaries)
  • Truncated: Episode exceeds 200 steps
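
The action, reward, and termination rules above can be sketched as a pure function over plain floats. This is an illustrative model, not Fehu's implementation: the names outcome, bound, max_steps, and step are hypothetical, and two details are assumptions that the description does not settle (the position is clipped to the observation range, and truncation fires once the 200th step has been taken).

```ocaml
(* Hypothetical pure model of the documented dynamics; not Fehu's source. *)
type outcome = {
  position : float;
  reward : float;
  terminated : bool;
  truncated : bool;
}

let bound = 10.0
let max_steps = 200

(* [step ~position ~steps action] applies one move.
   [steps] is the number of steps already taken in the episode. *)
let step ~position ~steps action =
  let delta =
    match action with
    | 0 -> -1.0 (* move left *)
    | 1 -> 1.0 (* move right *)
    | _ -> invalid_arg "action must be 0 or 1"
  in
  (* Assumption: the position is clipped to the observation range. *)
  let position = Float.max (-.bound) (Float.min bound (position +. delta)) in
  let terminated = Float.abs position >= bound in
  (* Assumption: truncation fires once the 200th step has been taken. *)
  let truncated = (not terminated) && steps + 1 >= max_steps in
  { position; reward = -.(Float.abs position); terminated; truncated }
```

Under this model, stepping right from position 9.0 lands on the boundary, so the outcome has reward -10.0 and terminated = true, which matches the terminal reward stated above.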

Rendering: ASCII visualization showing agent position ('o') on a line.
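
As a rough idea of what such a rendering could look like, the sketch below draws the line as 21 cells, one per integer position from -10 to 10, with 'o' marking the agent. The function render_line and the exact layout are assumptions made for illustration; the string actually produced by render may differ.

```ocaml
(* Hypothetical renderer: the actual output format of Fehu's render may differ. *)
let render_line position =
  let width = 21 in
  (* Map [-10, 10] to column indices 0 .. 20, rounding to the nearest cell. *)
  let col = int_of_float (Float.round (position +. 10.0)) in
  (* Keep the marker on the line even for out-of-range inputs. *)
  let col = max 0 (min (width - 1) col) in
  String.init width (fun i -> if i = col then 'o' else '-')
```

With this layout, an agent at the origin appears in the middle cell and an agent at either boundary appears in the first or last cell.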

Example

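Step the environment 100 times, printing the position and reward after each step. The policy argument is a placeholder for any function that returns an action tensor holding 0 or 1:

  let run policy =
    let rng = Rune.Rng.create () in
    let env = Fehu_envs.Random_walk.make ~rng () in
    let _obs, _info = Fehu.Env.reset env () in
    for _ = 1 to 100 do
      (* policy chooses 0 (left) or 1 (right) *)
      let action = policy () in
      let t = Fehu.Env.step env action in
      Printf.printf "Position: %.2f, Reward: %.2f\n"
        (Rune.to_array t.observation).(0) t.reward
    done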
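
Tips

  • Despite its name, the environment is deterministic given the action sequence: each action moves the agent by exactly 1.0
  • An optimal policy steps toward the origin, then alternates left and right to stay as close to it as possible
  • Good for testing value function approximation with continuous states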
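
The second tip can be checked on a plain-float model of the walk, without Fehu. In the sketch below, toward_origin and rollout are hypothetical helpers: the policy moves left when the position is positive and right otherwise, and the reward is -|position| after each move. Boundaries and truncation are ignored because this policy never reaches them from an interior start.

```ocaml
(* Hypothetical helper: 0 = left, 1 = right, as in the action space above. *)
let toward_origin position = if position > 0.0 then 0 else 1

(* Roll out [n] steps from [start] on a plain-float model of the walk and
   return the final position together with the sum of rewards -|position|. *)
let rollout ~start n =
  let position = ref start and total = ref 0.0 in
  for _ = 1 to n do
    let delta = if toward_origin !position = 0 then -1.0 else 1.0 in
    position := !position +. delta;
    total := !total -. Float.abs !position
  done;
  (!position, !total)
```

Starting at 0.0, the agent alternates between 1.0 and 0.0, so rollout ~start:0.0 200 returns (0.0, -100.0): half of the steps cost -1.0 and the other half cost nothing. Because every move changes the position by exactly 1.0, the agent cannot remain at the origin, which is why alternating is the best available behavior.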

type observation = (float, Rune.float32_elt) Rune.t
type action = (int32, Rune.int32_elt) Rune.t
type render = string
type state = {
  mutable position : float;
  mutable steps : int;
}
val observation_space : Fehu.Space.Box.element Fehu__Space.t
val action_space : Fehu.Space.Discrete.element Fehu__Space.t
val metadata : Fehu.Metadata.t
val reset : 'a -> ?options:'b -> unit -> state -> (float, Rune.float32_elt) Rune.t * Fehu.Info.t
val step : 'a -> (Int32.t, 'b) Rune.t -> state -> ((float, Rune.float32_elt) Rune.t, 'c, 'd) Fehu.Env.transition
val render : state -> string