package fehu

Reinforcement learning framework for OCaml


Module Fehu_envs.Grid_world

Two-dimensional grid world with goal and obstacles.

ID: GridWorld-v0

Observation Space: Fehu.Space.Multi_discrete with shape [5; 5]. Represents the agent's (row, column) position as two discrete coordinates, each in range [0, 5).

Action Space: Fehu.Space.Discrete with 4 choices:

  • 0: Move up (row -= 1)
  • 1: Move down (row += 1)
  • 2: Move left (col -= 1)
  • 3: Move right (col += 1)

Rewards:

  • +10.0: Reaching the goal at position (4, 4)
  • -1.0: Every other step (encourages shortest path)

Episode Termination:

  • Terminated: Agent reaches the goal position (4, 4)
  • Truncated: Episode exceeds 200 steps

Obstacles: Position (2, 2) is blocked. Actions attempting to move into obstacles or outside the grid leave the agent's position unchanged.
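The movement rule above can be sketched as a pure function: each action maps to a (row, col) delta, and a move is rejected if it would leave the grid or enter the obstacle. This is an illustrative helper, not part of the Fehu_envs API; the grid size and obstacle position are taken from the description above.

```ocaml
(* Sketch of the movement rule: apply the action's delta, but keep the
   agent in place when the target cell is off-grid or the obstacle. *)
let apply_action (row, col) action =
  let drow, dcol =
    match action with
    | 0 -> (-1, 0) (* up *)
    | 1 -> (1, 0)  (* down *)
    | 2 -> (0, -1) (* left *)
    | 3 -> (0, 1)  (* right *)
    | _ -> (0, 0)
  in
  let row', col' = (row + drow, col + dcol) in
  let in_bounds = row' >= 0 && row' < 5 && col' >= 0 && col' < 5 in
  let blocked = (row', col') = (2, 2) in
  if in_bounds && not blocked then (row', col') else (row, col)
```

For example, moving up from (0, 0) or down from (1, 2) (into the obstacle) leaves the position unchanged, while moving right from (0, 0) yields (0, 1).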

Rendering: ASCII grid visualization:

  • 'A': Agent position
  • 'G': Goal position (4, 4)
  • '#': Obstacle at (2, 2)
  • '.': Empty cells
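The rendering described above can be approximated by a small pure function; this is an illustrative sketch, not the library's implementation, with the goal and obstacle positions hard-coded from the description:

```ocaml
(* Build the 5x5 ASCII grid for a given agent position. The agent marker
   takes precedence over the goal and obstacle markers. *)
let render_grid (arow, acol) =
  let buf = Buffer.create 36 in
  for row = 0 to 4 do
    for col = 0 to 4 do
      let c =
        if (row, col) = (arow, acol) then 'A'
        else if (row, col) = (4, 4) then 'G'
        else if (row, col) = (2, 2) then '#'
        else '.'
      in
      Buffer.add_char buf c
    done;
    Buffer.add_char buf '\n'
  done;
  Buffer.contents buf
```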

Example

Navigate to the goal while avoiding obstacles:

  let rng = Rune.Rng.create () in
  let env = Fehu_envs.Grid_world.make ~rng () in
  let obs, _info = Fehu.Env.reset env () in
  let rec run_episode obs steps =
    if steps >= 200 then ()
    else begin
      (* [policy] is a user-supplied function mapping the (row, col)
         observation to an action in 0-3. *)
      let action = policy obs in
      let t = Fehu.Env.step env action in
      (match Fehu.Env.render env with
       | Some grid -> print_endline grid
       | None -> ());
      if t.terminated then
        Printf.printf "Goal reached in %d steps!\n" steps
      else
        run_episode t.observation (steps + 1)
    end
  in
  run_episode obs 0

Tips
  • The optimal policy takes 8 steps, the Manhattan distance from (0, 0) to (4, 4)
  • Obstacle at (2, 2) forces agents to plan around it
  • Good for testing Q-learning, DQN, or policy gradient methods on discrete spaces
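As a starting point for the Q-learning suggestion above, here is a minimal tabular update sketch, independent of the Fehu API: states are (row, col) pairs flattened to indices 0-24, and the hyperparameters are illustrative.

```ocaml
(* Tabular Q-learning for a 5x5 grid with 4 actions. *)
let n_states = 25
let n_actions = 4
let q = Array.make_matrix n_states n_actions 0.0

let state_index (row, col) = (row * 5) + col

(* One update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
   On a terminated transition the bootstrap term is zero. *)
let update ~alpha ~gamma ~terminated s a r s' =
  let best_next =
    if terminated then 0.0
    else Array.fold_left max neg_infinity q.(s')
  in
  q.(s).(a) <- q.(s).(a) +. alpha *. (r +. gamma *. best_next -. q.(s).(a))
```

In a training loop, `update` would be called after each `Fehu.Env.step` with the observed reward and the transition's terminated flag, with the indices computed via `state_index`.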
type observation = (int32, Rune.int32_elt) Rune.t
type action = (int32, Rune.int32_elt) Rune.t
type render = string
type state = {
  mutable position : int * int;
  mutable steps : int;
}
val grid_size : int
val observation_space : Fehu.Space.Multi_discrete.element Fehu__Space.t
val action_space : Fehu.Space.Discrete.element Fehu__Space.t
val metadata : Fehu.Metadata.t
val is_goal : (int * int) -> bool
val is_obstacle : (int * int) -> bool
val is_valid_pos : (int * int) -> bool
val reset : 'a -> ?options:'b -> unit -> state -> (int32, Rune.int32_elt) Rune.t * Fehu.Info.t
val step : 'a -> (Int32.t, 'b) Rune.t -> state -> ((int32, Rune.int32_elt) Rune.t, 'c, 'd) Fehu.Env.transition
val render : state -> string