Core environment interface for reinforcement learning.
Defines the standard RL environment API: reset, step, render, and close. All environments implement this interface. See Env for lifecycle management and custom environment creation.
This module defines the standard RL environment interface inspired by OpenAI Gymnasium. Environments represent interactive tasks where agents observe states, take actions, and receive rewards.
Environment Lifecycle
Create an environment, reset it to get an initial observation, interact by stepping with actions, and optionally render or close resources:
let env =
  Env.create ~rng ~observation_space ~action_space ~reset ~step ()
in
(* Reset returns the initial observation plus auxiliary info. *)
let obs, info = Env.reset env () in
(* Step applies an action and returns a transition. *)
let transition = Env.step env action in
Env.close env
Episode Termination
Episodes end in two ways:

- Terminated: natural completion (e.g., goal reached, game over)
- Truncated: artificial cutoff (e.g., time limit, resource exhaustion)
This distinction matters for bootstrapping value estimates: a terminated episode has zero future value, while a truncated episode could have continued past the cutoff, so the agent should still bootstrap from the value of the final state.
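The bootstrapping rule above can be sketched in a few lines. This is an illustrative helper, not part of this library's API; `bootstrap_target` and its parameter names are hypothetical:

```ocaml
(* Sketch: one-step bootstrap target, showing why terminated and
   truncated must be treated differently. Illustrative names only. *)
let bootstrap_target ~reward ~gamma ~terminated ~next_value =
  if terminated then reward              (* zero future value *)
  else reward +. (gamma *. next_value)   (* truncated or ongoing: bootstrap *)
```

For example, with `~reward:1.0 ~gamma:0.99 ~next_value:5.0`, a terminated transition yields a target of `1.0`, while a truncated one yields `1.0 +. 0.99 *. 5.0`.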
Custom Environments
Implement custom environments by providing reset and step functions to Env.create.
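The reset/step pattern can be sketched standalone. This is a minimal counter environment using plain OCaml records instead of this library's space and transition types (which are assumptions here), so the shape of the two functions is visible without any dependencies:

```ocaml
(* Hypothetical sketch: a counter environment. The observation is an
   int; the episode terminates when the counter reaches a goal and
   truncates at a time limit. Record fields mirror the usual
   transition shape but are not this library's actual type. *)
type transition = {
  obs : int;
  reward : float;
  terminated : bool;
  truncated : bool;
}

let goal = 3
let time_limit = 10

(* reset: produce the initial observation. *)
let reset () = 0

(* step: advance the state by one and report reward/termination. *)
let step state _action =
  let state' = state + 1 in
  { obs = state';
    reward = (if state' = goal then 1.0 else 0.0);
    terminated = state' = goal;
    truncated = state' >= time_limit }
```

In the real interface these two functions would be passed to Env.create as the `~reset` and `~step` arguments shown above, with the int observations described by the environment's observation space.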
split_rng env ~n generates n independent RNG keys.

Splits the environment's RNG into n+1 keys: n are returned in the array and one is kept by the environment, so the environment's own randomness stays independent of the returned keys. Use this for parallel operations requiring independent randomness.
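The splitting contract (n keys out, one retained) can be mimicked with the standard library's Random.State, purely to illustrate the semantics; the real split_rng operates on this library's RNG keys, and the `split` helper below is an assumption, not its implementation:

```ocaml
(* Illustrative only: derive n independent PRNG states from one seed,
   plus one retained state, mirroring the n+1 split described above. *)
let split seed ~n =
  let keys = Array.init n (fun i -> Random.State.make [| seed; i |]) in
  let retained = Random.State.make [| seed; n |] in
  (keys, retained)
```

Each returned state can then drive one parallel rollout without sharing randomness with the others or with the retained state.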