Vectorization enables simultaneous stepping of multiple environments, essential for efficient on-policy data collection. See Vector_env for batched operations.
Vectorized environments for parallel interaction.
Vectorized environments step multiple environment instances together through a single batched interface, enabling efficient data collection. This module follows Gymnasium's vectorization API, batching observations, actions, and rewards across environments.
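To make the idea of batching concrete, here is a minimal, self-contained sketch. It does not use this module's API: toy_env, batch, and step_all are hypothetical names invented for illustration, and the toy dynamics (an integer state that the action adds to) are an assumption chosen only to keep the example short.

```ocaml
(* Hypothetical toy environment: the state is an integer, an action adds to
   it, and the reward equals the new state. Not part of the library. *)
type toy_env = { mutable state : int }

(* Batched result: one slot per environment in each array. *)
type batch = { observations : int array; rewards : float array }

let step_all (envs : toy_env array) (actions : int array) : batch =
  let n = Array.length envs in
  if Array.length actions <> n then
    invalid_arg "step_all: expected one action per environment";
  let observations = Array.make n 0 in
  let rewards = Array.make n 0.0 in
  (* Environments are stepped one after another, in index order. *)
  Array.iteri
    (fun i env ->
      env.state <- env.state + actions.(i);
      observations.(i) <- env.state;
      rewards.(i) <- float_of_int env.state)
    envs;
  { observations; rewards }
```

The caller passes one action per environment and gets back arrays indexed the same way, so a policy can be evaluated once on the whole batch rather than once per environment.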
Benefits
Collect trajectories faster by stepping multiple environments simultaneously
Amortize policy inference costs across batched observations
Supply on-policy algorithms with the large amounts of data they need per update
Usage
Create a vectorized environment from multiple instances:
let envs = List.init 8 (fun _ -> make_env ()) in
let vec_env = Vector_env.create_sync ~envs () in
let observations, infos = Vector_env.reset vec_env () in
let actions = (* compute batched actions *) in
let step = Vector_env.step vec_env actions
With autoreset enabled (the default), environments that terminate or truncate are reset automatically on the next step call, which returns their initial observation. Data collection therefore continues without manual resets.
| Next_step  (* Reset terminated environments on the next step call *)
| Disabled  (* Do not automatically reset; requires manual intervention *)
Autoreset behavior for terminated episodes.
With Next_step, when an environment terminates or truncates, the next call to step returns its initial observation instead of requiring an explicit reset. This maintains a constant number of active environments.
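The difference between the two modes can be illustrated with a toy model. This is a sketch under stated assumptions, not the library's implementation: toy_env, wrapped, wrap, and step are hypothetical, the environment is a simple counter that terminates at a fixed horizon, and raising Invalid_argument under Disabled is an assumption about what "manual intervention" might look like.

```ocaml
(* Toy model of autoreset; none of these definitions come from the library. *)
type autoreset_mode = Next_step | Disabled

(* Hypothetical environment: the observation is a step counter and the
   episode terminates once the counter reaches [horizon]. *)
type toy_env = { mutable t : int; horizon : int }

let toy_reset env =
  env.t <- 0;
  env.t

let toy_step env =
  env.t <- env.t + 1;
  (env.t, env.t >= env.horizon) (* observation, terminated *)

(* Wrapper state: remembers whether the previous step ended the episode. *)
type wrapped = { env : toy_env; mode : autoreset_mode; mutable ended : bool }

let wrap mode horizon = { env = { t = 0; horizon }; mode; ended = false }

let step w =
  match (w.mode, w.ended) with
  | Next_step, true ->
      (* The episode ended on the previous call: reset instead of stepping
         and return the initial observation. *)
      w.ended <- false;
      (toy_reset w.env, false)
  | Disabled, true ->
      (* Assumed behavior for this sketch: the caller must reset first. *)
      invalid_arg "step: episode ended; reset required"
  | _, false ->
      let obs, terminated = toy_step w.env in
      w.ended <- terminated;
      (obs, terminated)

let () =
  let w = wrap Next_step 2 in
  for _ = 1 to 4 do
    let obs, terminated = step w in
    Printf.printf "obs=%d terminated=%b\n" obs terminated
  done
```

With a horizon of 2, the second call reports termination and the third call returns the initial observation 0, so the number of active environments never drops.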
create_sync ~autoreset_mode ~envs () creates a synchronous vectorized environment.
Wraps envs to provide batched operations. All environments are stepped sequentially in the current process. True parallelism would require an asynchronous implementation, which is not yet provided.
Parameters:
autoreset_mode: Controls automatic resetting of terminated environments (default: Next_step)
envs: List of environment instances to vectorize. Must be non-empty and share compatible observation/action spaces
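The constraints on envs could be checked along the following lines. This is only a sketch: space, env, and check_envs are hypothetical stand-ins (a space is reduced to a shape), the notion of "compatible" is simplified to structural equality, and the error messages are invented. How the library actually validates its input is not stated here.

```ocaml
(* Hypothetical stand-ins: a "space" is reduced to a shape, and an
   environment only exposes its two spaces. *)
type space = { shape : int list }
type env = { observation_space : space; action_space : space }

(* Reject an empty list, then require every environment to match the first. *)
let check_envs (envs : env list) : env array =
  match envs with
  | [] -> invalid_arg "create_sync: envs must be non-empty"
  | first :: rest ->
      List.iteri
        (fun i e ->
          if e.observation_space <> first.observation_space
             || e.action_space <> first.action_space
          then
            invalid_arg
              (Printf.sprintf "create_sync: env %d has incompatible spaces"
                 (i + 1)))
        rest;
      Array.of_list envs
```

Checking against the first environment keeps the validation linear in the number of environments and gives an error that names the offending index.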
step vec_env actions executes actions in all environments.
Takes an array of actions with length num_envs, steps each environment, and returns batched results.
If autoreset is enabled (Next_step), terminated environments automatically reset and return their initial observation. The terminations and truncations arrays indicate which environments ended before resetting. Infos for terminated environments include a `final_observation` key with the structured final observation encoded as an Info.value.
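A consumer that needs the observation that actually followed each action (for example, when bootstrapping value estimates) might combine the batched arrays as sketched below. The types are deliberately simplified assumptions: observations are plain floats and each info is an association list, whereas the real infos store the final observation as an Info.value. The function name true_next_observations is hypothetical.

```ocaml
(* Hypothetical shapes: observations are floats and each info is an
   association list; the library's Info type is richer than this. *)
type info = (string * float) list

(* For each environment, pick the stored final observation if the episode
   ended and one was recorded; otherwise keep the observation from step. *)
let true_next_observations ~observations ~terminations ~truncations
    ~(infos : info array) =
  Array.mapi
    (fun i obs ->
      if terminations.(i) || truncations.(i) then
        (match List.assoc_opt "final_observation" infos.(i) with
         | Some final_obs -> final_obs
         | None -> obs)
      else obs)
    observations
```

Falling back to the returned observation when the key is absent keeps the helper usable whether or not autoreset replaced the observation for that environment.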