Implements a fixed-capacity circular buffer that stores complete transitions. Once capacity is reached, each new transition overwrites the oldest one. Supports uniform random sampling, which breaks temporal correlations in the training data.
All transitions are stored contiguously in memory. Observation and action arrays are lazily initialized on the first call to add.
add(buffer, transition): stores a transition in the buffer.
Appends transition to the buffer, overwriting the oldest transition if at capacity. The first call initializes internal storage arrays based on the observation and action types.
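The overwrite behavior described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the class name RingBuffer, the attribute names, and the (observation, action, reward, next_observation, done) tuple layout are all assumptions, and a plain Python list stands in for the typed observation and action arrays.

```python
class RingBuffer:
    """Hypothetical fixed-capacity circular buffer (illustrative only)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.storage = None   # allocated lazily on the first add
        self.next_index = 0   # slot the next transition is written to
        self.size = 0         # number of valid transitions stored

    def add(self, transition):
        if self.storage is None:
            # First call: allocate every slot up front.
            self.storage = [None] * self.capacity
        self.storage[self.next_index] = transition
        # Wrap around so the oldest slot is overwritten once the buffer is full.
        self.next_index = (self.next_index + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)


buf = RingBuffer(capacity=3)
for step in range(5):
    # Assumed layout: (observation, action, reward, next_observation, done)
    buf.add((step, "action", 0.0, step + 1, False))
```

After five insertions into a buffer of capacity 3, the transitions from steps 0 and 1 have been overwritten by those from steps 3 and 4, while the size stays at 3.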
Samples batch_size transitions uniformly at random from the buffer, with replacement. If batch_size exceeds the current buffer size, only size transitions are returned, i.e. min(batch_size, size).
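The sampling rule can be sketched in a few lines. The helper name sample_indices and its rng parameter are hypothetical, and raising an error on an empty buffer is an assumption, since the text does not say what happens in that case.

```python
import random


def sample_indices(batch_size, size, rng=random):
    """Draw indices uniformly with replacement, clamped to the buffer size."""
    if size == 0:
        # Assumed behavior: the documented text does not cover an empty buffer.
        raise ValueError("cannot sample from an empty buffer")
    n = min(batch_size, size)
    # Each draw is independent, so the same index may appear more than once.
    return [rng.randrange(size) for _ in range(n)]


rng = random.Random(0)
idx = sample_indices(batch_size=8, size=5, rng=rng)
```

Here a request for 8 transitions from a buffer holding 5 yields 5 indices, each in the range 0 to 4. The indices would then be used to gather the corresponding transitions from storage.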