package obandit

OCaml Multi-Armed Bandits

Sources

obandit-0.3.4.tbz
sha256=0a84abe0b800b06a14b302e632403950e9552d9fc0b5b2cf09d7262a4ddad7dd
md5=f5aa2c86eb25d4fad308d3de0dbc9288

Module Obandit.MakeUCB1

The UCB1 bandit for stochastic regret minimization.

Parameters

module P : KBanditParam

Signature

type bandit = banditEstimates

The internal data structure of the bandit algorithm.

val initialBandit : bandit

The initial state of the bandit algorithm.

val step : bandit -> float -> int * bandit

step b r advances the bandit game by one step, where b is the current state of the bandit and r is the reward for the last action. The result is a pair: the next action, encoded as an integer in $\{0, \ldots, K-1\}$, and the new state of the bandit. The admissible reward range depends on the bandit algorithm in use. The first reward provided to the algorithm is discarded.
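The calling pattern implied by this signature can be sketched as a loop that threads the state through successive calls to step. The sketch below is an illustration under stated assumptions, not the library's own code. BANDIT restates the three signature items documented above so that the example compiles without obandit installed. RoundRobin is a hypothetical stand-in that ignores rewards and is not UCB1, and its parameter signature (val k : int) is likewise an assumption made for this example. Play, run, and the reward function are illustrative names. A module produced by Obandit.MakeUCB1 should presumably be usable in place of the stand-in, since it exposes the same three items.

```ocaml
(* The signature documented above, restated locally so that this sketch
   compiles without the obandit package. *)
module type BANDIT = sig
  type bandit
  val initialBandit : bandit
  val step : bandit -> float -> int * bandit
end

(* Hypothetical stand-in, NOT UCB1: cycles through k arms and ignores the
   reward. The parameter signature is an assumption for this example. *)
module RoundRobin (P : sig val k : int end) : BANDIT = struct
  type bandit = int
  let initialBandit = 0
  let step next _reward = (next, (next + 1) mod P.k)
end

(* Generic driver: works with any module matching BANDIT. *)
module Play (B : BANDIT) = struct
  (* [run ~rounds ~reward] plays [rounds] actions and returns them in the
     order they were chosen. [reward a] gives the payoff of action [a]. *)
  let run ~rounds ~reward =
    let rec loop b r n acc =
      if n = 0 then List.rev acc
      else
        (* Feed the reward of the previous action, get the next action
           and the updated state. *)
        let action, b' = B.step b r in
        loop b' (reward action) (n - 1) (action :: acc)
    in
    (* The first reward is discarded by the algorithm, so 0.0 is a dummy. *)
    loop B.initialBandit 0.0 rounds []
end

module Demo = Play (RoundRobin (struct let k = 3 end))

(* Arm 1 pays 1.0, the other arms pay 0.0. *)
let actions =
  Demo.run ~rounds:5 ~reward:(fun a -> if a = 1 then 1.0 else 0.0)
```

Because the state is an immutable value returned by each call, the caller decides where it lives: here it is an argument of a recursive loop, but it could equally be kept in a ref or folded over a stream of rewards. The rewards in this sketch lie in [0, 1], a range chosen for illustration only; the range a given algorithm accepts is stated in its own documentation.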