package obandit
OCaml Multi-Armed Bandits
Sources
obandit-0.3.4.tbz
sha256=0a84abe0b800b06a14b302e632403950e9552d9fc0b5b2cf09d7262a4ddad7dd
md5=f5aa2c86eb25d4fad308d3de0dbc9288
Module Obandit.MakeUCB1
The UCB1 bandit for stochastic regret minimization.
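For orientation, the textbook UCB1 rule can be written as follows. This is the standard formulation for rewards in $[0, 1]$, not a statement about the constants Obandit uses internally, which this page does not give. The symbols $\bar{x}_j$ (mean reward observed on arm $j$), $n_j$ (number of times arm $j$ was played) and $t$ (total number of plays so far) are introduced here for illustration. After playing each arm once, the next action is:

```latex
a_t = \arg\max_{j \in \{0, \ldots, K-1\}}
      \left( \bar{x}_j + \sqrt{\frac{2 \ln t}{n_j}} \right)
```

The first term favors arms that have paid well so far; the second grows for arms that have been played rarely, which forces occasional exploration.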
Parameters
module P : KBanditParam
Signature
The internal data structure of the bandit algorithm.
The initial state of the bandit algorithm.
step r advances the bandit game one step, where r is the reward for the last action. The result of this call is the next action, encoded as an integer in $\{0, \ldots, K-1\}$, and the new state of the bandit. The reward range depends on the bandit algorithm in use, and the first reward provided to the algorithm is discarded.
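The contract described above (state and reward in, next action and new state out, first reward discarded) can be sketched with a small stand-alone module. This is an illustrative sketch only, not Obandit's implementation: the names KParam, SketchUCB1, the record fields and the argument order of step are assumptions made for this example, and rewards are assumed to lie in [0, 1].

```ocaml
(* Hypothetical stand-alone sketch; NOT the Obandit source code. *)
module type KParam = sig
  val k : int (* number of arms; name chosen for this sketch *)
end

module SketchUCB1 (P : KParam) = struct
  type bandit = {
    t : int;            (* number of rewards accounted for so far *)
    last : int;         (* last action returned, -1 before the first step *)
    counts : int array; (* number of plays per arm *)
    sums : float array; (* cumulated reward per arm *)
  }

  let initialBandit =
    { t = 0; last = -1; counts = Array.make P.k 0; sums = Array.make P.k 0.0 }

  (* [step b r] credits reward [r] to the previously returned action,
     then returns the next action and the new state. The very first
     reward is discarded because no action precedes it. *)
  let step b r =
    (* Copy the arrays so that earlier states are never mutated. *)
    let counts = Array.copy b.counts and sums = Array.copy b.sums in
    if b.last >= 0 then begin
      counts.(b.last) <- counts.(b.last) + 1;
      sums.(b.last) <- sums.(b.last) +. r
    end;
    let t = if b.last >= 0 then b.t + 1 else b.t in
    let action =
      if t < P.k then t (* initial pass: play every arm once *)
      else begin
        (* Textbook UCB1 index: empirical mean plus exploration bonus. *)
        let index i =
          let n = float_of_int counts.(i) in
          (sums.(i) /. n) +. sqrt (2.0 *. log (float_of_int t) /. n)
        in
        let best = ref 0 in
        for i = 1 to P.k - 1 do
          if index i > index !best then best := i
        done;
        !best
      end
    in
    (action, { t; last = action; counts; sums })
end

module B = SketchUCB1 (struct let k = 2 end)

(* Toy environment: arm 1 always pays 1.0, arm 0 always pays 0.0. *)
let reward a = if a = 1 then 1.0 else 0.0

(* Play [n] rounds and count how often arm 1 is chosen. *)
let rec play b r n pulls1 =
  if n = 0 then pulls1
  else
    let a, b' = B.step b r in
    play b' (reward a) (n - 1) (if a = 1 then pulls1 + 1 else pulls1)

let () =
  Printf.printf "arm 1 chosen %d times out of 100\n"
    (play B.initialBandit 0.0 100 0)
```

The state is threaded explicitly through each call, so the caller decides where it lives; the dummy reward 0.0 passed on the first call is ignored, matching the "first reward is discarded" rule.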