package obandit
OCaml Multi-Armed Bandits
Sources
v0.2.2.tar.gz
sha256=45f0811dffce8326d0bc9b258e65b9c84c73e5c76ffb7d500cb9435c2b23808c
md5=1c0cf1677d232515f1a8f014cc24ea7c
Module Obandit.MakeAlphaUCB
The $\alpha$-UCB bandit for stochastic regret minimization described in [1].
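As a rough illustration of what an $\alpha$-UCB index looks like, here is a minimal sketch. It assumes rewards in $[0, 1]$ and the $(\alpha, \psi)$-UCB form with Hoeffding's bound ($\psi^*(\varepsilon) = 2\varepsilon^2$), so the index of arm $i$ at time $t$ is $\hat\mu_i + \sqrt{\alpha \ln t / (2 T_i)}$, where $T_i$ is the number of times arm $i$ was played. This form is an assumption: [1] and obandit may state or compute it differently, and the function names below are hypothetical, not part of the obandit API.

```ocaml
(* Hypothetical sketch, not obandit's code: alpha-UCB index for rewards in
   [0, 1], assuming  index = mean + sqrt (alpha * ln t / (2 * pulls)). *)
let alpha_ucb_index ~alpha ~t ~mean ~pulls =
  if pulls = 0 then infinity (* an unplayed arm is always tried first *)
  else
    mean +. sqrt (alpha *. log (float_of_int t) /. (2.0 *. float_of_int pulls))

(* Arm with the largest index; ties go to the lowest arm number. *)
let argmax_index ~alpha ~t means pulls =
  let best = ref 0 and best_v = ref neg_infinity in
  Array.iteri
    (fun i m ->
      let v = alpha_ucb_index ~alpha ~t ~mean:m ~pulls:pulls.(i) in
      if v > !best_v then begin
        best := i;
        best_v := v
      end)
    means;
  !best
```

Larger values of $\alpha$ widen the confidence bonus and therefore favour exploration; the bonus shrinks as an arm accumulates plays.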
Parameters
module P : AlphaUCBParam
Signature
type bandit = banditEstimates
The internal data structure of the bandit algorithm.
val initialBandit : bandit
The initial state of the bandit algorithm.
step r advances the bandit game one step, where r is the reward for the last action. The result of this call is the next action, encoded as an integer in $\{0, \ldots, K-1\}$, together with the new state of the bandit. The valid reward range depends on the bandit algorithm in use, and the first reward provided to the algorithm is discarded.
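The protocol above (feed the reward of the previous action, receive the next action and a new state, with the very first reward ignored because no action precedes it) can be sketched as follows. This is a hypothetical, self-contained illustration under the same assumptions as an $\alpha$-UCB index $\hat\mu_i + \sqrt{\alpha \ln t / (2 T_i)}$ with rewards in $[0, 1]$; the record fields, `initial` and this `step` are invented for the example and are not obandit's actual `bandit` type or implementation.

```ocaml
(* Hypothetical sketch of the step protocol; not obandit's implementation.
   K arms, rewards assumed in [0, 1]. *)
type state = {
  alpha : float;
  sums : float array; (* total reward collected by each arm *)
  pulls : int array; (* number of rewarded plays of each arm *)
  t : int; (* total number of rewarded plays so far *)
  last : int option; (* action awaiting its reward; None before the first step *)
}

let initial ~k ~alpha =
  { alpha; sums = Array.make k 0.0; pulls = Array.make k 0; t = 0; last = None }

(* [step s r]: credit [r] to the previous action (the first reward has no
   action to credit and is discarded), then pick the arm with the largest
   index. The old state is left untouched. *)
let step s r =
  let sums = Array.copy s.sums and pulls = Array.copy s.pulls in
  let t =
    match s.last with
    | None -> s.t
    | Some a ->
        sums.(a) <- sums.(a) +. r;
        pulls.(a) <- pulls.(a) + 1;
        s.t + 1
  in
  let index i =
    if pulls.(i) = 0 then infinity
    else
      (sums.(i) /. float_of_int pulls.(i))
      +. sqrt (s.alpha *. log (float_of_int t) /. (2.0 *. float_of_int pulls.(i)))
  in
  let best = ref 0 in
  for i = 1 to Array.length pulls - 1 do
    if index i > index !best then best := i
  done;
  (!best, { s with sums; pulls; t; last = Some !best })
```

Because `step` returns a fresh state instead of mutating its argument, earlier states stay valid, which matches the functional style of an API where `step` returns the new state alongside the action.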