package obandit
OCaml Multi-Armed Bandits
Sources
v0.2.2.tar.gz
sha256=45f0811dffce8326d0bc9b258e65b9c84c73e5c76ffb7d500cb9435c2b23808c
md5=1c0cf1677d232515f1a8f014cc24ea7c
Module Obandit.MakeAlphaUCB
The $\alpha$-UCB bandit for stochastic regret minimization described in [1].
Parameters
module P : AlphaUCBParamSignature
The internal data structure of the bandit algorithm.
The initial state of the bandit algorithm.
step r advances the bandit game by one step, where r is the reward for the last action. The call returns the next action, encoded as an integer in $\{ 0, \cdots, K-1 \}$, together with the new state of the bandit. The admissible reward range depends on the bandit algorithm in use, and the first reward provided to the algorithm is discarded, since no action has been played yet.
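To make the step contract concrete, here is a minimal, self-contained sketch of an $\alpha$-UCB step loop. This is not obandit's implementation: the state record, the constants `k` and `alpha`, and the helper names are all hypothetical; only the $\alpha$-UCB index and the documented contract (reward in, next action and new state out, first reward discarded) follow the text above.

```ocaml
(* Illustrative sketch only, NOT obandit's source. *)

let k = 3                 (* number of arms, chosen for this example *)
let alpha = 3.0           (* exploration parameter; [1] analyzes alpha > 2 *)

type state = {
  t : int;                (* rounds played so far *)
  last : int option;      (* arm played in the previous round, if any *)
  counts : int array;     (* pulls per arm *)
  sums : float array;     (* cumulative reward per arm *)
}

let initial_state =
  { t = 0; last = None; counts = Array.make k 0; sums = Array.make k 0. }

(* alpha-UCB index: empirical mean plus exploration bonus.
   Unplayed arms get an infinite index so each arm is tried once. *)
let index s i =
  if s.counts.(i) = 0 then infinity
  else
    s.sums.(i) /. float_of_int s.counts.(i)
    +. sqrt (alpha *. log (float_of_int s.t)
             /. (2. *. float_of_int s.counts.(i)))

(* step s r: credit r to the arm played last (the very first reward is
   discarded, since no arm has been played yet), then return the arm
   with the highest index together with the new state. *)
let step s r =
  let counts = Array.copy s.counts and sums = Array.copy s.sums in
  (match s.last with
   | None -> ()                          (* first call: discard r *)
   | Some a ->
     counts.(a) <- counts.(a) + 1;
     sums.(a) <- sums.(a) +. r);
  let s' = { t = s.t + 1; last = s.last; counts; sums } in
  let best = ref 0 in
  for i = 1 to k - 1 do
    if index s' i > index s' !best then best := i
  done;
  (!best, { s' with last = Some !best })
```

With this sketch, the first call plays arm 0 (all indices are infinite, so the lowest-numbered arm wins the tie) and discards its reward argument; subsequent calls update the statistics of the previous arm before selecting the next one.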