package obandit

OCaml Multi-Armed Bandits

Sources

v0.2.2.tar.gz
sha256=45f0811dffce8326d0bc9b258e65b9c84c73e5c76ffb7d500cb9435c2b23808c
md5=1c0cf1677d232515f1a8f014cc24ea7c

Module Obandit.MakeAlphaUCB

The $\alpha$-UCB bandit for stochastic regret minimization, described in [1].

Parameters

module P : AlphaUCBParam

Signature

type bandit = banditEstimates

The internal data structure of the bandit algorithm.

val initialBandit : bandit

The initial state of the bandit algorithm.

val step : bandit -> float -> int * bandit

step b r advances the bandit game one step from state b, where r is the reward for the last action. The result of this call is the next action, encoded as an integer in $\{0, \ldots, K-1\}$, together with the new state of the bandit. The reward range depends on the bandit algorithm in use. The first reward provided to the algorithm is discarded.
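The calling convention of step can be illustrated with a small driver loop. The following is a minimal sketch, not code from the library: the signature BANDIT simply restates the three items documented above, while RoundRobin, Play, run, and reward_of are hypothetical names introduced for illustration. RoundRobin is a stand-in that cycles through three actions and ignores rewards; it is not the $\alpha$-UCB algorithm, and in real use it would presumably be replaced by the module obtained from applying Obandit.MakeAlphaUCB to a parameter module.

```ocaml
(* Signature restating the three items documented above. *)
module type BANDIT = sig
  type bandit
  val initialBandit : bandit
  val step : bandit -> float -> int * bandit
end

(* Hypothetical stand-in, NOT the alpha-UCB algorithm: it cycles through
   3 actions and ignores every reward. It only exists so that this
   sketch runs without the library installed. *)
module RoundRobin : BANDIT = struct
  type bandit = int
  let initialBandit = 0
  let step next _reward = (next, (next + 1) mod 3)
end

(* Driver loop, written against the signature only.
   [reward_of a] is a hypothetical environment giving the reward of
   action [a]. Returns the list of actions played, in order. *)
module Play (B : BANDIT) = struct
  let run ~rounds ~reward_of =
    (* The first reward is discarded by the algorithm, so 0.0 is a dummy. *)
    let first_action, b = B.step B.initialBandit 0.0 in
    let rec loop b action n acc =
      if n = 0 then List.rev acc
      else
        (* Feed back the reward of the action just played and
           receive the next action along with the new state. *)
        let next_action, b' = B.step b (reward_of action) in
        loop b' next_action (n - 1) (action :: acc)
    in
    loop b first_action rounds []
end

module Game = Play (RoundRobin)

(* Toy environment: only action 1 pays a reward. *)
let actions =
  Game.run ~rounds:5 ~reward_of:(fun a -> if a = 1 then 1.0 else 0.0)
```

Because the state is an immutable value threaded through each call, the loop passes the bandit returned by one step into the next rather than mutating anything. With the round-robin stand-in, actions evaluates to [0; 1; 2; 0; 1].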
