The WrapRange functor wraps a bandit algorithm with the doubling trick. This heuristic makes it possible to use a bandit algorithm without knowing the reward range in advance. All rewards are linearly rescaled to a range (initially given by a RangeParam). When a value is observed above the range, the bandit algorithm is restarted and the range interval is doubled in that direction.
module R : RangeParam
module P : BanditParam
module B (Pb : BanditParam) : Bandit
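The range bookkeeping behind the doubling trick can be sketched as follows. This is only an illustration and not the library's code: the names range, rescale, widen and observe are hypothetical, the sketch assumes lower < upper, and it treats values below the range the same way as values above it, which is an assumption. The restart of the wrapped bandit is represented by a boolean flag only.

```ocaml
(* Hypothetical sketch of the doubling trick; none of these names
   come from the library. Assumes lower < upper. *)
type range = { lower : float; upper : float }

(* Linearly rescale a raw reward into [0,1] relative to the range. *)
let rescale r x = (x -. r.lower) /. (r.upper -. r.lower)

(* Double the width of the interval on the side where x fell outside,
   repeating until x is contained. *)
let rec widen r x =
  let width = r.upper -. r.lower in
  if x > r.upper then widen { r with upper = r.upper +. width } x
  else if x < r.lower then widen { r with lower = r.lower -. width } x
  else r

(* Process one raw reward: returns the (possibly widened) range,
   whether the wrapped bandit would have to be restarted, and the
   reward rescaled to [0,1]. *)
let observe r x =
  let r' = widen r x in
  let restarted = r' <> r in
  (r', restarted, rescale r' x)
```

Starting from the range [0,1], a reward of 3.0 doubles the interval twice, to [0,2] and then [0,4], and the reward is passed on as 0.75.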
A mutable bandit.
The getAction function advances the bandit by one step in the bandit game. The argument is the reward for the last action and the result is the next action. Rewards are floats in [0,1] and actions are integers in [0,n-1]. The first reward is discarded. In order to use rewards larger than 1, please use the WrapDoubling functor.
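The calling convention described above can be illustrated with a toy stand-in. The module type Bandit_sketch is an assumed shape inferred from the description (reward in, next action out), not the library's actual signature, and Round_robin is a hypothetical placeholder that simply cycles through the actions rather than a real bandit algorithm.

```ocaml
(* Assumed shape of the interface, inferred from the description;
   the library's real signature may differ. *)
module type Bandit_sketch = sig
  val getAction : float -> int
end

(* Toy stand-in, not a real bandit algorithm: cycles through the
   n actions while accumulating the reward of each one. *)
module Round_robin (N : sig val n : int end) : Bandit_sketch = struct
  let last = ref (-1)                 (* no action played yet *)
  let totals = Array.make N.n 0.0     (* cumulative reward per action *)

  let getAction reward =
    (* The first reward is discarded: no action preceded it. *)
    if !last >= 0 then totals.(!last) <- totals.(!last) +. reward;
    last := (!last + 1) mod N.n;
    !last
end

module Toy = Round_robin (struct let n = 3 end)

(* Feed a sequence of rewards and collect the actions chosen.
   Each call mutates the hidden state of Toy. *)
let play rewards = List.map Toy.getAction rewards
```

Because the state lives inside the module, each functor application yields an independent bandit, and the caller's game loop reduces to repeatedly passing the latest reward to getAction.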