Library
Module
Module type
Parameter
Class
Class type
Implementation of MCTS based on UCB1 bandits.
module State : Mcts_state
type 'a state = 'a State.t
'a
state is the module type of states, with 'a
the type of actions available in that state. See Mcts_state
.
'a t
is the type of a non-deterministic computation yielding values of type 'a
.
Choices are taken either according to some user-defined strategy or automatically by optimizing a given reward using Monte-Carlo tree search.
choices state alternatives
creates a choice point with state
, the choice point can evaluate to any element of alternatives
.
map choice f
is the computation resulting from evaluating f
on the outcome of choice
. f
must be deterministic.
bind choice f
is the computation resulting from evaluating f
on the outcome of choice
. f
can give rise to further choice points.
val return : 'a -> 'a t
return x
is the deterministic computation evaluating to x
.
>|=
is an alias for map
. Note that this operator can only apply on a value of type 'a choice
rather than 'a t
.
>|=
is an alias for bind
. Note that this operator can only apply on a value of type 'a choice
rather than 'a t
.
A choice_function
is used to guide the unfolding of the tree. When in a given state
with an array of possible choices
:
`Choose i
will move to choices.(i)
unconditionally (the choice is performed "externally");`Auto
will pick a choice automatically using Monte-Carlo tree search to estimate the action values (the choice is performed "internally").As a concrete example, in a two-player game involving a human and a computer, the human player would `Choose _
his next move while the computer player would be implemented by `Auto playouts
. (This implicitly assumes that the state
contains the current player.)
The playouts
controls the amount of exploration performed during each `Auto
move.
A reward_function
returns the reward at any intermediate state
given the final outcome.
Considering the example of a two-player game, this allows to compute reward as a function of the player at each intermediate state of the game.
type 'a gen := Random.State.t -> 'a
The type of random samplers
A kernel
is a distribution over available actions in a given state.
val uniform : kernel
The uniform
kernel picks an action uniformly at random.
val run : 'a t -> choice_function -> 'a reward_function -> kernel -> 'a gen
run tree choice_function reward_function kernel ~playouts
explores tree
according to choice_function
. `Auto _
picks moves optimizing the reward_function
. kernel
is used to guide the exploration when estimating the value of each action.
type _ interaction = private
| Interaction : 'a state
* 'a array
* ([ `Auto of int * kernel | `Choose of int ] ->
'b interaction) -> 'b interaction
Interaction (state, choices, k)
is an interaction point exposing to the user the state
and available choices
at the point as well as a continuation.
The continuation either takes
`Auto (playouts, kernel)
with playouts
a strictly positive integer and kernel
a value of type kernel
; or`Choose i
with i
an integer in the domain of choices
.The continuation raises Invalid_argument if playouts
is not strictly positive or if i
is not in the domain of choices
.
| Terminated : 'a -> 'a interaction
The incremental API hands back control to the user at interaction
points.
val run_incremental : 'a t -> 'a reward_function -> 'a interaction gen
This library uses memoization to construct lazy search trees. The following functor allows to create binding operators for types satisfying the Hashtbl.HashedType
signature.
module Make_syntax (H : Hashtbl.HashedType) : sig ... end
module Poly_syntax : sig ... end
The Poly_syntax
module uses the polymorphic equality and hashing to construct memoizing binding operators. Use only if you must...