Mcts.S
A module of type S implements a monadic interface for building lazy search trees and allows their guided exploration using Monte-Carlo tree search.
'a state is the type of states, with 'a the type of actions available in that state. See Mcts_state.
'a t is the type of a non-deterministic computation yielding values of type 'a.
Choices are taken either according to some user-defined strategy or automatically, by optimizing a given reward using Monte-Carlo tree search.
The type of a choice point with choices of type 'a.
choices state alternatives creates a choice point with state state; the choice point can evaluate to any element of alternatives.
map choice f is the computation resulting from evaluating f on the outcome of choice. f must be deterministic.
bind choice f is the computation resulting from evaluating f on the outcome of choice. f can give rise to further choice points.
>|= is an alias for map. Note that this operator can only be applied to a value of type 'a choice, not 'a t.
>>= is an alias for bind. Note that this operator can only be applied to a value of type 'a choice, not 'a t.
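As an illustration, a game tree could be built along the following lines. This is a minimal sketch only: position, move, outcome and the helpers state_of, legal_moves, is_last_move, play_move and outcome_of are hypothetical and not part of this library; a module of type S is assumed to be open.

(* Sketch: build the search tree of a hypothetical game.  The last
   move is mapped directly to its outcome with >|=; every other move
   opens further choice points through >>=. *)
let rec game pos : outcome t =
  let s = state_of pos in          (* hypothetical: package pos as a state *)
  let moves = legal_moves pos in   (* hypothetical: actions available in pos *)
  if is_last_move pos then
    choices s moves >|= fun m -> outcome_of (play_move pos m)
  else
    choices s moves >>= fun m -> game (play_move pos m)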
type choice_function = {
choice : 'a. 'a state -> 'a array -> [ `Auto of int | `Choose of int ];
}
A choice_function is used to guide the unfolding of the tree. When in a given state with an array of possible choices:
- `Choose i will move to choices.(i) unconditionally (the choice is performed "externally");
- `Auto will pick a choice automatically, using Monte-Carlo tree search to estimate the action values (the choice is performed "internally").
As a concrete example, in a two-player game involving a human and a computer, the human player would `Choose _ his next move while the computer player would be implemented by `Auto playouts. (This implicitly assumes that the state contains the current player.)
The playouts value controls the amount of exploration performed during each `Auto move.
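A sketch of such a choice function for the human-versus-computer example follows. The current_player accessor and the ask_human prompt are hypothetical helpers (assumed polymorphic in the action type), and the playout budget of 10_000 is arbitrary.

(* Sketch: the human chooses an index externally, the computer lets
   MCTS pick internally with a fixed playout budget. *)
let human_vs_computer : choice_function =
  { choice =
      (fun state alternatives ->
         match current_player state with            (* hypothetical accessor *)
         | `Human -> `Choose (ask_human state alternatives)  (* hypothetical prompt, returns an index *)
         | `Computer -> `Auto 10_000) }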
A reward_function returns the reward at any intermediate state, given the final outcome.
In the two-player game example, this makes it possible to compute the reward as a function of the player at each intermediate state of the game.
The type of random samplers.
A kernel is a distribution over available actions in a given state.
run tree choice_function reward_function kernel ~playouts explores tree according to choice_function. `Auto _ picks moves optimizing the reward_function. kernel is used to guide the exploration when estimating the value of each action.
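Putting the pieces together, a call to run might look as follows. This is a sketch only: start_position, my_reward and uniform_kernel are hypothetical values of the corresponding types, the playout budget is arbitrary, and binding the result as the computation's final value is an assumption about run's return type.

(* Sketch: batch exploration of the tree built by [game], using the
   choice function defined above to decide which moves are external
   and which are delegated to MCTS. *)
let result =
  run (game start_position) human_vs_computer my_reward uniform_kernel
    ~playouts:10_000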
type _ interaction = private
  | Interaction :
      'a state * 'a array
      * ([ `Auto of int * kernel | `Choose of int ] -> 'b interaction)
      -> 'b interaction
Interaction (state, choices, k) is an interaction point exposing to the user the state and the available choices at that point, as well as a continuation.
The continuation takes either:
- `Auto (playouts, kernel) with playouts a strictly positive integer and kernel a value of type kernel; or
- `Choose i with i an integer in the domain of choices.
The continuation raises Invalid_argument if playouts is not strictly positive or if i is not in the domain of choices.
| Terminated : 'a -> 'a interaction
The incremental API hands back control to the user at interaction points.
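As an illustration, a driver for the incremental API could look like the following sketch. pick_index and default_kernel are hypothetical, the playout budget is arbitrary, and how the initial 'a interaction value is obtained from a tree is not shown here.

(* Sketch: answer interaction points until the computation terminates.
   An external index is used when available; otherwise the move is
   delegated to MCTS with a budget of 1_000 playouts. *)
let rec drive : type a. a interaction -> a = function
  | Terminated result -> result
  | Interaction (state, alternatives, k) ->
      let answer =
        match pick_index state alternatives with   (* hypothetical: int option *)
        | Some i -> `Choose i
        | None -> `Auto (1_000, default_kernel)    (* hypothetical kernel *)
      in
      drive (k answer)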
This library uses memoization to construct lazy search trees. The following functor makes it possible to create binding operators for types satisfying the Hashtbl.HashedType signature.
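For example, a hypothetical game-position type could be made acceptable to such a functor as follows; only the standard Hashtbl.HashedType interface is shown, and the functor itself is not named here.

(* Sketch: a state representation satisfying Hashtbl.HashedType, so
   that identical positions reached along different paths can share
   the same memoized subtree. *)
module Position = struct
  type t = { board : int array; to_move : int }
  let equal a b = a.to_move = b.to_move && a.board = b.board
  let hash { board; to_move } = Hashtbl.hash (to_move, Array.to_list board)
end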
The Poly_syntax module uses polymorphic equality and hashing to construct memoizing binding operators. Use it only if you must...