prbnmcn-mcts 0.0.2 · OCaml Package

Implementation of MCTS based on UCB1 bandits.

Parameters

module State : Mcts_state

Signature

type 'a state = 'a State.t

'a state is the module type of states, with 'a the type of actions available in that state. See Mcts_state.

type 'a t

'a t is the type of a non-deterministic computation yielding values of type 'a.

Choices are taken either according to some user-defined strategy or automatically by optimizing a given reward using Monte-Carlo tree search.

and 'a choice

The type of a choice point with choices of type 'a.

Creating choice trees

val choices : 'a state -> 'a array -> 'a choice

choices state alternatives creates a choice point with state, the choice point can evaluate to any element of alternatives.

raises Invalid_argument
if Array.length alternatives = 0.

val map : 'a choice -> ('a -> 'b) -> 'b t

map choice f is the computation resulting from evaluating f on the outcome of choice. f must be deterministic.

val bind : 'a choice -> ('a -> 'b t) -> 'b t

bind choice f is the computation resulting from evaluating f on the outcome of choice. f can give rise to further choice points.

val return : 'a -> 'a t

return x is the deterministic computation evaluating to x.

val (>|=) : 'a choice -> ('a -> 'b) -> 'b t

>|= is an alias for map. Note that this operator can only apply on a value of type 'a choice rather than 'a t.

val (>>=) : 'a choice -> ('a -> 'b t) -> 'b t

>|= is an alias for bind. Note that this operator can only apply on a value of type 'a choice rather than 'a t.

type choice_function = {

choice : 'a. 'a state -> 'a array -> [ `Auto of int | `Choose of int ];

}

A choice_function is used to guide the unfolding of the tree. When in a given state with an array of possible choices:

returning `Choose i will move to choices.(i) unconditionally (the choice is performed "externally");
returning `Auto will pick a choice automatically using Monte-Carlo tree search to estimate the action values (the choice is performed "internally").

As a concrete example, in a two-player game involving a human and a computer, the human player would `Choose _ his next move while the computer player would be implemented by `Auto playouts. (This implicitly assumes that the state contains the current player.)

The playouts controls the amount of exploration performed during each `Auto move.

raises Invalid_argument
if i is not in the bounds of choices

raises Invalid_argument
if playouts is not strictly positive

type 'a reward_function = {

reward : 'i. 'i state -> 'a -> float;

}

A reward_function returns the reward at any intermediate state given the final outcome.

Considering the example of a two-player game, this allows to compute reward as a function of the player at each intermediate state of the game.

raises Invalid_argument
if the reward is not in the unit interval.

type 'a gen := Random.State.t -> 'a

The type of random samplers

type kernel = {

kernel : 'a. 'a state -> 'a array -> int gen;

}

A kernel is a distribution over available actions in a given state.

val uniform : kernel

The uniform kernel picks an action uniformly at random.

val run : 'a t -> choice_function -> 'a reward_function -> kernel -> 'a gen

run tree choice_function reward_function kernel ~playouts explores tree according to choice_function. `Auto _ picks moves optimizing the reward_function. kernel is used to guide the exploration when estimating the value of each action.

Incremental API

type _ interaction = private

| Interaction : 'a state * 'a array * ([ `Auto of int * kernel | `Choose of int ] -> 'b interaction) -> 'b interaction
(*
Interaction (state, choices, k) is an interaction point exposing to the user the state and available choices at the point as well as a continuation.
The continuation either takes
- `Auto (playouts, kernel) with playouts a strictly positive integer and kernel a value of type kernel; or
- `Choose i with i an integer in the domain of choices.
The continuation raises Invalid_argument if playouts is not strictly positive or if i is not in the domain of choices.
*)
| Terminated : 'a -> 'a interaction

The incremental API hands back control to the user at interaction points.

val run_incremental : 'a t -> 'a reward_function -> 'a interaction gen

Binding operator generation

This library uses memoization to construct lazy search trees. The following functor allows to create binding operators for types satisfying the Hashtbl.HashedType signature.

module Make_syntax (H : Hashtbl.HashedType) : sig ... end

module Poly_syntax : sig ... end

The Poly_syntax module uses the polymorphic equality and hashing to construct memoizing binding operators. Use only if you must...

package prbnmcn-mcts

Parameters

Signature

Creating choice trees

Incremental API

Binding operator generation