package prbnmcn-mcts

  1. Overview
  2. Docs

A module of type S implements a monadic interface to building lazy search trees and allow their guided exploration using Monte-Carlo tree search.

type 'a state

'a state is the module type of states, with 'a the type of actions available in that state. See Mcts_state.

type 'a t

'a t is the type of a non-deterministic computation yielding values of type 'a.

Choices are taken either according to some user-defined strategy or automatically by optimizing a given reward using Monte-Carlo tree search.

and 'a choice

The type of a choice point with choices of type 'a.

Creating choice trees

val choices : 'a state -> 'a array -> 'a choice

choices state alternatives creates a choice point with state, the choice point can evaluate to any element of alternatives.

val map : 'a choice -> ('a -> 'b) -> 'b t

map choice f is the computation resulting from evaluating f on the outcome of choice. f must be deterministic.

val bind : 'a choice -> ('a -> 'b t) -> 'b t

bind choice f is the computation resulting from evaluating f on the outcome of choice. f can give rise to further choice points.

val return : 'a -> 'a t

return x is the deterministic computation evaluating to x.

val (>|=) : 'a choice -> ('a -> 'b) -> 'b t

>|= is an alias for map. Note that this operator can only apply on a value of type 'a choice rather than 'a t.

val (>>=) : 'a choice -> ('a -> 'b t) -> 'b t

>|= is an alias for bind. Note that this operator can only apply on a value of type 'a choice rather than 'a t.

type choice_function = {
  1. choice : 'a. 'a state -> 'a array -> [ `Auto of int | `Choose of int ];
}

A choice_function is used to guide the unfolding of the tree. When in a given state with an array of possible choices:

  • returning `Choose i will move to choices.(i) unconditionally (the choice is performed "externally");
  • returning `Auto will pick a choice automatically using Monte-Carlo tree search to estimate the action values (the choice is performed "internally").

As a concrete example, in a two-player game involving a human and a computer, the human player would `Choose _ his next move while the computer player would be implemented by `Auto playouts. (This implicitly assumes that the state contains the current player.)

The playouts controls the amount of exploration performed during each `Auto move.

type 'a reward_function = {
  1. reward : 'i. 'i state -> 'a -> float;
}

A reward_function returns the reward at any intermediate state given the final outcome.

Considering the example of a two-player game, this allows to compute reward as a function of the player at each intermediate state of the game.

type 'a gen := Random.State.t -> 'a

The type of random samplers

type kernel = {
  1. kernel : 'a. 'a state -> 'a array -> int gen;
}

A kernel is a distribution over available actions in a given state.

val uniform : kernel

The uniform kernel picks an action uniformly at random.

val run : 'a t -> choice_function -> 'a reward_function -> kernel -> 'a gen

run tree choice_function reward_function kernel ~playouts explores tree according to choice_function. `Auto _ picks moves optimizing the reward_function. kernel is used to guide the exploration when estimating the value of each action.

Incremental API

type _ interaction = private
  1. | Interaction : 'a state * 'a array * ([ `Auto of int * kernel | `Choose of int ] -> 'b interaction) -> 'b interaction
    (*

    Interaction (state, choices, k) is an interaction point exposing to the user the state and available choices at the point as well as a continuation.

    The continuation either takes

    • `Auto (playouts, kernel) with playouts a strictly positive integer and kernel a value of type kernel; or
    • `Choose i with i an integer in the domain of choices.

    The continuation raises Invalid_argument if playouts is not strictly positive or if i is not in the domain of choices.

    *)
  2. | Terminated : 'a -> 'a interaction

The incremental API hands back control to the user at interaction points.

val run_incremental : 'a t -> 'a reward_function -> 'a interaction gen

Binding operator generation

This library uses memoization to construct lazy search trees. The following functor allows to create binding operators for types satisfying the Hashtbl.HashedType signature.

module Make_syntax (H : Hashtbl.HashedType) : sig ... end
module Poly_syntax : sig ... end

The Poly_syntax module uses the polymorphic equality and hashing to construct memoizing binding operators. Use only if you must...

OCaml

Innovation. Community. Security.