A reimplementation of AlphaGo in Go (specifically AlphaZero)
The algorithm is composed of:
- a Monte-Carlo Tree Search (MCTS) implemented in the `mcts` package;
- a Dual Neural Network (DNN) implemented in the `dualnet` package.
The algorithm is wrapped into a top-level structure (`AZ` for AlphaZero) and applies to any game able to fulfill a specified contract.
The contract describes a game state; in this package, it is a Go interface declared in the `game` package: `State`.
- In the `agogo` package, each player of the game is an `Agent`, and in a game, two `Agent`s play in an `Arena`.
- The `game` package is loosely coupled with the AlphaZero algorithm and describes a game's behavior (not what a game is). The behavior is expressed as a set of functions that operate on a `State` of the game. A `State` is an interface that represents the current game state as well as the allowed interactions. Interactions are made by a `Player` object operating a `PlayerMove`. It is the implementer's responsibility to code the game's rules by creating an object that fulfills the `State` contract and implements the allowed moves.
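To give an idea of what that contract looks like, here is a minimal, illustrative sketch. It is not the authoritative definition of `State` (see the `game` package documentation for that): only the methods exercised by the examples below appear, and their exact signatures are assumptions.

```go
// Illustrative sketch only -- the real game.State interface declares more
// methods (move legality, scoring, resetting, ...), and the exact signatures
// below are assumptions based on the examples in this document.
type State interface {
	Board() []Colour        // flat view of the board
	ToMove() Player         // the player who moves next
	Apply(PlayerMove) State // play a move and return the resulting state
	// ... the rest of the contract
}
```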
This package is designed to be extensible: you can train AlphaZero on any board game that respects the contract of the `game` package.
The trained model can then be saved and used as a player.
The steps to train the algorithm are:
- Creating a structure that fulfills the `State` interface (aka a game).
- Creating a configuration for your AZ's internal MCTS and NN.
- Creating an `AZ` structure based on the game and the configuration.
- Executing the learning process (by calling the `Learn` method).
- Saving the trained model (by calling the `Save` method).
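Condensed, these steps map onto the API roughly as follows. This is only a sketch; the complete runnable example is given below:

```go
conf := agogo.Config{
	Name:     "Tic Tac Toe",
	NNConf:   dual.DefaultConf(3, 3, 10), // NN configuration
	MCTSConf: mcts.DefaultConfig(3),      // MCTS configuration
}
conf.Encoder = encodeBoard // a GameEncoder for your game

g := mnk.TicTacToe()    // a structure fulfilling the State interface
a := agogo.New(g, conf) // the AZ structure
if err := a.Learn(5, 50, 100, 100); err != nil { // the learning process
	log.Println(err)
}
a.Save("example.model") // save the trained model
```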
The steps to play against the algorithm are:
- Creating an `AZ` object.
- Loading the trained model (by calling the `Read` method).
- Switching the agent to inference mode via the `SwitchToInference` method.
- Getting the AI move by calling the `Search` method and applying the move to the game manually.
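In condensed form (again a sketch; the full example is given further below):

```go
a := agogo.New(g, conf)  // an AZ object for the game g
a.Load("example.model")  // load the trained model
a.B.SwitchToInference(g) // switch the agent to inference mode
// Play your own move, then ask the agent for its reply and apply it manually.
state := g.Apply(game.PlayerMove{Player: mnk.Cross, Single: 4})
move := a.B.Search(state)
g.Apply(game.PlayerMove{Player: mnk.Nought, Single: move})
```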
Four board games are implemented so far, each defined as a subpackage of `game`.
Tic-tac-toe, for example, is an m,n,k game where m=n=k=3.
Here is a sample program that trains the algorithm to play tic-tac-toe. The result is saved in a file `example.model`:
```go
package main

import (
	"log"
	"time"

	// Import paths are assumed from the package layout described above.
	"github.com/gorgonia/agogo"
	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"
)

// encodeBoard is a GameEncoder (https://pkg.go.dev/github.com/gorgonia/agogo#GameEncoder) for tic-tac-toe.
// It encodes a game.State into the []float32 that is fed to the neural network.
func encodeBoard(a game.State) []float32 {
	board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001 // avoid exact zeros for empty cells
		}
	}
	// Append a layer that encodes which player moves next.
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		// vecf32.Scale(board, -1)
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	retVal := append(board, playerLayer...)
	return retVal
}
func main() {
	// Create the configuration of the neural network
	conf := agogo.Config{
		Name:            "Tic Tac Toe",
		NNConf:          dual.DefaultConf(3, 3, 10),
		MCTSConf:        mcts.DefaultConfig(3),
		UpdateThreshold: 0.52,
	}
	conf.NNConf.BatchSize = 100
	conf.NNConf.Features = 2 // write a better encoding of the board, and increase features (and that allows you to increase K as well)
	conf.NNConf.K = 3
	conf.NNConf.SharedLayers = 3
	conf.MCTSConf = mcts.Config{
		PUCT:           1.0,
		M:              3,
		N:              3,
		Timeout:        100 * time.Millisecond,
		PassPreference: mcts.DontPreferPass,
		Budget:         1000,
		DumbPass:       true,
		RandomCount:    0,
	}
	conf.Encoder = encodeBoard

	// Create a new game
	g := mnk.TicTacToe()
	// Create the AlphaZero structure
	a := agogo.New(g, conf)
	// Launch the learning process
	err := a.Learn(5, 50, 100, 100) // 5 epochs, 50 episodes, 100 NN iterations, 100 games
	if err != nil {
		log.Println(err)
	}
	// Save the model
	a.Save("example.model")
}
```

Here is a sample program that loads the trained model and plays against it:

```go
package main

import (
	"fmt"

	"github.com/gorgonia/agogo"
	dual "github.com/gorgonia/agogo/dualnet"
	"github.com/gorgonia/agogo/game"
	"github.com/gorgonia/agogo/game/mnk"
	"github.com/gorgonia/agogo/mcts"
)

func encodeBoard(a game.State) []float32 {
	board := agogo.EncodeTwoPlayerBoard(a.Board(), nil)
	for i := range board {
		if board[i] == 0 {
			board[i] = 0.001
		}
	}
	playerLayer := make([]float32, len(a.Board()))
	next := a.ToMove()
	if next == game.Player(game.Black) {
		for i := range playerLayer {
			playerLayer[i] = 1
		}
	} else if next == game.Player(game.White) {
		// vecf32.Scale(board, -1)
		for i := range playerLayer {
			playerLayer[i] = -1
		}
	}
	retVal := append(board, playerLayer...)
	return retVal
}
func main() {
	conf := agogo.Config{
		Name:     "Tic Tac Toe",
		NNConf:   dual.DefaultConf(3, 3, 10),
		MCTSConf: mcts.DefaultConfig(3),
	}
	conf.Encoder = encodeBoard

	g := mnk.TicTacToe()
	a := agogo.New(g, conf)
	a.Load("example.model")
	a.A.Player = mnk.Cross
	a.B.Player = mnk.Nought
	a.B.SwitchToInference(g)
	a.A.SwitchToInference(g)
	// Put an X in the center
	stateAfterFirstPlay := g.Apply(game.PlayerMove{
		Player: mnk.Cross,
		Single: 4,
	})
	fmt.Println(stateAfterFirstPlay)
	// ⎢ · · · ⎥
	// ⎢ · X · ⎥
	// ⎢ · · · ⎥
	// Ask the agent what to do next
	move := a.B.Search(stateAfterFirstPlay)
	fmt.Println(move)
	// 1
	g.Apply(game.PlayerMove{
		Player: mnk.Nought,
		Single: move,
	})
	fmt.Println(stateAfterFirstPlay)
	// ⎢ · O · ⎥
	// ⎢ · X · ⎥
	// ⎢ · · · ⎥
}
```

A Funny Thing Happened On The Way To Reimplementing AlphaGo - a talk by @chewxy (one of the authors) about this specific implementation.
Original implementation credits to