How do game-playing programs choose the best move?


In games where there are more possible moves than atoms in the universe, programs such as Google DeepMind's AlphaGo can use the simple but effective Monte Carlo tree search to narrow their options. Angus Bezzina explains how it works.


Go is a game of 'perfect information' – ideal for game-playing programs to tackle. But with so many potential board configurations, a machine can't churn through the lot, so it uses the Monte Carlo tree search.
ED JONES / AFP / Getty Images

AlphaGo – the game-playing program that beat 18-time world champion Lee Sedol at the ancient board game Go in March this year – can be harnessed to crunch problems outside the games room, according to engineers in China and the US and published in IEEE/CAA Journal Of Automatica Sinica.

A major part of its power is what’s known as the Monte Carlo tree search.

Although not artificial intelligence in the vein of science fiction, it is an effective form of narrow artificial intelligence, meaning it is able to “think” on its own and make the best decision within a set of constraints.

This makes it ideal for “perfect information” games – those with a set number of possible outcomes, and each opponent can see all moves past and present as the game progresses.

So how does Monte Carlo determine the best move in a game?

Think of the possible states of a game being drawn up in an imaginary tree-like structure. At the base of this tree is the trunk, or the blank board. From it branch all potential first moves. Each of these moves has its own branches depicting the next possible move, and so on.

In a game where the number of possible moves that a player can make at each turn is small, such as noughts and crosses, the entire tree can be searched pretty quickly.

For a game with hundreds of possible moves per turn, the tree that an algorithm has to search through is massive.

In the case of Go, the aim of the game is straightforward: secure territory. Players place black and white stones on a 19 x 19 grid. When a stone is surrounded by the opponent's stones, it is taken off the board. Once a player has covered 50% of the board, they win.

But this means there are more Go board configurations than atoms in the universe.

Even with current processing power, a machine would take far too long to search every possible move – a competitive Go match is limited for around four hours.

Monte Carlo was developed as a better way to sort through the tree. (Not by AlphaGo’s developers at Google DeepMind – Monte Carlo methods have been used since Polish-American mathematician Stanislaw Ulam invented a computational technique in the 1940s while working on nuclear weapons.)

“It essentially samples parts of the search space [or tree] and uses the results it receives as a guide,” explains Toby Walsh, an artificial intelligence researcher at government research unit Data61 and the University of New South Wales in Australia.

“If one part of the search space seems to have many more promising nodes [or branches] in it than another then that’s going to bias Monte Carlo to make more moves in that direction.”

But there is another important component of this process. How does Monte Carlo determine if it is winning or not, especially in a game that is difficult to assess midway through, such as Go?

Walsh explains that Monte Carlo is able to do this by looking at random potential endings from the current state of the game.

If one player wins in more of those outcomes than the other, then they are considered to be winning at the time.

'Depending on your appetite for risk you may miss some very rare but significant effects'

Monte Carlo’s ability to sift through a large set of possible scenarios means that it has a number of potential applications beyond gaming too.

“Because you can essentially apply this algorithm to any game, you can even apply it to games against nature and that game might be running your business,” Walsh says.

For instance, Monte Carlo could be useful running a supply chain because it is able to quickly analyse different factors, such as competitor decisions and forces of nature, to come up with a business strategy.

Walsh cautions that due to the random sampling Monte Carlo employs, it does not account for every single option within a tree of potential scenarios.

“Depending on your appetite for risk you may miss some very rare but significant effects,” he says.

“The consequences of a small probability may be very large and since Monte Carlo samples very randomly you may not see that, especially when it comes to real-world scenarios.”

Still, while many other strategies are currently used in developing artificial intelligence, Monte Carlo remains a great tool. It’s also implemented in conjunction with other systems such as deep learning – multilayered artificial neural networks – that AlphaGo used to teach itself how to play.

  1. https://cosmosmagazine.com/technology/world-go-champ-beaten-twice-ai-can-he-redeem-himself
  2. http://ieeexplore.ieee.org/stamp/stamp.jsp?reload=true&arnumber=7471613
Latest Stories
MoreMore Articles