AI conquers multi-player no-limit poker

A new artificial intelligence (AI) system dubbed Pluribus looks set to simultaneously delight computer scientists and terrify professional gamblers around the world.

Revealed in a paper published in the journal Science, Pluribus is a self-learning system that can tackle six-player no-limit Texas hold’em poker and beat all-comers – even professional players.

The achievement, report inventors Noam Brown and Tuomas Sandholm from Carnegie Mellon University in the US, represents a significant milestone in AI development. 

Recent research has produced systems capable of self-learning – and mastering to a level uncomfortably described in the jargon of the field as “superhuman” – the boardgame Go and the online video games Dota 2 and StarCraft.

The crucial difference between these pastimes and the card game favoured by Pluribus, however, is that the former were all constructed as two-player exercises. The same goes for other games mastered by AI – albeit trained, rather than self-taught – including chess and checkers. 

The distinction, explain Brown and Sandholm, is more than simply a matter of numbers.

The two-player games in question are all “zero-sum” exercises – whatever one player loses, the other one wins. And that means they can be mastered by discovering a mathematical sweet spot known as the “Nash equilibrium”.

“A Nash equilibrium,” the authors explain, “is a list of strategies, one for each player, in which no player can improve by deviating to a different strategy.”

Finding the Nash equilibrium for a two-player zero-sum game is comparatively easy. The authors use the example of rock-paper-scissors. If a player picks each of the three moves with equal probability, the opponent cannot, over many games, win or lose by any significant margin, whatever he or she does. When both players do this, both are adhering to the Nash equilibrium.

As soon as one player shifts strategy, however, for instance deploying only paper for a sustained period, the other player can exploit the pattern, in this case by always playing scissors, and will win.
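The rock-paper-scissors argument can be checked with a few lines of arithmetic. The sketch below is an illustration only, not code from the paper: the payoff table (win = 1, loss = -1, tie = 0) and the function names are assumptions made for this example. It computes the best average result an opponent can achieve against a fixed mix of moves.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")

# PAYOFF[i][j] is the payoff to a player who picks MOVES[i]
# against an opponent who picks MOVES[j] (assumed scoring: +1, 0, -1).
PAYOFF = (
    (0, -1, 1),   # rock: ties rock, loses to paper, beats scissors
    (1, 0, -1),   # paper: beats rock, ties paper, loses to scissors
    (-1, 1, 0),   # scissors: loses to rock, beats paper, ties scissors
)


def expected_payoff(own_strategy, opponent_strategy):
    """Average payoff per game when both sides mix their moves."""
    return sum(
        own_strategy[i] * opponent_strategy[j] * PAYOFF[i][j]
        for i in range(3)
        for j in range(3)
    )


def best_response_value(opponent_strategy):
    """Best average payoff available against a fixed opponent strategy."""
    pure_strategies = [
        tuple(Fraction(int(i == k)) for i in range(3)) for k in range(3)
    ]
    return max(expected_payoff(p, opponent_strategy) for p in pure_strategies)


uniform = (Fraction(1, 3),) * 3
only_paper = (Fraction(0), Fraction(1), Fraction(0))

print(best_response_value(uniform))     # 0: nothing beats the equal mix
print(best_response_value(only_paper))  # 1: always playing scissors wins
```

Against the equal mix, no choice of move does better than breaking even, which is what makes it an equilibrium. Against a player stuck on paper, the best reply wins every game.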

Nash equilibria theoretically exist for games involving more than two players, but they are much, much harder to identify, and thus have little or no practical use. In addition, multi-player games such as poker rely intrinsically on hidden information – card values known only to each individual player.

This was one of the prime reasons that Brown and Sandholm used a self-learning approach for Pluribus. By playing thousands of times against earlier iterations of itself, the system was able to develop strategies based on pure probability, free from the influence of deeply ingrained poker habit and tradition.

“Pluribus disagrees with the folk wisdom that ‘donk betting’ (starting a round by betting when one ended the previous betting round with a call) is a mistake,” the authors write. “Pluribus does this far more often than professional humans do.”

The result is that the system compiles a “blueprint” for poker games, which is essentially a list of possible strategies based on cards dealt. However, the blueprint is only used for the first betting round of a hand, then refined in real time as the hand continues.

One of the reasons for this is that the AI system retains the use of algorithms that in two-player situations converge on Nash equilibria – even though in multi-player situations such a result is practically impossible.
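To give a feel for how self-play can converge on an equilibrium in a two-player zero-sum game, here is a minimal sketch using regret matching on rock-paper-scissors. This is a deliberately simplified stand-in, not the algorithm Pluribus runs: the game, the starting bias, and all names are assumptions chosen for illustration. Each player shifts weight towards moves it "regrets" not having played more often, and the average of its strategies over time drifts towards the equal mix.

```python
# Payoff to a player picking move i against move j
# (0 = rock, 1 = paper, 2 = scissors; assumed scoring +1 / 0 / -1).
PAYOFF = (
    (0, -1, 1),
    (1, 0, -1),
    (-1, 1, 0),
)


def strategy_from_regrets(regrets):
    """Play each move in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / 3.0] * 3  # no positive regret yet: mix equally


def action_values(opponent_strategy):
    """Expected payoff of each pure move against the opponent's mix."""
    return [
        sum(opponent_strategy[j] * PAYOFF[i][j] for j in range(3))
        for i in range(3)
    ]


def self_play(iterations):
    """Run two regret-matching players against each other.

    Returns each player's average strategy over all iterations.
    """
    # Player 0 starts with an arbitrary bias towards rock (an assumption,
    # so the run does not begin exactly at the equilibrium).
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in range(2):
            values = action_values(strategies[1 - player])
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for move in range(3):
                # Regret: how much better this move would have done.
                regrets[player][move] += values[move] - expected
                strategy_sums[player][move] += strategies[player][move]
    return [[s / iterations for s in sums] for sums in strategy_sums]


for player, average in enumerate(self_play(20000)):
    print(player, [round(p, 3) for p in average])
```

In this two-player setting the averages settle close to one-third for each move. With more than two players there is no comparable guarantee, which is the gap the article describes.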

Another design strategy also adds to the mix. Pluribus reduces complications by “bucketing” similar hands – for instance, a nine-high straight and a 10-high straight – and treating them as identical. A similar approach covers betting, with the system choosing one of 14 sums to wager at any point, rather than having the traditional human poker choice of a range between $100 and $10,000.
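The idea of bucketing can be sketched in a few lines. Everything specific below is hypothetical: the article does not list the bet sizes Pluribus uses or say how it groups hands, so the 14 amounts and the pairing rule for straights are invented purely to show the principle of collapsing many options into a few.

```python
# Hypothetical menu of 14 bet sizes between $100 and $10,000.
# These values are illustrative only, not taken from the paper.
ALLOWED_BETS = (
    100, 150, 200, 300, 450, 700, 1000,
    1500, 2200, 3300, 4700, 6500, 8500, 10000,
)


def nearest_allowed_bet(amount):
    """Map any bet to the closest size on the menu (ties go to the lower)."""
    return min(ALLOWED_BETS, key=lambda bet: abs(bet - amount))


def straight_bucket(high_card):
    """Group straights in pairs by their highest card.

    Hypothetical rule: nine-high and 10-high share a bucket,
    jack-high (11) and queen-high (12) share the next, and so on.
    """
    return (high_card - 1) // 2


print(nearest_allowed_bet(1234))                  # 1000
print(nearest_allowed_bet(9000))                  # 8500
print(straight_bucket(9) == straight_bucket(10))  # True
```

The trade-off is the one the authors go on to describe: fewer distinct cases to reason about, at the cost of blurring differences that may matter.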

Nevertheless, the authors point out, such abstractions are only ever used when Pluribus is estimating what might happen in future betting rounds. Applying them to the round actually being played would be a rookie move – and Pluribus is no rookie.

“Information abstraction drastically reduces the complexity of the game but may wash away subtle differences that are important for superhuman performance,” they write.

“Therefore, during actual play against humans, Pluribus uses information abstraction only to reason about situations on future betting rounds, never the betting round it is actually in.”

Thus far, the system’s playing challenges against real pros have been limited to four-round practice games online, so the fortunes of Las Vegas casino owners are as yet unthreatened.

However, it may not be too long before the gambling industry feels the chill winds of technological change.

“Pluribus’s success shows that despite the lack of known strong theoretical guarantees on performance in multiplayer games, there are large-scale, complex multiplayer imperfect-information settings in which a carefully constructed self-play-with-search algorithm can produce superhuman strategies,” the authors conclude.
