An artificial intelligence (AI) agent has been developed that can win at online multiplayer games, a task previously thought too complicated even for algorithms.
Using nothing but the same pixel-based point of view and knowledge of the game-state as human players, scientists led by Max Jaderberg at the Google-owned DeepMind research company generated AI agents to play Capture the Flag, a variant of the popular game Quake III Arena. The game pits two teams against each other in randomly generated environments, each trying to find and take the enemy's flag from across the map.
The team built the agents using reinforcement learning techniques across parallel gameplay, and after 450,000 games the bots were able to beat professional human players – no small feat in such a complicated environment with so many variables.
Reinforcement learning is one of the three machine-learning paradigms, along with supervised and unsupervised learning. It doesn’t rely on labelled input-output pairs, and it doesn’t require less-than-perfect actions to be explicitly corrected.
Instead, it balances exploration of an unknown domain and exploitation of any knowledge collected about it – perfect for the endlessly changing conditions among a large number of agents, such as those present in an online multiplayer game.
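The trade-off between exploration and exploitation can be illustrated with a much simpler problem than Capture the Flag. The following is a minimal sketch, not the DeepMind method: an "epsilon-greedy" learner choosing between three slot-machine-style options with hidden average payouts. The function name, the payout values, the noise level and the 10% exploration rate are all assumptions made for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Toy learner: try options at random sometimes, otherwise pick the best known."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each option has been tried
    estimates = [0.0] * n_arms   # running average reward seen for each option

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try a random option to learn more about it.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: use the option that looks best so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Noisy reward drawn around the option's hidden average (assumed noise 0.5).
        reward = rng.gauss(true_means[arm], 0.5)

        # Incremental update of the running average for the chosen option.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates


counts, estimates = epsilon_greedy_bandit([0.2, 0.5, 0.9])
print("times each option was chosen:", counts)
print("estimated payouts:", [round(e, 2) for e in estimates])
```

No one ever tells the learner which option is correct. It has only the rewards it collects, which is the sense in which reinforcement learning works without input-output pairs.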
The aim of the DeepMind study was to create agents that truly learned for themselves, starting with the same information a human player would have. That meant no policy knowledge and no ability to communicate and share notes outside the game, whereas previous iterations of similar work gave the software models of the environment or the state of other players.
The learning process is optimised by letting agents loose in huge numbers of games at once, clubbing the results together for a top-down view of the tips and tricks each agent has picked up and then distributing that knowledge among the next generation.
Like human players, the agents glean experience about strategy that is then applicable to a new map, even though they know nothing about its layout and topology, or the intent or position of other players.
In such circumstances, Jaderberg and colleagues write, “the outcome is sufficiently uncertain to provide a meaningful learning signal”.
The reinforcement learning workflow was a two-step process: each agent’s behaviour is optimised for rewards, while the “hyper-parameters” that govern its learning are tuned across the whole population of agents. Underperforming agents are replaced with mutated offspring that internalise the lessons learned from across the board, a practice also called “population-based training”.
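The two steps can be sketched on a toy problem. This is an illustrative assumption-laden example, not the researchers' code: each "agent" is a single number `theta` trying to reach a target value, its only hyper-parameter is a learning rate `lr`, and the population size, the replace-the-bottom-quarter rule and the 0.8/1.2 mutation factors are all hypothetical choices.

```python
import random


def run_pbt(population_size=8, generations=20, inner_steps=10, seed=0):
    """Toy population-based training; returns the best score after each generation."""
    rng = random.Random(seed)
    target = 3.0  # hypothetical optimum the agents are trying to find

    # Each agent has a parameter and one hyper-parameter (its learning rate).
    agents = [
        {"theta": rng.uniform(-5.0, 5.0), "lr": 10 ** rng.uniform(-4.0, -1.0)}
        for _ in range(population_size)
    ]

    def score(agent):
        # Higher is better; the maximum of 0 is reached at theta == target.
        return -(agent["theta"] - target) ** 2

    history = [max(score(a) for a in agents)]
    for _ in range(generations):
        # Step 1: every agent improves itself with its own learning rate.
        for agent in agents:
            for _step in range(inner_steps):
                gradient = -2.0 * (agent["theta"] - target)
                agent["theta"] += agent["lr"] * gradient

        # Step 2: rank the population and replace the weakest agents.
        ranked = sorted(agents, key=score, reverse=True)
        cutoff = max(1, population_size // 4)
        for loser in ranked[-cutoff:]:
            parent = rng.choice(ranked[:cutoff])
            loser["theta"] = parent["theta"]  # inherit what the parent learned
            # Mutate the inherited hyper-parameter, capped to keep learning stable.
            loser["lr"] = min(0.4, parent["lr"] * rng.choice([0.8, 1.2]))

        history.append(max(score(a) for a in agents))
    return history


history = run_pbt()
print("best score before training:", history[0])
print("best score after training:", history[-1])
```

The point of the design is that no one has to pick a good learning rate in advance. Settings that work spread through the population because the agents using them survive and are copied.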
The results were remarkable. Even when the system slowed the reaction times of the agents down to average human levels, they still matched and exceeded human performance. After hours of practice, human gamers weren’t able to beat them in any more than 25% of attempts, and more interestingly still, the AI agents discovered and employed winning tactics that were commonly used by human players.
The secret sauce might be in the parallel, multi-game methodology. Similar self-learning systems have AI agents test what they’ve learned against their own policies in a single exercise – they literally play against themselves.
But while bots that are great at Quake III Arena might be cool, the researchers note that it is the scalability of the approach that offers exciting applications across multi-agent systems where stable learning is needed.
The research is published in the journal Science.