In 2015, Google’s DeepMind AI was tasked with learning to play Atari video games. It proved quite successful, becoming as good at Video Pinball as a human player.
But beyond simple arcade games it began to struggle, notoriously failing even to collect the first key in the legendary 1980s adventure game Montezuma’s Revenge because of the game’s complexity.
However, a new approach has produced an AI algorithm that learns from its mistakes and identifies intermediate steps roughly 10 times faster, succeeding where Google failed by playing Montezuma’s Revenge autonomously.
The work was carried out by Fabio Zambetta and his team from RMIT University in Melbourne, Australia. Zambetta presents the findings at the 33rd AAAI Conference on Artificial Intelligence in Hawaii on 1 February.
Designing AI that can overcome planning problems, such as when rewards are not immediately obvious, is one of the most important challenges in advancing the field.
The reason AI struggles in adventure games is that, until it discovers some reward, it sees no incentive to choose one course of action over another, such as pursuing sub-goals like climbing a ladder or jumping over a pit on the way to the level’s larger objective.
Confused and unable to work out a path forward, it instead simply begins acting at random.
For some games, such as pinball, the rewards are nearby and the algorithm gets the external input it needs.
However, in an adventure game, where rewards are more spread out, a chicken-and-egg situation develops. The program finds itself unable to improve its gameplay until it gets some reward, but won’t find a reward until it improves its gameplay.
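To make that stalemate concrete, the sketch below is an illustration of the general problem rather than the team’s code: a standard tabular Q-learning agent whose observed rewards are all zero never updates its value estimates, so its choices remain effectively random.

```python
# Minimal sketch (not the paper's code): why sparse rewards stall learning.
# With tabular Q-learning and zero-initialised values, play that never pays
# out leaves every Q-value at zero, so the greedy policy stays arbitrary.
import random
from collections import defaultdict

ALPHA, GAMMA, ACTIONS = 0.1, 0.99, range(4)
Q = defaultdict(float)  # Q[(state, action)] -> estimated return

def update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# If the game only rewards collecting the key, and random play never reaches
# it, every update is called with reward == 0 and the table never changes.
for _ in range(10_000):
    s, a, s_next = random.randrange(100), random.choice(ACTIONS), random.randrange(100)
    update(s, a, reward=0.0, next_state=s_next)

print(all(v == 0.0 for v in Q.values()))  # True: no learning signal yet
```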
To overcome this, Zambetta took inspiration from other computer games such as Super Mario and Pac-Man and introduced pellet rewards: small intermediate rewards that encouraged the algorithm to explore and to complete sub-goals.
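One simple way to realise this idea is reward shaping: wrap the game so that reaching a sub-goal state pays a small one-off bonus on top of the game’s own score. The sketch below is a hedged illustration of that mechanism; the class name, bonus value and notion of “pellet states” are assumptions made for clarity, not details of the RMIT implementation.

```python
# Hedged sketch of the "pellet reward" idea: sub-goal markers pay a small
# bonus the first time the agent reaches them, on top of whatever the game
# itself awards. Names and the bonus value are illustrative assumptions,
# not the RMIT team's implementation.
PELLET_BONUS = 0.1

class PelletRewardWrapper:
    """Wraps an environment and augments its sparse reward with pellets."""

    def __init__(self, env, pellet_states):
        self.env = env
        self.pellet_states = set(pellet_states)  # e.g. ladder tops, pit edges
        self.collected = set()

    def reset(self):
        self.collected.clear()
        return self.env.reset()

    def step(self, action):
        state, reward, done, info = self.env.step(action)
        # Pay each bonus only once, so the agent is nudged onward rather
        # than looping on the same spot to farm the pellet.
        if state in self.pellet_states and state not in self.collected:
            self.collected.add(state)
            reward += PELLET_BONUS
        return state, reward, done, info
```

Keeping the bonus small relative to the game’s real rewards means the shaping nudges exploration toward sub-goals without overriding the original objective.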
“Truly intelligent AI needs to be able to learn to complete tasks autonomously in ambiguous environments,” he says.
“We’ve shown that the right kind of algorithms can improve results using a smarter approach rather than purely brute forcing a problem end-to-end on very powerful computers.”
This approach meant the algorithm acted more naturally and completed sub-goals up to 10 times faster than other AI approaches.
“Not only did our algorithms autonomously identify relevant tasks roughly 10 times faster than Google DeepMind while playing Montezuma’s Revenge, they also exhibited relatively human-like behaviour while doing so,” Zambetta claims.
“For example, before you can get to the second screen of the game you need to identify sub-tasks such as climbing ladders, jumping over an enemy and then finally picking up a key, roughly in that order.
“This would eventually happen randomly after a huge amount of time but to happen so naturally in our testing shows some sort of intent.
“This makes ours the first fully autonomous sub-goal-oriented agent to be truly competitive with state-of-the-art agents on these games.”
While it sounds trivial, the work could be important outside of gaming. According to Zambetta, incentivising sub-goals could benefit algorithms controlling autonomous cars, as well as robotic assistants that must achieve goals in the real world.