For someone who spends his life working with robots, Anton van den Hengel sure is rude about them.
“They are really, very, very stupid,” he says. “They can’t do what a toddler can do. While they may be performing millions of tasks every day, the truth is they are incapable of understanding what needs to be done.
“You still can’t ask one to bring you a spoon.”
Van den Hengel’s lighthearted banter has a serious point and highlights the fundamental problem he grapples with daily as a chief investigator with the Australian Centre for Robotic Vision – developing the capacity of robots to operate autonomously in open environments.
For that they need to ‘see’ the world around them, but as with human sight, that is only partly about the eyes and much more about the brain. There is much more a robot needs to learn about seeing before it can be of use to us.
Roboticists spend a lot of time thinking deeply about everyday tasks the rest of us never take any notice of. Van den Hengel is no exception.
He’s still talking about robots getting him a spoon – or not, as the case may be.
“Just think of the range of things a robot needs to know if it is to do that,” he says.
“It has to first know about a class of items called spoons, then it needs to know that they are likely to be in the kitchen. But that’s just the start of the problem.
“If you or I – or even a toddler – are in a strange house and told to find our way from the lounge room to the kitchen, we start with some advantages.
“We will have some idea of the layout of a typical house and the relations between the rooms that will help us find the kitchen. A robot doesn’t know that.
“Once in the kitchen the robot needs to recognise the typical layout of this sort of room and where in that layout it might find the cutlery drawer.
“Only then can it start to access its database and activate pattern recognition to tell the difference between a knife, a fork and finally, a SPOON!
“Then the complicated business of navigating back to bring you the utensil begins,” Van den Hengel concludes gloomily.
His anecdote highlights the complexity of robotic vision and the many associated disciplines – engineering, 3D geometry and computer science – that must be brought together to tackle it.
At its simplest, robotic vision means fitting a robot with a camera and programming it with algorithms that let it process the visual data it gathers from the real world.
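To make that concrete, a minimal sketch of such a loop is shown below. It assumes OpenCV and an ordinary webcam standing in for the robot’s camera, and uses simple edge detection as a placeholder for whatever processing a real robot would run – an illustration only, not the Centre’s software.

```python
import cv2  # OpenCV: a widely used computer-vision library

camera = cv2.VideoCapture(0)                 # open the first attached camera
try:
    while True:
        ok, frame = camera.read()            # grab one image of the real world
        if not ok:
            break
        grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(grey, 100, 200)    # placeholder "vision algorithm"
        cv2.imshow("what the robot sees", edges)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to stop
            break
finally:
    camera.release()
    cv2.destroyAllWindows()
```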
The area is closely related to other fields such as machine vision and computer vision and has been greatly enhanced by advances in the study of machine learning, which gives robots the ability to learn as they encounter new things in the world around them.
Once robots can see, and understand their environment using the ‘sense’ of vision, we will be close to Van den Hengel getting his spoon.
But before he does, researchers must tackle a range of challenges that fall into two broad categories: first, the geometric challenge of building a 3D model of the world, and second, how to assign semantic meaning to that model so that the robot can interact with its environment.
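One way to picture how the two challenges fit together is as a single data structure: a geometric layer that records where points are in 3D, and a semantic layer that records what they are. The sketch below is purely illustrative, with made-up names and a trivially simple point list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]   # x, y, z in metres

@dataclass
class SemanticMap:
    points: List[Point3D] = field(default_factory=list)   # geometric challenge: where things are
    labels: Dict[int, str] = field(default_factory=dict)  # semantic challenge: what things mean

    def add_point(self, point: Point3D, label: str = "unknown") -> int:
        """Record a reconstructed 3D point and whatever meaning can be attached to it."""
        self.points.append(point)
        index = len(self.points) - 1
        self.labels[index] = label
        return index

# A robot can only act on the map once both layers are filled in:
world = SemanticMap()
handle = world.add_point((1.2, 0.4, 0.9), label="cutlery drawer handle")
```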
Van den Hengel’s colleague Vincent Lui, who is based at Monash University, focuses his attention mainly on the first of these problems: building and continually updating the map of the world a robot needs as it moves. He explains the scale of the task.
“As that map becomes larger, the 3D models become larger – as do the number of things to remember,” he says. “And as that is happening, the robot has to make decisions in a very short space of time. All of this has to be achieved with the limited amount of computing power it is possible to fit on a moving robot.”
That is why the algorithms used become so important.
“We do that with a combination of algorithms that are smart and break down problems into smaller slices, or we model the problem mathematically in smaller slices,” Lui says.
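One common way to ‘slice’ the mapping problem – offered here as a generic illustration rather than a description of Lui’s own algorithms – is to split the growing map into fixed-size submaps, so that each update only touches the slice the robot is currently working in and the cost stays bounded on limited onboard hardware.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
SUBMAP_CAPACITY = 1000          # assumed budget of points per slice

class SlicedMap:
    """Keeps the map as a list of bounded slices instead of one ever-growing model."""

    def __init__(self) -> None:
        self.submaps: List[List[Point3D]] = [[]]

    def add_observation(self, point: Point3D) -> None:
        if len(self.submaps[-1]) >= SUBMAP_CAPACITY:
            self.submaps.append([])        # start a new slice rather than grow the old one
        self.submaps[-1].append(point)

    def refine_active_slice(self) -> None:
        active = self.submaps[-1]          # per-update cost depends only on this slice
        # ... local refinement of `active` would run here ...
```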
Despite the challenges, vision systems have some distinct advantages over other forms of orientation for robots. Lui’s colleague Hongdong Li from ANU explains: “When developing vision-based systems we use cameras that cost a few hundred dollars in contrast to systems like the Google car that uses very expensive LIDAR to position itself.”
While there are still only limited examples of reliable robot vision systems guiding robots in open environments, the Centre’s researchers expect that to change rapidly over the next five to 10 years.
“Machine learning has a huge role to play in robotic vision,” says Centre researcher Ian Reid. “And it has only been four or five years that neural networks have been a factor. As a result, computer vision has become much better at seeing the world and converting that into data and geometric information.”
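As one hedged illustration of what that looks like in practice, the sketch below uses an off-the-shelf pre-trained detection network from the torchvision library (chosen only as an example) to turn a single camera image – a hypothetical file named kitchen.jpg – into labelled objects with image positions.

```python
import torch
from torchvision.io import read_image
from torchvision.models.detection import (
    fasterrcnn_resnet50_fpn,
    FasterRCNN_ResNet50_FPN_Weights,
)

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()   # pre-trained object detector

image = read_image("kitchen.jpg")             # hypothetical photo from the robot's camera
batch = [weights.transforms()(image)]         # preprocessing that matches the weights

with torch.no_grad():
    detections = model(batch)[0]              # boxes, labels and confidence scores

for label, box, score in zip(detections["labels"], detections["boxes"], detections["scores"]):
    if score > 0.7:                           # keep only confident detections
        print(weights.meta["categories"][int(label)], box.tolist())
```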
While he acknowledges there are significant differences between the closed environment of the lab or factory floor and the open environment of the wider world, he believes researchers are getting there. Already robots are poised to take over a range of tasks from people, from medical procedures and surgery to infrastructure inspection, and from dealing with weeds in crops to picking fruit when labour is in short supply.
For demanding futurists like Van den Hengel, the revolution cannot come quickly enough.
“In the interaction with computers, humans have always been the ones to change. We are adaptable and put up with it,” he says. “Every time you push a button is a failure. But it’s time we did away with these idiotic systems that don’t learn.
“We are done with easy AI [artificial intelligence] in a simplified reality, like learning to play chess. We’ve made huge steps and now is the time for Hard AI – functioning intelligently out in the real world!”
This article was first published by the Australian Centre for Robotic Vision.