Your autonomous car is hurtling down a country road when a tractor pulls out of a farm gate in front of you. Are you sure your car has seen it? How confident are you it will know what to do to avoid a collision?
Our lives will depend on robots knowing not only where they are but also how to react to what they see around them. It is, perhaps, the biggest challenge in the whole field of robotic vision.
A whole new world opens up to challenge a robot from the minute it steps out of the lab or off the factory floor into a space without the certainties that a fixed location on an assembly line provides.
Before it can exercise any autonomy, a robot must first know exactly where it is in a complex and changing environment.
How it achieves that is just one of the key challenges being addressed at the Australian Centre for Robotic Vision, headquartered at the Queensland University of Technology (QUT). And while many systems of orientation are being trialled elsewhere, the Centre’s approach is to use vision.
“Part of the compelling reason for using cameras on robots – for tasks like autonomous driving, for instance – is because humans use sight,” says Ian Reid, the Centre’s deputy director and one of its leading researchers.
“We have engineered the world to take advantage of that sensing capability.”
Reid is discussing the advantages of using vision-based sensing systems for robot navigation in preference to set-ups involving laser-based light detection and ranging (LIDAR).
Cameras and LIDARs deliver very different types of data into the front end of robotic systems. Both have their uses, but robots receiving images through cameras use the same type of positioning and reference data as their human designers. Essentially, they are ‘seeing’ the world as people do.
“The camera will tell you about really useful photonic information out there in the world,” Reid explains.
“For instance, a LIDAR would struggle to read texture or writing on an object. It’s possible, but it’s not really how most people use it and it’s certainly not what LIDAR is designed for – it’s designed for making 3D measurements in the world.
“A camera tells you indirectly about 3D measurements in the world, but it also tells you directly about writing, lettering, texture, and colour. These are all very useful bits of information that tell you not just about the geometry of an object but also about its role in a scene.”
Static floor-based robots of the type typically seen on factory production lines do not need sophisticated localisation and mapping capabilities. They are bolted to the ground, usually inside a cubicle. Neither their position nor their environment ever changes.
As soon as robots are let loose to travel around a factory, mapping and navigation become much more complex. A factory, however, is still essentially a closed box, although with additional unpredictable data elements such as people and things moving within it.
Mapping and navigating a truly open system – the congested road network of a major city, for instance – represents yet another order of magnitude of complexity.
But both scenarios present the same baseline problem for developers.
“How do you best deploy your limited computational and communications resources so you do the best job?” Reid asks. “There’s a lot of computing power required by some of these things. One approach is to be very careful with your algorithms, and to try to develop smarter algorithms.
“Because what you’ve got in any vision problem is a huge amount of very redundant data.”
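As a rough illustration of that redundancy – a made-up example for this article, not the Centre's pipeline – the sketch below stands in a random array for a single small greyscale frame and keeps only the few hundred pixels with the strongest gradients, the kind of sparse "interest points" that many vision algorithms actually feed into mapping and navigation.

```python
import numpy as np

rng = np.random.default_rng(1)
frame = rng.random((240, 320))       # stand-in for one small greyscale camera frame

# Even this modest frame holds tens of thousands of pixel values, most of them
# redundant for navigation. Score each pixel by gradient strength and keep the
# strongest few hundred as sparse "keypoints".
gy, gx = np.gradient(frame)          # image gradients along rows and columns
score = gx**2 + gy**2                # crude per-pixel interest score

K = 300                              # keep only the 300 most distinctive pixels
flat_idx = np.argsort(score, axis=None)[-K:]
rows, cols = np.unravel_index(flat_idx, score.shape)
keypoints = np.column_stack([rows, cols])

print(f"pixels in frame: {frame.size}, keypoints kept: {len(keypoints)}")
print(f"data retained: {100 * K / frame.size:.2f}% of the original pixels")
```

Even this crude filter discards more than 99% of the raw pixel data; real feature detectors are more selective still, keeping only the corners and textures that are distinctive enough to recognise again from a different viewpoint.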
It’s a situation familiar to one of Reid’s colleagues, Vincent Lui, a research fellow with the ARC Centre of Excellence for Robotic Vision, a collaboration between several institutions including QUT, the Australian National University and Imperial College London.
Lui and his supervisor, Centre chief investigator Tom Drummond, seek to refine and improve simultaneous localisation and mapping (SLAM). This is the computational process whereby a robot must construct a map of an unknown environment while at the same time keeping track of its position within it.
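To get a feel for what that involves, here is a toy one-dimensional SLAM sketch written for this article rather than taken from the Centre's software: a single Kalman-filter state holds the robot's position together with three made-up landmark positions, and every noisy odometry step and range measurement updates them jointly. Real systems estimate full 3D poses with nonlinear camera models, but the core idea of one shared, growing state estimated from noisy data is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

TRUE_LANDMARKS = np.array([2.0, 5.0, 9.0])   # hypothetical corridor landmarks
N = len(TRUE_LANDMARKS)

# State vector: [robot position, landmark_1 ... landmark_N].
# The robot starts well localised; the landmarks start almost unknown.
mu = np.zeros(N + 1)
Sigma = np.diag([0.01] + [100.0] * N)

Q_MOTION = 0.05   # variance of odometry noise
R_MEAS = 0.1      # variance of range-measurement noise

true_robot = 0.0
for step in range(20):
    # Predict: the robot drives forward 0.5 m with noisy odometry.
    u = 0.5
    true_robot += u + rng.normal(0, np.sqrt(Q_MOTION))
    mu[0] += u
    Sigma[0, 0] += Q_MOTION

    # Update: measure the signed offset from the robot to each landmark.
    for j, lm in enumerate(TRUE_LANDMARKS):
        z = (lm - true_robot) + rng.normal(0, np.sqrt(R_MEAS))
        H = np.zeros(N + 1)
        H[0], H[1 + j] = -1.0, 1.0            # z ≈ landmark_j - robot
        S = H @ Sigma @ H + R_MEAS            # innovation variance (scalar)
        K = (Sigma @ H) / S                   # Kalman gain
        mu = mu + K * (z - (mu[1 + j] - mu[0]))
        Sigma = Sigma - np.outer(K, H @ Sigma)

print("estimated robot position:", round(mu[0], 2), "true:", round(true_robot, 2))
print("estimated landmarks:", np.round(mu[1:], 2), "true:", TRUE_LANDMARKS)
```

Every new landmark the robot discovers adds another entry to that state vector and another row and column to its uncertainty matrix, which is exactly why the cost of mapping large, changing environments keeps climbing.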
Mapping large areas, in which both the robot and the environment are moving, means dealing with an ever-increasing number of parameters, Lui says. The information coming in through the robot’s sensors is typically fuzzy and noisy, because even closed worlds are dirty and loud places. This requires ever more detailed corrections if the resulting internal map is to be accurate.
“That means that the costs associated with doing the optimisations become ever higher,” Lui explains.
“For a robot, having to be able to work in real time, such that it can respond in a short and sensible amount of time, is just critical. And so the real challenge is how do you keep all of these things computationally efficient? It’s not just about responsiveness, it’s also critical when thinking about power consumption.
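To put rough numbers on that cost, the back-of-the-envelope sketch below – an illustration, not a measurement of any real SLAM system – times the dense linear solve that sits at the heart of many optimisers as a hypothetical parameter count doubles. The time climbs far faster than the problem size, which is why practical systems lean so heavily on sparsity and smarter algorithms.

```python
import time
import numpy as np

rng = np.random.default_rng(2)

# A dense solve of the normal equations scales roughly cubically with the
# number of parameters, so doubling the map size costs far more than 2x.
for n in (200, 400, 800, 1600):               # hypothetical parameter counts
    A = rng.random((n, n))
    H = A @ A.T + n * np.eye(n)               # well-conditioned stand-in for a Hessian
    b = rng.random(n)
    t0 = time.perf_counter()
    np.linalg.solve(H, b)                     # one Gauss-Newton-style step
    print(f"{n:5d} parameters: {time.perf_counter() - t0:.4f} s")
```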