Computers are turning the tables on people
Recognising complex images was once a preserve of human superiority. No longer, writes Alan Finkel.
All my life I have reassured myself that although computers are faster than you and me, never make mistakes in arithmetic and have perfect memories, they are nevertheless a few cards short of a deck because they are hopeless at recognising complex images.
But the tables are turning. This became apparent to me when I heard that malicious computer viruses can read those irritating security tests known as CAPTCHAs that websites use to defend themselves. The acronym is a fair description of why they exist: Completely Automated Public Turing test to tell Computers and Humans Apart. Until this year it was fair to say that if you could read a CAPTCHA you were human, and if you couldn’t, you were not.
In case you are doubting your biological origins, I can tell you that this CAPTCHA comprises the letters s m w m.
Step back from this deliberately difficult CAPTCHA text for a moment and think about naturally occurring reading challenges. In April this year, Ian Goodfellow and colleagues from Google announced that they had used neural networks (see p17) to recognise street numbers in Google Street View photographs with 96% accuracy. That’s stunningly good, especially given the graininess of some of the images and the seemingly deliberate attempt by many householders to make their numbers obscure. They then applied their neural networks to reCAPTCHA tests, a version of CAPTCHA that adds swirls and lines to the distorted text to make it even harder to read.
I consider myself to be doing well if I get these right half the time. But the Google neural network gets them right 99.8% of the time, even on the hardest category.
Maybe it’s time to reverse the test? The user who gets it right is probably a computer. We humans will now be identified by our failings.
If computers are doing so well at reading CAPTCHAs, how are they doing at recognising faces?
Face recognition reached a milestone this year when researchers Chaochao Lu and Xiaoou Tang published on the arXiv website their identification algorithm called GaussianFace, based on the impressively named “Discriminative Gaussian Process Latent Variable Model”. They trained their algorithm with a number of face data sets, then took it out for a test run on a large set of faces it had never seen. The set contained 13,000 images of 6,000 public figures, many of them pictured more than once with different hairstyles, lighting and facial expressions. Until now, such differences have confused machines in a task that we humans do very well.
Human volunteers matched the faces with 97.53% accuracy. Lo and behold, the GaussianFace algorithm achieved 98.53%!
For human supremacists, the news is not all bad, at least not yet. So far, computers still struggle with voice recognition. If you speak clearly, Siri on your iPhone or Google Now on your Android phone will understand your commands, but try the same simple commands in a noisy environment and your friendly devices fall apart. Nevertheless, in only a few years voice recognition technology has rocketed from an infuriatingly unreliable way to book cinema tickets to a pocket assistant that, under optimal conditions, does your bidding.
For security experts, computer viruses that can crack the CAPTCHA test are not too much of a threat. The game will simply move to the next level, relying on secret personal information, confirmatory text messages to your phone, multiple CAPTCHAs, fingerprint biometrics or other novel defences.
But the vastly improved capabilities of image and voice recognition programs will increasingly threaten our privacy as advanced face, voice, text and number recognition algorithms combine to make it easy for authorities to continuously track our location and activities.
On the upside, you can look forward to kindly programs that will whisper into your ear the names of everybody you meet as you navigate the floor of a school reunion, and you will be able to verbally instruct your oven, car and every other gadget in your life without fear of being misinterpreted.