Book: Statistical flaws in the age of Big Data
A new book examining how statistics can be manipulated delivers on its promise to provide simple guidelines for recognising bull when you see it. Reviewed by Bill Condie
Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics
Gary Smith, Duckworth (2014)
Long the abused plaything of politicians, statistics are also routinely pressed into service by everyone from property developers to stockbrokers.
They have governed society’s development in health, town planning and law since at least the Second World War. Today, in the age of Big Data and fast computers, statistics and their manipulation control every aspect of our lives in a way they never have before.
The ability to sift mountains of data in the blink of an eye means we can find the “answer” to every ill in society. But is that answer always right?
Yale-educated economist Gary Smith replies with an emphatic “no”. It all depends on what data we use and whether we read it properly. Among the erroneous claims by serious statisticians, Smith lists: messy rooms make people racist; drinking two cups of coffee a day substantially increases the risk of pancreatic cancer; athletes who appear on the cover of Sports Illustrated are more likely to be injured; and Asian Americans are more susceptible to heart attacks on the fourth day of the month.
“Sometimes the unscrupulous deliberately try to mislead us. Other times, the well-intentioned are blissfully unaware of the mischief they are committing,” he writes.
He blames, in part, the technology revolution. Decades ago, when data were scarce and computers non-existent, researchers worked much harder to gather good data and make painstaking calculations based on them. But, he argues, these days data are so plentiful and easy to come by that too little time is spent winnowing out the good from the rubbish.
As Nobel Prize-winning economist Ronald Coase observed: “If you torture data long enough, it will confess.”
Smith’s book is a wonderful primer for those of us determined not to be bamboozled by the cheats or misled by the careless.
With an amusing, accessible style, he guides us through the obvious and not-so-obvious traps: a coin is just as likely to be heads as tails on the eleventh toss, even if it has turned up 10 heads in a row; your plane is no more likely to crash whether you are a frequent flyer or a first-timer; and it’s easy to declare yourself a sharpshooter if you spray bullets at a barn wall and then draw a bullseye around the holes in the tightest grouping.
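The coin-toss point can be checked by brute force. The sketch below is not from the book; it assumes a fair coin with independent tosses, so that all 2,048 possible sequences of 11 tosses are equally likely, and the variable names are purely illustrative.

```python
from fractions import Fraction
from itertools import product

# Every possible sequence of 11 tosses; under the fair-coin assumption
# each of the 2**11 sequences is equally likely.
sequences = list(product("HT", repeat=11))

# Keep only the sequences that open with 10 heads in a row.
after_streak = [s for s in sequences if all(t == "H" for t in s[:10])]

# Of those, count how many show heads on the eleventh toss.
heads_next = sum(1 for s in after_streak if s[10] == "H")

p_heads = Fraction(heads_next, len(after_streak))
print(p_heads)  # 1/2
```

Only two sequences survive the filter, one ending in heads and one in tails, so the streak tells us nothing about the next toss.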
Liberally illustrated, the book points out some blatant sleights of hand. There is the graph published in the New York Times of US household income data stretching back to 1965, that shows a dramatic acceleration between 1980 and 1990 in the number of households earning more than $100,000 a year. The graph was drawn to back up an assertion by neoconservative David Frum that the policies he favoured were working. Just one problem with that. In a graph containing five bars, the intervals between the first four were five years, while the time period between the fourth and fifth (1980 and 1990) was 10. If a 1985 bar had been inserted, the rise would have been gradual, not an abrupt jump.
Then there are the less-obvious traps, like the potential for per-capita rates to mislead when the population is small. One year, the tiny Massachusetts town of Wellfleet was shocked to learn that it had the highest murder rate in the state – 40 per 100,000 of population compared with 17 per 100,000 in Boston, the state’s capital. No one could remember a single killing in the town. However, a man accused of murdering someone 20 miles away had turned himself in at the Wellfleet police station. This one recorded case in a population of 2,491 had warped the statistics.
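The Wellfleet arithmetic is easy to reproduce. The short sketch below uses only the two figures quoted above (one recorded case, a population of 2,491); the function name is an illustrative choice, not anything taken from the book.

```python
def rate_per_100k(cases: int, population: int) -> float:
    """Scale a raw count to a rate per 100,000 residents."""
    return cases / population * 100_000

# One recorded case in a town of 2,491 people.
wellfleet_rate = rate_per_100k(1, 2_491)
print(round(wellfleet_rate, 1))  # 40.1

# With no recorded case, the same town would sit at the bottom of the table.
print(rate_per_100k(0, 2_491))  # 0.0
```

In a town this size, a single case moves the rate from zero to about 40 per 100,000, which is why the figure says so little about how dangerous the town actually is.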
Smith’s book delivers on his promise that “you will learn simple guidelines for recognising bull when you see it – or say it. Not only do others use data to fool us, we often fool ourselves.”
It does, however, leave one unanswered question. As statistics is such a vexed but important area of mathematics, and one with such impact on everyday life, why is the subject not taught more rigorously in every high school in the country? Of course, that would mean the politicians who guide the curriculum would, in time, face a much better-educated electorate, well-equipped to challenge their assertions.
Could the two possibly be correlated?