Don’t repeat experiments, flip coins

April 17, 2019

Stephen Fleischfresser

Stephen Fleischfresser is a lecturer at the University of Melbourne's Trinity College and holds a PhD in the History and Philosophy of Science.

Science is in the midst of a so-called “replication crisis”, arising from the uncomfortable fact that the results of hundreds, perhaps thousands, of published experiments could not be replicated by other researchers, or even by the original scientists themselves.

The issue particularly affects the social sciences, such as psychology, but is increasingly being identified in other fields, such as evolutionary biology.

Now, new research shows that the probability of precisely repeated experiments confirming initial results is, in some cases, about equal to that of tossing a coin.

Even successful replication, it seems, is a game of chance.

The ability to reproduce experiments is a cornerstone of the scientific method: one of the earliest scholarly societies of the Scientific Revolution, the Accademia del Cimento of the Medici court in Florence, Italy, had as its motto Provando e riprovando, meaning “testing and retesting” or “experimenting and confirming”.

The replication crisis has seen successful replication rates fall below 50% and reawakened many to the need to test and retest their experiments. But just how much this adds to the evidential base is open to question, as a new paper in the journal PLOS Biology explores.

Sophie Piper and colleagues from the Berlin Institute of Health (BIH) and the Charité Universitätsmedizin in Berlin, Germany, have conducted a deliberately provocative replication of their own work.

Their initial experiments showed that valproic acid (VPA) could reduce the amount of brain tissue killed during a stroke in mice. However, their results came down just on the happy side of being statistically significant, with what’s known as a “p value” of 0.047. In experimental science, a p value of less than 0.05 is conventionally considered statistically significant, so the team’s result came perilously close to the cut-off.

As a result, they had planned to replicate the experiment. However, calculations showed that doing so, using the same sample size and conditions, would have only a 52% chance of detecting the effect of VPA, assuming that the effect is real. This, they reasoned, is about the same as tossing a coin.
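The arithmetic behind a figure like this can be sketched with a simple normal approximation. The sketch below makes three assumptions: the analysis is a two-sided z-test, the true effect is exactly as large as the one first observed, and the replication uses the same sample size. The function name `replication_power` is hypothetical. The paper's own calculation may have used a different test and the actual group sizes, so the numbers need not match exactly.

```python
from statistics import NormalDist


def replication_power(p_initial: float, alpha: float = 0.05) -> float:
    """Approximate chance that an exact replication reaches significance.

    Hypothetical sketch using a two-sided z-test approximation, assuming the
    true effect equals the effect seen in the original experiment and the
    replication uses the same sample size.
    """
    std_normal = NormalDist()                           # mean 0, standard deviation 1
    z_observed = std_normal.inv_cdf(1 - p_initial / 2)  # z-score matching the original p value
    z_critical = std_normal.inv_cdf(1 - alpha / 2)      # significance threshold, about 1.96
    # Same design and same true effect: the replication's z-score is centred on
    # z_observed with standard deviation 1. Count only results significant in
    # the original direction; the opposite tail is negligible here.
    return 1 - std_normal.cdf(z_critical - z_observed)


print(f"{replication_power(0.047):.0%}")  # roughly 51%
```

With p = 0.047 this gives roughly 51%, the same coin-flip territory as the 52% the authors report. A barely significant result sits almost exactly on the threshold, so an identical replication is about as likely to land just below it as just above.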

So, instead of sending another 20 mice to early graves and consuming valuable time and money, the team decided to actually toss a coin to confirm their initial results.

It came up heads. VPA could help protect the brain from stroke damage.

The findings were real.

Rather than being crackpot advocates for an insane new method of replication studies, the authors are hoping to provoke a discussion within the scientific community about the usefulness of “exact replication” experiments.

“The absurd but true notion,” they write, “that a coin flip provides approximately the same positive predictive value as an exact replication experiment when the initial effect is barely significant highlights an important, but little known, limitation of exact replications.”

Instead, they advocate that replication studies increase sample sizes and broaden their horizons via what they call “conceptual replication” – that is, by adding meaningful alterations to the initial experiment to help generalise the result.
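Simulating replications shows why larger sample sizes help. This Monte Carlo sketch uses the same simplifying assumptions as a basic z-test. It treats the observed effect as the true effect, and it assumes a replication's expected z-score grows with the square root of its sample size. The function `simulated_replication_rate` and its parameters are hypothetical illustrations, not the authors' method.

```python
import random
from statistics import NormalDist


def simulated_replication_rate(p_initial: float, n_ratio: float,
                               alpha: float = 0.05, trials: int = 100_000,
                               seed: int = 1) -> float:
    """Fraction of simulated replications that reach significance.

    Hypothetical sketch: assumes the observed effect is the true effect, so a
    replication's z statistic is normal with mean z_observed * sqrt(n_ratio)
    and standard deviation 1. n_ratio is the replication's sample size
    divided by the original's.
    """
    std_normal = NormalDist()
    z_observed = std_normal.inv_cdf(1 - p_initial / 2)  # z of the original result
    z_critical = std_normal.inv_cdf(1 - alpha / 2)      # two-sided threshold, about 1.96
    mean_z = z_observed * n_ratio ** 0.5                # expected z of the replication
    rng = random.Random(seed)                           # seeded for reproducibility
    # Count replications that are significant in the original direction.
    successes = sum(rng.gauss(mean_z, 1.0) > z_critical for _ in range(trials))
    return successes / trials


for ratio in (1, 2, 4):
    rate = simulated_replication_rate(0.047, ratio)
    print(f"{ratio}x sample size: {rate:.0%} of replications significant")
```

Under these assumptions, an exact replication succeeds about half the time. Doubling the sample size should lift that to roughly 80%, and quadrupling it to about 98%. That is the intuition behind calling for bigger replications.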

While exact replications can help to ferret out technical mistakes, the authors suggest a broadening of the experimental and statistical repertoire is needed to genuinely add to the evidential basis of scientific hypotheses.

“Replication is a fundament of the scientific process,” they write. “We can learn from successful and from failed replication – but only if we design, perform, and report them properly.”
