Time for a Reproducibility Index

What’s behind paper retractions? (17)
by Adam Marcus and Ivan Oransky, Labtimes 04/2013


Hold journals accountable for their vaunted peer review, not just citations.

In February 2012, a team of scientists at Case Western Reserve University in Cleveland published a dramatic paper in Science (335 (6075): 1503-06): A drug already approved for the treatment of skin cancer appeared to reverse changes in the brain linked to Alzheimer’s disease, or at least the mouse equivalent. Even more impressive, some animals began to behave as if they’d never had the illness.

Although the drug hadn’t yet been tested for Alzheimer’s in humans, reports (e.g. N Engl J Med, 367, 488-90) began to surface of clinicians prescribing the drug, bexarotene, to patients with the dreaded, degenerative disease.

Fluke reporting

Two of the researchers launched a company, ReXceptor Therapeutics, to commercialise their work, while experts expressed amazement. One, Kenneth Kosik, a prominent neuroscientist at the University of California, Santa Barbara (who also happens to hold two degrees from Case Western Reserve), told Scientific American, “The effects in mice, including some restoration of cognitive abilities, are dramatic.”

If only!

In late May of this year, researchers at the University of Chicago and elsewhere announced, in several Science papers, that they were unable to replicate the Cleveland team’s findings, suggesting that the initial report was likely to be a fluke.

Dirty secret of publishing

All this would be remarkable … except it’s not. A dirty secret of scientific publishing is that many published discoveries turn out to be ephemera, statistical starbursts that fade upon further examination.

By this point, Lab Times readers are probably familiar with the work of John Ioannidis, who has argued quite convincingly, as the title of one of his essays reads, that “most published research findings are false”. And you might also recall that Amgen’s Glenn Begley and MD Anderson Cancer Center’s Lee Ellis reported last year in Nature (483: 531-3) that they couldn’t replicate most of the preclinical cancer studies they examined. A report from Bayer had similar results.

And in a recent PLoS ONE paper (8(5): e63221), a group of researchers at MD Anderson, including Ellis, reported that half of their colleagues said they had been unable to replicate at least one published study.

Leonard Zwelling, the MD Anderson doctor who led the work, says the results are the predictable outcome of today’s scientific incentives. “There’s such a de-emphasis on real quality and an emphasis on quantity,” he told us for a post on Retraction Watch.

So, how can the system change for the better? In our work on Retraction Watch, we have found that readers tend to trust journals more when they are transparent in the way they handle retractions, corrections and other issues with the papers they publish. With this in mind, we have called for a “Transparency Index” to supplement the standard “Impact Factor” – a gauge of how often papers are cited by other papers, which journals use to create a hierarchy of prestige (The Scientist, 26(8): 24).

Standing the test of science

The Transparency Index won’t solve the problem of bad data but we think another metric might help substantially: the Reproducibility Index. Rather than rate journals on how often their articles are cited by other researchers, let’s grade them on how well those papers stand the most important test of science: namely, does the work stand up to scrutiny?

The idea is to encourage “slow science” and careful peer review, whilst discouraging journals from publishing papers based on flimsy results that are likely to be heavily cited. Like the Transparency Index, the Reproducibility Index could supplement the impact factor. In fact, one way to judge average reproducibility would be to calculate what percentage of citations for a given paper shows replication versus inability to reproduce results.
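To make the citation-based version of the idea concrete, here is a minimal sketch in Python. It assumes each citing paper has already been labelled, by hand or by text mining, as confirming the result, failing to replicate it, or neutral; the function name and labels are purely illustrative, not part of any established metric.

```python
def reproducibility_index(citations):
    """Fraction of replication-attempt citations that confirm the result.

    `citations` is a list of labels, one per citing paper:
    "replicates", "fails", or "neutral". Neutral citations are
    ignored; returns None when no citation attempted a replication.
    """
    replicates = citations.count("replicates")
    fails = citations.count("fails")
    attempts = replicates + fails
    if attempts == 0:
        return None  # no replication attempts among the citations
    return replicates / attempts


# Hypothetical paper: 3 confirmations, 1 failed replication,
# 6 citations that did not attempt a replication at all.
labels = ["replicates"] * 3 + ["fails"] + ["neutral"] * 6
print(reproducibility_index(labels))  # 0.75
```

A journal-level score could then simply average this fraction over a journal’s papers, though — as the sketch shows — the hard part is not the arithmetic but reliably classifying each citation.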

Transparent and reproducible

The Reproducibility Index might also take into account how often journals are willing to publish replications and negative findings, as Science admirably did in the bexarotene case – or findings of the sort that demonstrate the lack of reproducibility of others’ work. Top journals often decline to publish such findings. They give a number of reasons for this phenomenon but a cynic might count a hit to their impact factors as one. So a higher Reproducibility Index might give journals an incentive to shift the mix.

There are, of course, a lot of details to work out and we look forward to help from readers of Lab Times and Retraction Watch in doing so. Isn’t the reproducibility of science worth it?

(The authors run the blog Retraction Watch: http://retractionwatch.com)

Last Changed: 04.07.2013