“A Long Journey to Reproducible Results: Replicating Our Work Took Four Years and 100,000 Worms but Brought Surprising Discoveries”, 2017-08-22:
About 15 years ago, one of us (G.J.L.) got an uncomfortable phone call from a colleague and collaborator. After nearly a year of frustrating experiments, this colleague was about to publish a paper¹ chronicling his team’s inability to reproduce the results of our high-profile paper² in a mainstream journal. Our study was the first to show clearly that a drug-like molecule could extend an animal’s lifespan. We had found over and over again that the treatment lengthened the life of a roundworm by as much as 67%. Numerous phone calls and e-mails failed to identify why this apparently simple experiment produced different results between the labs. Then another lab failed to replicate our study. Despite more experiments and additional publications, we couldn’t work out why the labs were getting different lifespan results. To this day, we still don’t know. A few years later, the same scenario played out with different compounds in other labs…In another, now-famous example, two cancer labs spent more than a year trying to understand inconsistencies⁶. It took scientists working side by side on the same tumour biopsy to reveal that small differences in how they isolated cells (vigorous stirring versus prolonged gentle rocking) produced different results. Subtle tinkering has long been important in getting biology experiments to work. Before researchers purchased kits of reagents for common experiments, it wasn’t unheard of for a team to cart distilled water from one institution when it moved to another. Lab members would spend months tweaking conditions until experiments with the new institution’s water worked as well as before. Sources of variation include the quality and purity of reagents, daily fluctuations in microenvironment and the idiosyncratic techniques of investigators⁷. With so many ways of getting it wrong, perhaps we should be surprised at how often experimental findings are reproducible.
…Nonetheless, scores of publications continued to appear with claims about compounds that slow ageing. There was little effort at replication. In 2013, the three of us were charged with that unglamorous task…Our first task, to develop a protocol, seemed straightforward.
But subtle disparities were endless. In one particularly painful teleconference, we spent an hour debating the proper procedure for picking up worms and placing them on new agar plates. Some batches of worms lived a full day longer with gentler technicians. Because a worm’s lifespan is only about 20 days, this is a big deal. Hundreds of e-mails and many teleconferences later, we converged on a technique but still had a stupendous three-day difference in lifespan between labs. The problem, it turned out, was notation—one lab determined age on the basis of when an egg hatched, others on when it was laid. We decided to buy shared batches of reagents from the start. Coordination was a nightmare; we arranged with suppliers to give us the same lot numbers and elected to change lots at the same time. We grew worms and their food from a common stock and had strict rules for handling. We established protocols that included precise positions of flasks in autoclave runs. We purchased worm incubators at the same time, from the same vendor. We also needed to cope with a large amount of data going from each lab to a single database. We wrote an iPad app so that measurements were entered directly into the system and not jotted on paper to be entered later. The app prompted us to include full descriptors for each plate of worms, and ensured that data and metadata for each experiment were proofread (the strain names MY16 and my16 are not the same). This simple technology removed small recording errors that could disproportionately affect statistical analyses.
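The data-entry checks described above (full descriptors for every plate, and strain names such as MY16 and my16 treated as distinct) can be illustrated with a minimal Python sketch. This is not the authors' app: the record fields, the required-field list, and the strain registry are hypothetical, chosen only to show an exact, case-sensitive strain lookup that also flags near-miss case variants for proofreading.

```python
from dataclasses import dataclass

# Hypothetical registry of strain names in use. MY16 comes from the text;
# N2 (the standard wild-type strain) is added only for illustration.
KNOWN_STRAINS = {"MY16", "N2"}

# Hypothetical descriptors that every plate record must carry.
REQUIRED_FIELDS = ("strain", "compound", "lab", "plate_id")


@dataclass
class PlateRecord:
    strain: str
    compound: str
    lab: str
    plate_id: str


def validate(record: PlateRecord) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Refuse records with any required descriptor left blank.
    for name in REQUIRED_FIELDS:
        if not getattr(record, name):
            problems.append(f"missing {name}")
    strain = record.strain
    if strain and strain not in KNOWN_STRAINS:
        # Exact, case-sensitive match: "my16" is not "MY16".
        near = [s for s in KNOWN_STRAINS if s.lower() == strain.lower()]
        if near:
            problems.append(f"strain {strain!r} differs only in case from {near[0]!r}")
        else:
            problems.append(f"unknown strain {strain!r}")
    return problems


# A lowercase entry is flagged rather than silently accepted.
print(validate(PlateRecord("my16", "compound A", "lab 1", "plate 7")))
```

Rejecting the record at entry time, rather than normalising case after the fact, forces a person to confirm which strain was meant, which is the kind of proofreading the text describes.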
Once this system was in place, variability between labs decreased. After more than a year of pilot experiments and discussion of methods in excruciating detail, we almost completely eliminated systematic differences in worm survival across our labs (see ‘Worm wonders’)…Even in a single lab performing apparently identical experiments, we could not eliminate run-to-run differences.
…We have found one compound that lengthens lifespan across all strains and species. Most do so in only two or three strains, and often show detrimental effects in others.