“Do Multiple Experimenters Improve the Reproducibility of Animal Studies?”, Vanessa Tabea von Kortzfleisch, Oliver Ambrée, Natasha A. Karp, Neele Meyer, Janja Novak, Rupert Palme, Marianna Rosso, Chadi Touma, Hanno Würbel, Sylvia Kaiser, Norbert Sachser, S. Helene Richter, 2022-05-05:

[von Kortzfleisch et al 2022] The credibility of scientific research has been seriously questioned by the widely claimed reproducibility crisis. In light of this crisis, there is a growing awareness that the rigorous standardisation of experimental conditions may contribute to poor reproducibility of animal studies. Instead, systematic heterogenization has been proposed as a tool to enhance reproducibility, but a real-life test across multiple independent laboratories is still pending.

The aim of this study was therefore to test whether heterogenization of experimental conditions by using multiple experimenters improves the reproducibility of research findings compared to standardized conditions with only one experimenter.

To this end, we replicated the same animal experiment in 3 independent laboratories, each employing both a heterogenized and a standardized design. In the standardized design, all animals were tested by a single experimenter, whereas in the heterogenized design, 3 different experimenters were involved in testing the animals.

In contrast to our expectation, the inclusion of multiple experimenters in the heterogenized design did not improve the reproducibility of the results across the 3 laboratories. Interestingly, however, a variance component analysis indicated that the variation introduced by the different experimenters was not as high as the variation introduced by the laboratories, probably explaining why this heterogenization strategy did not bring the anticipated success. Even more interestingly, for the majority of outcome measures, the remaining residual variation was identified as an important source of variance accounting for 41% (CI95 [34%, 49%]) to 72% (CI95 [58%, 88%]) of the observed total variance.

Figure 4: Proportion of variance explained by each factor. For each outcome measure, the total variance of the full dataset could be decomposed into the following sources using an LMM: between-strain variability (yellow), between-laboratory variability (blue), between-experimenter variability (red), strain-by-laboratory interaction variability (dark blue), strain-by-experimenter interaction variability (orange), between-block variability (dark green), strain-by-block interaction variability (light green), between-cage variability (beige), and between-individual variability (residuals, grey). Shown are point estimates of the proportion of variation explained by each factor. For details on 95% confidence intervals of these estimates, see S7 Table. The raw data underlying this figure are available in the Figshare repository. Abbreviations: ‘DL’, Dark Light; ‘EPM’, Elevated Plus Maze; ‘FCMs’, faecal corticosterone metabolites; ‘LMM’, linear mixed model; ‘NC’, Novel Cage; ‘NT’, Nest; ‘OF’, Open Field.
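
The idea behind such a variance decomposition can be illustrated with a much simpler model than the one used in the study. The sketch below is not the authors' LMM. It assumes a balanced design with a single random factor (for example, laboratory) and uses the classical ANOVA method-of-moments estimator to split the total variance into a between-group and a residual component, then expresses each as a proportion of the total. The function names and the toy numbers in the usage lines are hypothetical and serve only as illustration.

```python
from statistics import mean


def one_way_variance_components(groups):
    """ANOVA (method-of-moments) estimates for a balanced one-way
    random-effects design. This is a simplified stand-in for a full LMM.

    groups: list of equally sized lists of measurements, one list per
            level of the random factor (e.g. one list per laboratory).
    Returns (between_variance, residual_variance).
    """
    k = len(groups)
    n = len(groups[0]) if groups else 0
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 balanced groups of at least 2 values")

    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)  # valid because the design is balanced

    # Mean square between groups and mean square within groups.
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (k * (n - 1))

    # E[MS_between] = sigma2_residual + n * sigma2_between; truncate at zero.
    between = max(0.0, (ms_between - ms_within) / n)
    return between, ms_within


def variance_proportions(components):
    """Express each variance component as a share of the total variance."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {name: value / total for name, value in components.items()}


# Hypothetical toy data: two "laboratories" with two animals each.
between, residual = one_way_variance_components([[1, 3], [5, 7]])
shares = variance_proportions({"laboratory": between, "residual": residual})
```

In the study itself, several crossed and nested factors (strain, laboratory, experimenter, block, cage) are estimated jointly, which requires a mixed-model fit rather than this one-factor shortcut. The final step of dividing each component by the total is the same.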

Despite some uncertainty surrounding the estimated numbers, these findings argue for systematically including biological variation rather than eliminating it in animal studies and call for future research on effective improvement strategies.