“What Can We Learn from Many Labs Replications? 3. Can Replication Studies Detect Fraud?”, 2019-03-08:
Several hundred research groups attempted replications of published effects in so-called Many Labs studies involving thousands of research participants. Given this enormous investment, it seems timely to assess what has been learned and what can be learned from this type of project. My evaluation addresses four questions: First, do these replication studies inform us about the replicability of social psychological research? Second, can replications detect fraud? Third, does the failure to replicate a finding indicate that the original result was wrong? Finally, do these replications help to support or disprove any social psychological theories? Although the evidence of replication failures resulted in important methodological changes, the 2015 Open Science Collaboration findings alone would have sufficed to make that point. To assess the state of social psychology, we have to evaluate theories rather than randomly selected research findings.
… In only two cases, and rather unusual ones at that, was fraud discovered through replication failure. Even the Stapel fraud was revealed by his research students, who had become suspicious of his unusual success in empirically supporting the most daring hypotheses ( et al 2012). With the new rule that data for published research have to be made available, fraud can be expected to be detected increasingly through suspicious data patterns.
Another reason that replications are poor fraud detectors is that clever fraudsters who stick closely to predictions that are plausible in light of the existing literature have a very good chance that their research will be successfully replicated by their colleagues. (If Stapel had kept to this recipe and not become overconfident in his later research, his fraud might never have been detected.) For example, a 2004 meta-analysis of priming effects on impression formation supported a general model of information bias. The literature was very coherent and supportive of the model. The only unexpected finding was that the effect sizes of studies conducted in Europe were substantially greater than those of American studies, which the authors attributed to cultural differences. However, when I checked the authorship of the European studies, it turned out that the majority had been conducted by Stapel, and many of these studies were later shown to be fraudulent ( et al 2012). Thus, in inventing data, Stapel managed to get the priming effects right but overestimated their size.