The number of manually replicated studies falls far short of the number of important studies that the scientific community would like to see replicated.
We created a word2vec text-based machine learning model to estimate the replication likelihood of more than 14,000 articles published since 2000 in 6 subfields of Psychology (Clinical, Cognitive, Developmental, Organizational, Personality, and Social Psychology). Additionally, we investigated how replicability varies with research methods, authors' productivity, citation impact, and institutional prestige, and with a paper's citation growth and social media coverage.
Our findings help establish large-scale empirical patterns that can guide the prioritization of manual replications and advance replication research.
Conjecture about weak replicability in the social sciences has made scholars eager to quantify the scale and scope of replication failure for a discipline. Yet small-scale manual replication methods alone are ill-suited to this big-data problem. Here, we conduct a discipline-wide replication census in science.
Our sample (n = 14,126 papers) covers nearly all papers published in the 6 top-tier Psychology journals over the past 20 years… In total, the sample includes 14,126 papers by 26,349 distinct authors from 6,173 distinct institutions, with 1,222,292 total citations and 27,447 total media mentions… Using a validated machine learning model that estimates a paper's likelihood of replication, we found evidence that both supports and refutes speculations drawn from a relatively small sample of manual replications.
We find that a single overall replication rate for Psychology poorly captures the varying degrees of replicability among subfields.
We find that replication rates are strongly correlated with research methods in all subfields: experiments replicate at a statistically significantly lower rate than non-experimental studies.
We find that authors' cumulative publication number and citation impact are positively related to the likelihood of replication, while other proxies of research quality and rigor, such as an author's university prestige and a paper's citation count, are unrelated to replicability.
Contrary to the ideal that media attention should track replicable research, we find that media attention is positively related to the likelihood of replication failure.
Our assessment of the scale and scope of replicability is an important next step toward broadly resolving replication issues.
Figure 2: Comparing Replicability for 6 Psychology Subfields and Between Experimental and Non-experimental Research. Panel A shows the average replicability estimated for papers published in specialized journals, categorized into 6 subfields. The light blue vertical line represents the median for each subfield, and the dark blue line is the mean. Panel B also illustrates predicted replication scores, but the papers are all published in a single multi-subfield journal, Psychological Science. The subfield replicability rankings were largely consistent with the ones in specialized journals, except that the order of Cognitive and Clinical Psychology was reversed.
To explain the subfield patterns, Panel C further breaks down the average replication scores by research methods for each subfield, comparing replicability between experimental (orange boxes) and non-experimental research (blue boxes).
The proportion of experimental vs. non-experimental research in each subfield is marked as a percentage of total papers (e.g., 40% of Developmental Psychology papers are experimental). Experimental research has lower replication scores on average, and the proportion of experimental research partially explains the subfield differences in average replicability.
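The compositional claim above can be made concrete with a mixture calculation: a subfield's mean score is the weighted average of its experimental and non-experimental means. The within-method means and shares below are hypothetical illustrations, not the paper's estimates.

```python
# Toy illustration: a subfield's mean replication score as a mixture of
# its experimental and non-experimental work (hypothetical values).
EXP_MEAN, NONEXP_MEAN = 0.40, 0.65

def subfield_mean(exp_share):
    """Weighted mean score given the subfield's share of experiments."""
    return exp_share * EXP_MEAN + (1 - exp_share) * NONEXP_MEAN

mostly_experimental = subfield_mean(0.80)      # ~0.45
mostly_nonexperimental = subfield_mean(0.20)   # ~0.60
```

Holding the within-method means fixed, shifting the experimental share from 20% to 80% alone lowers the subfield mean, which is why composition "partially explains" the subfield differences.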
Figure 3: Percentage of Experimental Research in Each Psychology Subfield and the Subfield’s Mean Replication Score.
Subfields with larger proportions of non-experimental studies (Personality Psychology, Organizational Psychology, and Clinical Psychology) tend to have higher average replication scores. An exception is Developmental Psychology, which is mostly non-experimental yet has the lowest average replicability.
The discrepancy may be accounted for by the tendency of Developmental Psychology to study participants over their lifespans, from infancy to adulthood, which presents unique data collection challenges [57].
…The final publication metric we examined in relation to replicability is media coverage. Ideally, the media should cover credible and rigorous research. Yet in reality, mainstream media tends to highlight research that finds surprising, counterintuitive results [66]. A small sample of replications has shown that the more surprising a study's finding, the less likely it is to replicate [10]. Our analysis tested the association between media coverage and replicability more directly and found similar results. Both the training and prediction samples indicate that media attention and replication success are negatively correlated (Figure 4E): biserial correlations are r = −0.21, p = 0.001 in the training sample and r = −0.13, p < 0.001 in the prediction sample.
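The correlation between a binary outcome (replication success/failure) and a continuous one (media mentions or predicted scores) can be computed as below. This is a sketch on toy data, assuming the paper's "biserial correlation" refers to the common point-biserial coefficient, which equals the Pearson r computed with a 0/1 variable.

```python
import math

def point_biserial(binary, scores):
    """Point-biserial correlation between a 0/1 variable and a
    continuous one; algebraically equal to Pearson's r on the same data."""
    n = len(scores)
    ones = [s for b, s in zip(binary, scores) if b == 1]
    zeros = [s for b, s in zip(binary, scores) if b == 0]
    m1, m0 = sum(ones) / len(ones), sum(zeros) / len(zeros)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)  # population SD
    p = len(ones) / n  # share of 1s
    return (m1 - m0) / sd * math.sqrt(p * (1 - p))
```

A negative value, as in the reported r = −0.21 and r = −0.13, means papers with more media attention tend to have lower replication success.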