“On the Predicted Replicability of Two Decades of Experimental Research on System Justification: A z-Curve Analysis”, Lukas K. Sotola & Marcus Credé (2022-09-24):

We examine the predicted replicability of experimental research on system justification theory (SJT) [essentially, a laundering of Marxist false consciousness] by conducting a z-curve analysis.

z-curve is a meta-analytic technique similar to p-curve, but one that performs better under heterogeneity. For a sample of studies, it estimates the expected replication rate (ERR), the observed discovery rate (ODR), the expected discovery rate (EDR), average power, the false discovery risk (Sorić FDR), and the file drawer ratio (FDR). The z-curve, based on 116 papers and 232 unique samples, suggests:

that the experimental SJT literature is likely to show low rates of replicability, as indicated by an overall average statistical power of 16%. Moderator analyses suggest that this may be driven in part by publication pressures, that the replicability of research in this area has improved since 2015, and that studies using system threat manipulations show particularly low expected replication rates (ERRs).

Implications for the replicability and validity of the experimental SJT literature are discussed, and recommendations to increase the rigor of research are put forth.

Comparing published and unpublished studies: Among published studies, there was an ERR of 18%, an ODR of 82%, an EDR of 11%, an FDR of 7.80, and a Sorić FDR of 41%. Among unpublished studies, there was an ERR of 26%, an ODR of 61%, an EDR of 21%, an FDR of 3.83, and a Sorić FDR of 20%.

These results show that unpublished studies are predicted to replicate at a higher rate; to show fewer signs of questionable research practices (QRPs); to have a file drawer ratio roughly half as large; and to have a false discovery risk that is likewise almost halved. This suggests that publication pressures are associated with the problematic outcomes of the z-curves we report, because every metric is better among the studies that were not published.

Overall z-curve: The z-curve analysis including all p-values from all studies is shown in Figure 8. As expected, and consistent with the moderator analyses, it showed substantial evidence of publication bias. The ODR (78%) and EDR (14%) differ substantially, the FDR (6.40) is high, average power (18%) is low, and the Sorić FDR (34%) is not ideal. Just over a third of these findings could be false positives; there are predicted to be more than 6 unpublished, non-statistically-significant studies for every study published; and the average power implies an 82% chance of a Type II error.
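The Sorić false discovery risk and the file drawer ratio reported above both follow mechanically from the EDR. As a rough sanity check, here is a minimal sketch that recomputes them, assuming α = .05 and Sorić's (1989) upper bound; small discrepancies with the reported figures are expected because the published EDRs are rounded to whole percentages.

```python
# Recompute z-curve summary metrics from the expected discovery rate (EDR).
# Assumes alpha = .05 and Soric's (1989) upper bound on the false discovery
# risk; the paper's figures may differ slightly since its EDRs are rounded.

ALPHA = 0.05

def soric_fdr(edr: float, alpha: float = ALPHA) -> float:
    """Soric's upper bound on the false discovery risk, given the EDR."""
    return (1 / edr - 1) * (alpha / (1 - alpha))

def file_drawer_ratio(edr: float) -> float:
    """Expected unpublished non-significant studies per significant result."""
    return (1 - edr) / edr

# Overall z-curve, EDR = 14%:
#   Soric FDR comes out near the reported 34%, and the file drawer
#   ratio near the reported 6.40.
print(round(soric_fdr(0.14), 2), round(file_drawer_ratio(0.14), 2))

# Unpublished studies, EDR = 21%:
#   Soric FDR near the reported 20%, file drawer ratio near 3.83.
print(round(soric_fdr(0.21), 2), round(file_drawer_ratio(0.21), 2))
```

The same two functions reproduce the published-studies figures as well (EDR = 11% gives a Sorić FDR of roughly 43% against the reported 41%, again within rounding of the EDR).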