“Estimating Population Mean Power Under Conditions of Heterogeneity and Selection for Significance”, Jerry Brunner, Ulrich Schimmack, 2020-05-31:

In scientific fields that use statistical-significance tests, statistical power is important for successful replications of statistically-significant results because it is the long-run success rate in a series of exact replication studies. For any population of statistically-significant results, there is a population of power values of the statistical tests on which conclusions are based.

We give exact theoretical results showing how selection for statistical-significance affects the distribution of statistical power in a heterogeneous population of statistical-significance tests.
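As a rough illustration of why selection shifts the power distribution, the sketch below computes mean power before and after selection for significance in a small heterogeneous population of two-sided z-tests. It relies on an elementary weighting argument: each study enters the selected set with probability equal to its own power, so mean power after selection is the sum of squared powers divided by the sum of powers. The non-centrality values, function names, and the choice of a z-test are assumptions made for illustration, not taken from the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def z_test_power(ncp, alpha=0.05):
    """Power of a two-sided z-test whose statistic is N(ncp, 1)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - ncp)   # P(Z > z_crit)
    lower_tail = STD_NORMAL.cdf(-z_crit - ncp)      # P(Z < -z_crit)
    return upper_tail + lower_tail


def mean_power_before_and_after_selection(ncps, alpha=0.05):
    """Return (mean power of all studies, mean power of significant studies).

    A study is selected with probability equal to its power, so the
    selected population weights each study's power by that same power.
    """
    powers = [z_test_power(ncp, alpha) for ncp in ncps]
    mean_before = sum(powers) / len(powers)
    mean_after = sum(p * p for p in powers) / sum(powers)
    return mean_before, mean_after


# Hypothetical non-centrality parameters (effect size times sqrt of sample size).
HYPOTHETICAL_NCPS = [0.0, 1.0, 2.0, 3.0]

if __name__ == "__main__":
    before, after = mean_power_before_and_after_selection(HYPOTHETICAL_NCPS)
    print(f"mean power before selection: {before:.3f}")
    print(f"mean power after selection:  {after:.3f}")
```

When all studies share the same power the two means coincide; with any heterogeneity the mean after selection is strictly larger, because high-powered studies are over-represented among significant results.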

In a set of large-scale simulation studies, we compare 4 methods for estimating population mean power of a set of studies selected for statistical-significance (a maximum likelihood model, extensions of p-curve and p-uniform, & z-curve).

The p-uniform and p-curve methods performed well with a fixed effect size and varying sample sizes. However, when there was substantial variability in effect sizes as well as sample sizes, both methods systematically overestimated mean power. With heterogeneity in effect sizes, the maximum likelihood model produced the most accurate estimates when the distribution of effect sizes matched the assumptions of the model, but z-curve produced more accurate estimates when the assumptions of the maximum likelihood model were not met.

We recommend the use of z-curve to estimate the typical power of statistically-significant results, which has implications for the replicability of statistically-significant results in psychology journals.

[Keywords: power estimation, post-hoc power analysis, publication bias, maximum likelihood, z-curve, p-curve, p-uniform, effect size, replicability, meta-analysis]