Regression To The Mean Fallacies
Regression to the mean is a general statistical phenomenon which leads to several widespread fallacies in analyzing & interpreting statistical results, such as residual confounding and Lord’s paradox.
Regression to the mean causes the regression fallacy, but it also leads to additional errors, particularly when combined with measurement error:
Residual Confounding: “Statistically Controlling for Confounding Constructs Is Harder than You Think”, 2016 (1992/1991/ et al 1992, et al 2004a/ et al 2004b, et al 2007/ et al 2007/2011, et al 2021). Example: et al 2021.
“Impossibly hungry judges”, Lakens
A newer twist on residual confounding is to include a polygenic score, which typically measures a small fraction of all genetic influences on many outcomes & environmental measures (usually much less than half of the genetic variance), and declare that “[all] genetics have been controlled for” and proceed to interpret all remaining model coefficients as purely-environmental causal variables, and all remaining group differences as various kinds of societal bias or discrimination or environmental/nurture (eg. et al 2021, et al 2021, et al 2020, 2020, et al 2019, et al 2022). An example of correctly using PGSes (using the et al 2021 method to extrapolate the known incompleteness of the PGS to full heritability) is et al 2022.
“Evaluating the Effect of Inadequately Measured Variables in Partial Correlation Analysis”, 1936
“Gifted Today But Not Tomorrow? Longitudinal Changes in Ability and Achievement in Elementary School”, 2006 (a challenge for gifted education in elementary school or earlier: IQ scores are unstable, so regression to the mean implies that few children in G&T programs will grow up to be gifted); Genius Revisited Revisited (early childhood IQ is measured with great error, so extremely-high-IQ elementary schools select far fewer high-IQ adults than their entrance scores suggest, with correspondingly unimpressive results, in contrast to later selection); “To Understand Regression From Parent to Offspring, Think Statistically”, 1978
“Regression Fallacies in the matched groups experiment”, 1942
“Control of Spurious Association and the Reliability of the Controlled Variable”, 1965; “Nuisance Variables and the Ex Post Facto Design”, 1970
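A minimal simulation sketch of why an unreliable control variable leaves residual confounding. Everything here is an illustrative assumption rather than anything taken from the cited papers: a single confounder drives both `x` and `y`, `x` has no causal effect on `y`, and the analyst only observes a noisy copy of the confounder with reliability 0.5. Under these assumed variances, the expected coefficient on `x` is 0 when controlling for the true confounder and 1/3 when controlling for the noisy copy.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def coef_x_controlling(y, x, c):
    """OLS coefficient on x in the regression y ~ x + c (closed form for two predictors)."""
    vx, vc, cxc = cov(x, x), cov(c, c), cov(x, c)
    return (cov(x, y) * vc - cxc * cov(c, y)) / (vx * vc - cxc ** 2)

rng = random.Random(0)
n = 20000
confounder = [rng.gauss(0, 1) for _ in range(n)]
x = [c + rng.gauss(0, 1) for c in confounder]      # caused by the confounder only
y = [c + rng.gauss(0, 1) for c in confounder]      # caused by the confounder only; x has no effect
noisy = [c + rng.gauss(0, 1) for c in confounder]  # error-laden measurement of the confounder

beta_true_control = coef_x_controlling(y, x, confounder)
beta_noisy_control = coef_x_controlling(y, x, noisy)
print("coefficient on x, controlling for true confounder: ", round(beta_true_control, 3))
print("coefficient on x, controlling for noisy measurement:", round(beta_noisy_control, 3))
```

The spurious coefficient shrinks as the control variable becomes more reliable, but under this setup it only reaches zero when the confounder is measured without error.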
Kelley’s Paradox (cf. Lord’s paradox): the Roman poet Terence noted that “When two do the same, it isn’t the same.”; when we have prior knowledge about score distributions, 2 identical scores may have different implications because they will be shrunk differently by regression to the mean, and this will be stronger the more extreme the scores are & the larger the prior differences. (If measurements are not corrected and their predictive accuracy is reduced by leaving them as ‘raw’, this may manifest in the real world as statistical discrimination as optimizing agents learn to implicitly correct and ‘discriminate’.) This frequently comes up in standardized testing & exams, where one of the first to point out the implications was Truman Lee Kelley:
Statistical Method, 1923; Interpretation of Educational Measurements, 1927; Fundamentals of Statistics, 1947
Longitudinal: “Interpreting regression toward the mean in developmental research”, 1973; “Lord’s Paradox in a Continuous Setting and a Regression Artifact in Numerical Cognition Research”, Eriksson & 2014; “Allocation to groups: Examples of Lord’s paradox”, 2019
“The relevance of group membership for personnel selection: A demonstration using Bayes’ theorem”, 1994
“Kelley’s Paradox”, 2000
“Three Statistical Paradoxes in the Interpretation of Group Differences: Illustrated with Medical School Admission and Licensing Data”, 2006
“Measurement Error, Regression to the Mean, and Group Differences” (eg. selective attrition from college majors differs by group, and so a measurement like “has a bachelor’s degree” means different things by group)
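The shrinkage behind Kelley’s paradox can be sketched with the standard regressed-true-score estimate, in which the observed score is weighted by the test’s reliability and the group mean by the remainder. The function name and the numbers below (a reliability of 0.8, group means of 100 and 110, an observed score of 130) are hypothetical values chosen only for illustration.

```python
def kelley_estimate(observed, group_mean, reliability):
    """Regressed true-score estimate: shrink the observed score toward the group mean."""
    return reliability * observed + (1 - reliability) * group_mean

# Two test-takers with the same observed score but different group means (assumed values).
estimate_a = kelley_estimate(observed=130, group_mean=100, reliability=0.8)
estimate_b = kelley_estimate(observed=130, group_mean=110, reliability=0.8)
print("estimate for group with mean 100:", round(estimate_a, 2))  # about 124
print("estimate for group with mean 110:", round(estimate_b, 2))  # about 126

# The amount of shrinkage grows with the distance of the score from the group mean.
shrink_moderate = 115 - kelley_estimate(115, 100, 0.8)
shrink_extreme = 145 - kelley_estimate(145, 100, 0.8)
print("shrinkage at 115:", round(shrink_moderate, 2), "| shrinkage at 145:", round(shrink_extreme, 2))
```

Identical raw scores thus map to different best guesses, and the gap between the two estimates is proportional to both the unreliability of the test and the difference in group means.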
Winner’s curse: “The Optimizer’s Curse: Skepticism and Postdecision Surprise in Decision Analysis”, 2006 (decisions with different accuracies/measurement-error will suffer different regression to the mean and be biased towards the most-overestimated options)
“Predicting the Next Big Thing: Success as a Signal of Poor Judgment”, 2010 constructs a scenario where regression is so severe that being in the tail constitutes evidence for true below-average accuracy
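The optimizer’s curse can be illustrated with a small simulation, offered as a sketch under assumed numbers rather than a reproduction of the paper’s analysis: three options whose true values come from the same distribution, each estimated with a different (hypothetical) noise level, with the decision-maker always picking the highest estimate.

```python
import random

rng = random.Random(1)
noise_sds = [0.5, 1.0, 2.0]  # assumed measurement-error SD for each option
trials = 20000

total_surprise = 0.0
picks = [0] * len(noise_sds)
for _ in range(trials):
    true_values = [rng.gauss(0, 1) for _ in noise_sds]
    estimates = [t + rng.gauss(0, sd) for t, sd in zip(true_values, noise_sds)]
    best = max(range(len(estimates)), key=estimates.__getitem__)
    total_surprise += estimates[best] - true_values[best]  # estimated minus realized value
    picks[best] += 1

mean_surprise = total_surprise / trials
print("mean overestimate of the chosen option:", round(mean_surprise, 2))
print("how often each option was chosen:", picks)
```

Although every individual estimate is unbiased, the chosen option is overestimated on average, and the noisiest option is chosen most often, which is the sense in which selection favors the most-overestimated options.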
Dunning-Kruger Effect: the famous Dunning-Kruger effect can be caused by regression to the mean from measurement error, due to floor/ceiling effects in measured performance vs expressed confidence; “backfire effects” can also be manufactured this way (Swire- et al 2020)
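A hedged sketch of how measurement error alone can produce a Dunning-Kruger-shaped pattern. It assumes, purely for illustration, that a test score and a self-rating are two equally noisy and equally unbiased measurements of the same latent skill; it does not model floor or ceiling effects, and all variable names are hypothetical.

```python
import random
import statistics

rng = random.Random(3)
n = 40000
skill = [rng.gauss(0, 1) for _ in range(n)]
test_score = [s + rng.gauss(0, 1) for s in skill]   # noisy measured performance
self_rating = [s + rng.gauss(0, 1) for s in skill]  # equally noisy, unbiased self-assessment

# Group people by measured performance, as in the classic quartile plot.
order = sorted(range(n), key=lambda i: test_score[i])
quarter = n // 4
bottom, top = order[:quarter], order[-quarter:]

def mean_of(values, indices):
    return statistics.fmean(values[i] for i in indices)

bottom_test, bottom_self = mean_of(test_score, bottom), mean_of(self_rating, bottom)
top_test, top_self = mean_of(test_score, top), mean_of(self_rating, top)
print("bottom quartile: test", round(bottom_test, 2), "self-rating", round(bottom_self, 2))
print("top quartile:    test", round(top_test, 2), "self-rating", round(top_self, 2))
```

Because the groups are selected on one noisy measure, the other measure regresses toward the mean: the bottom quartile appears to overrate itself and the top quartile to underrate itself, even though no one in the simulation has biased self-knowledge.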
Placebo Effects: just regression to the mean (on et al 2010) after all?
‘Nocebo’ effects also suffer from this critique (eg. in sports statistics, the supposed harm of status like the “Madden curse” or the ‘Forbes cover’ effect)
Replication Crisis: because systemic biases like p-hacking filter for the most extreme outliers, published effect sizes will predictably decline over time, with additional replication, and with better methodologies
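The predicted decline can be sketched with a simple significance filter. The assumptions are illustrative only: every study estimates the same true effect of 0.2 with a standard error of 0.15, only estimates with z > 1.96 get published, and each published study is then replicated once with no filter.

```python
import random
import statistics

rng = random.Random(2)
true_effect, se = 0.2, 0.15  # assumed true effect and per-study standard error

originals = [rng.gauss(true_effect, se) for _ in range(50000)]
published = [est for est in originals if est / se > 1.96]   # significance filter
replications = [rng.gauss(true_effect, se) for _ in published]  # unfiltered re-runs

mean_published = statistics.fmean(published)
mean_replication = statistics.fmean(replications)
print("true effect:               ", true_effect)
print("mean published estimate:   ", round(mean_published, 3))
print("mean replication estimate: ", round(mean_replication, 3))
```

The published estimates are inflated because they were selected for being extreme, while the replications, which face no filter in this sketch, average back to the true effect.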
Baader-Meinhof Effect: 1989 propose that one reason a rare word may seem to abruptly appear repeatedly is simply that, when a rare word has gone unseen for an unusually long, over-due period, the wait until its next appearance will be more ordinary, and so it will re-appear ‘quickly’
See Also: Second product syndrome, Order statistics: The Probability of a Double Maximum, the James-Stein estimator (1977/1990)