“Multivariate BWAS Can Be Replicable With Moderate Sample Sizes”, 2023-03-08:
…In their recent paper, et al 2022 evaluated the effects of sample size on univariate and multivariate BWAS in 3 large-scale neuroimaging datasets and came to the general conclusion that “BWAS reproducibility requires samples with thousands of individuals”. We applaud their comprehensive analysis, and we agree that (1) large samples are needed when conducting univariate BWAS and (2) multivariate BWAS reveal substantially larger effects and are therefore more highly powered.
…However, we distinguish between the effect-size estimation method (in-sample versus cross-validated) and the sample (discovery versus replication), and show that, with appropriate cross-validation, the in-sample inflation that et al 2022 report in the discovery sample can be entirely eliminated. With additional analyses, we demonstrate that multivariate BWAS effects in high-quality datasets can be replicable with substantially smaller sample sizes in some cases. Specifically, applying a standard multivariate prediction algorithm to functional connectivity in the Human Connectome Project yielded replicable effects with sample sizes of 75–500 for 5 of 6 phenotypes tested (Figure 1).
…The issue with the claims of inflation is that the in-sample effect-size estimates of et al 2022 were based on training multivariate models on the entire discovery sample, without cross-validation or other internal validation (as confirmed by inspection of the code and discussion with the authors). Such in-sample correlations are not valid effect-size estimates, as they carry a well-known overfitting bias that increases with model complexity5. Standard practice in machine learning is to evaluate model accuracy (and other performance metrics) on data independent of those used for training. In line with current recommendations for multivariate brain-behavior analyses6,7, this is typically done using internal cross-validation (for example, k-fold) to estimate unbiased effect sizes in a discovery sample, and (less commonly) by further validating statistically significant cross-validated effects in held-out or subsequently acquired replication samples2,5.
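The overfitting bias described above is easy to demonstrate. The following sketch (not the authors' code; dimensions and the least-squares model are illustrative assumptions) fits a multivariate model to pure noise, where the true brain-behavior effect is exactly zero: the in-sample correlation between fitted and observed scores is nonetheless large, while a k-fold cross-validated correlation correctly hovers near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 50  # hypothetical: 200 subjects, 50 brain features
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)  # behavior simulated with NO true relationship to X

# In-sample "effect size": fit on all data, then correlate fitted vs. observed.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r_in = np.corrcoef(X @ beta, y)[0, 1]

# 5-fold cross-validated effect size: each fold is predicted by a model
# trained only on the remaining folds, so predictions are out-of-sample.
k = 5
folds = np.array_split(rng.permutation(n), k)
pred = np.empty(n)
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    b, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    pred[test_idx] = X[test_idx] @ b
r_cv = np.corrcoef(pred, y)[0, 1]

print(f"in-sample r:       {r_in:.2f}")  # substantially inflated above zero
print(f"cross-validated r: {r_cv:.2f}")  # near zero, as it should be
```

Under the null, the expected in-sample R² is roughly p/n (here 0.25, so r around 0.5) purely from fitting noise, which is why in-sample correlations overstate effects more as model complexity grows relative to sample size.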