“Making the Most of Clumping and Thresholding for Polygenic Scores”, Florian Privé, Bjarni J. Vilhjálmsson, Hugues Aschard, Michael G. B. Blum, 2019-05-30:

Polygenic prediction has the potential to contribute to precision medicine. Clumping and Thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, people usually test several p-value thresholds to maximize the predictive ability of the derived polygenic scores. Along with this p-value threshold, we propose to tune 3 other hyper-parameters for C+T. We implement an efficient way to derive C+T scores corresponding to many different sets of hyper-parameters. For example, you can now derive thousands of different C+T scores for 300K individuals and 1M variants in less than one day. We show that tuning 4 hyper-parameters of C+T consistently improves its predictive performance, in both simulations and real data applications, compared with tuning only the p-value threshold.
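
To make the grid idea concrete, here is a minimal pure-Python sketch of C+T evaluated over several hyper-parameter combinations. It is not the authors' implementation. The abstract does not name the three additional hyper-parameters, so the clumping correlation threshold `r2_max` and the clumping `window` used here are assumptions chosen for illustration, as are all function names and the toy data.

```python
from itertools import product


def clump(pvals, positions, r2, r2_max, window):
    """Greedy clumping: visit variants by increasing p-value and drop a variant
    if an already-kept variant lies within `window` and has r^2 > r2_max."""
    order = sorted(range(len(pvals)), key=lambda j: pvals[j])
    kept = []
    for j in order:
        in_ld_with_kept = any(
            abs(positions[j] - positions[k]) <= window and r2[j][k] > r2_max
            for k in kept
        )
        if not in_ld_with_kept:
            kept.append(j)
    return kept


def ct_score(genotypes, betas, pvals, kept, p_max):
    """Thresholding: sum effect-weighted allele counts over clumped variants
    whose p-value is at most p_max. Returns one score per individual."""
    selected = [j for j in kept if pvals[j] <= p_max]
    return [sum(betas[j] * g[j] for j in selected) for g in genotypes]


def ct_grid(genotypes, betas, pvals, positions, r2, r2_grid, window_grid, p_grid):
    """One C+T score vector per (r2_max, window, p_max) combination."""
    scores = {}
    for r2_max, window in product(r2_grid, window_grid):
        # Clumping does not depend on p_max, so it is reused across p_grid.
        kept = clump(pvals, positions, r2, r2_max, window)
        for p_max in p_grid:
            scores[(r2_max, window, p_max)] = ct_score(
                genotypes, betas, pvals, kept, p_max
            )
    return scores


# Toy data (made up): 4 variants, 2 individuals; variants 0 and 1 are in strong LD.
pvals = [1e-8, 1e-6, 0.01, 0.4]
betas = [0.5, 0.4, -0.2, 0.1]
positions = [100, 150, 900, 5000]
r2 = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
genotypes = [[0, 1, 2, 1], [2, 2, 0, 0]]  # allele counts per individual

scores = ct_grid(
    genotypes, betas, pvals, positions, r2,
    r2_grid=[0.2, 0.95], window_grid=[500], p_grid=[1e-7, 0.05, 1.0],
)
```

Tuning then amounts to picking the key of `scores` whose score vector predicts the phenotype best in a training set. Reusing the clumping result across p-value thresholds is one simple way to keep a large grid affordable; the paper's own efficiency techniques are not described in the abstract.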

Using this grid of computed C+T scores, we further extend C+T with stacking. More precisely, instead of choosing the one set of hyper-parameters that maximizes prediction in some training set, we propose to learn an optimal linear combination of all these C+T scores using an efficient penalized regression. We call this method Stacked Clumping and Thresholding (SCT) and show that it makes C+T more flexible. When the training set is large enough, SCT can provide much higher predictive performance than any of the C+T scores individually.
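
The stacking step can be sketched as a penalized regression of the phenotype on the matrix of C+T scores (one column per hyper-parameter combination). The abstract says only "an efficient penalized regression", so the ridge (L2) penalty, the continuous phenotype, the coordinate-descent solver, and every name below are assumptions made for illustration, not the method used in the paper.

```python
def fit_stacked_ridge(score_matrix, y, lam=0.1, n_iter=200):
    """Fit y ~ b0 + sum_j w[j] * score_j by cyclic coordinate descent, minimizing
    (1/2n) * ||y - b0 - S w||^2 + (lam/2) * ||w||^2 (intercept unpenalized).
    score_matrix has one row per individual and one column per C+T score."""
    n, m = len(y), len(score_matrix[0])
    cols = [[row[j] for row in score_matrix] for j in range(m)]
    w = [0.0] * m
    b0 = sum(y) / n
    resid = [yi - b0 for yi in y]
    for _ in range(n_iter):
        for j in range(m):
            col = cols[j]
            # Partial residual adds back the current contribution of column j.
            num = sum(c * (r + w[j] * c) for c, r in zip(col, resid)) / n
            den = sum(c * c for c in col) / n + lam
            new_w = num / den
            delta = new_w - w[j]
            resid = [r - delta * c for c, r in zip(col, resid)]
            w[j] = new_w
        # Re-center the residuals by updating the intercept.
        shift = sum(resid) / n
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, w


def predict_stacked(b0, w, score_matrix):
    """Stacked score: a single linear combination of all C+T scores."""
    return [b0 + sum(wj * s for wj, s in zip(w, row)) for row in score_matrix]


# Toy training data (made up): 4 individuals, 2 highly correlated C+T scores.
train_scores = [[0.0, 0.1], [1.0, 0.9], [2.0, 2.2], [3.0, 2.9]]
train_pheno = [1.0, 3.0, 5.0, 7.0]

intercept, weights = fit_stacked_ridge(train_scores, train_pheno, lam=0.1)
stacked = predict_stacked(intercept, weights, train_scores)
```

Because C+T scores from neighbouring grid points are strongly correlated, some form of penalty is needed to keep the weights stable; in practice `lam` would be chosen by cross-validation, and a binary phenotype would call for a penalized logistic model instead.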