“Genetic Diversity Fuels Gene Discovery for Tobacco and Alcohol Use”, 2022-12-07:
Tobacco and alcohol use are heritable behaviors associated with 15% and 5.3% of worldwide deaths, respectively, due largely to broad increased risk for disease and injury. These substances are used across the globe, yet genome-wide association studies have focused largely on individuals of European ancestries.
Here we leveraged global genetic diversity across 3.4 million individuals from 4 major clines of global ancestry (~21% non-European) to power the discovery and fine-mapping of genomic loci associated with tobacco and alcohol use, to inform function of these loci via ancestry-aware transcriptome-wide association studies, and to evaluate the genetic architecture and predictive power of polygenic risk within and across populations.
…Using our multi-ancestry meta-analysis, we identified 2,143 associated loci across all phenotypes (sentinel variant p < 5 × 10⁻⁹), with 3,823 independently associated variants (Extended Data Figure 2, Supplementary Tables 2 & 3 & Supplementary Figures 2 & 3). Of these, 1,346 loci and 2,486 independent variants were associated with SmkInit, 33 loci (39 variants) with AgeSmk, 140 loci (243 variants) with CigDay, 128 loci (206 variants) with SmkCes and 496 loci (849 variants) with DrnkWk. ~64% (n = 1,364) of loci were phenotype-specific, 5 loci were associated with all 4 smoking phenotypes but not with DrnkWk, and 5 loci were associated with all 5 phenotypes. All sentinel variants within identified loci had high posterior probabilities that their effect would replicate in a sufficiently powered study according to a trans-ancestry extension of our GWAS cross-validation technique.
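As a quick consistency check on the counts quoted above, the short Python sketch below sums the per-phenotype figures and recomputes the phenotype-specific share. It uses only numbers stated in the text; the variable names are illustrative and not taken from the paper.

```python
# Per-phenotype counts as quoted in the text: (loci, independent variants).
counts = {
    "SmkInit": (1346, 2486),
    "AgeSmk": (33, 39),
    "CigDay": (140, 243),
    "SmkCes": (128, 206),
    "DrnkWk": (496, 849),
}

# Sum the per-phenotype counts to compare against the reported totals.
total_loci = sum(loci for loci, _ in counts.values())
total_variants = sum(variants for _, variants in counts.values())

# Share of loci reported as phenotype-specific (n = 1,364).
phenotype_specific_share = 1364 / total_loci

print(total_loci, total_variants, round(100 * phenotype_specific_share, 1))
```

The per-phenotype counts add up exactly to the reported totals of 2,143 loci and 3,823 variants, which suggests that the totals are tallied per phenotype; the phenotype-specific share comes out just under 64%.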
…We found that increases in sample size and genetic diversity improved locus identification and fine-mapping resolution, and that a large majority of the 3,823 associated variants (from 2,143 loci) showed consistent effect sizes across ancestry dimensions. However, polygenic risk scores [e.g. 10% smoking] developed in one ancestry performed poorly in others, highlighting the continued need to increase sample sizes of diverse ancestries to realize any potential benefit of polygenic prediction.
…To characterize the multifactorial genetic aetiology of tobacco and alcohol use, we computed genetic correlations of our EUR-stratified results with 1,141 medical, biomarker and behavioral phenotypes from the UK Biobank (Supplementary Tables 10 & 11). An affinity propagation clustering algorithm was used to aid interpretability by grouping UK Biobank phenotypes such that each of the 5 current phenotypes was an exemplar (Supplementary Figure 5). SmkInit and AgeSmk clustered together, as did SmkCes and CigDay, with all 4 forming a broad higher-level smoking cluster. Phenotypes with strong genetic correlations with SmkInit included addiction to any substance, neighbourhood material deprivation and diagnosis of chronic obstructive pulmonary disease (all positive), as well as age at first sexual intercourse (negative) (|rg| = 0.57–0.64). For AgeSmk, the largest genetic correlations were with reproductive phenotypes such as age at first birth (rg = 0.69–0.71) and measures of years of education and attainment (rg = 0.58–0.69). CigDay and SmkCes were most highly positively correlated with respiratory and cardiovascular diseases and cancers (rg = 0.52–0.72), highlighting their genetic link to adverse disease outcomes. Finally, DrnkWk was most strongly correlated with problematic drinking behaviors (rg = 0.52–0.70), indicating extensive overlap in the genetic architecture of DrnkWk and measures of alcohol use, problems and alcohol use disorder. This is consistent with previous findings of strong but imperfect genetic correlations (for example, rg = 0.8) between alcohol consumption and alcohol use disorder from large-scale GWAS. We note, however, that genetic correlations can be difficult to interpret as they may be affected by genetic confounding, mediation effects or sampling bias.
We used the ancestry-stratified meta-analysis results to construct ancestry-specific polygenic risk scores in Add Health, an independent target sample of individuals of diverse ancestries from the United States (n = 2,199 AFR, 1,132 AMR, 525 EAS and 6,092 EUR).
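To make the idea of a polygenic risk score concrete, here is a minimal Python sketch of the basic construction: a weighted sum of effect-allele dosages, standardized within the target sample. The excerpt does not say how the study derived its weights, so this is not the authors' pipeline; the weights, genotypes and function names are all hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Center and scale scores within a single target sample."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


# Hypothetical per-variant effect sizes from one ancestry-stratified GWAS.
weights = [0.5, -0.25, 1.0]

# Hypothetical dosages: one row per individual, one column per variant.
genotypes = [
    [0, 1, 2],
    [2, 0, 0],
    [1, 2, 1],
]

raw_scores = [polygenic_score(g, weights) for g in genotypes]
z_scores = standardize(raw_scores)
print(raw_scores)
```

In a cross-ancestry evaluation of the kind described above, the same scoring step would be run with weights from each ancestry-stratified meta-analysis, and the resulting scores compared against the observed phenotype within each target group.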