Characterizing the natural selection of complex traits is important for understanding human evolution and both biological and pathological mechanisms.
We leveraged genome-wide summary statistics for 870 polygenic traits and attempted to quantify signals of selection on traits of different forms in European ancestry across 4 periods in human history and evolution.
We found that 88% of these traits underwent polygenic change in the past 2,000–3,000 years. Recent selection was associated with ancient selection signals in the same trait. Traits related to pigmentation, body measurement and nutritional intake exhibited strong selection signals across different time scales. Our findings are limited by our use of exclusively European data and the use of genome-wide association study data, which identify associations between genetic variants and phenotypes that may not be causal.
In sum, we provide an overview of signals of selection on human polygenic traits and their characteristics across human evolution, based on a European subset of human genetic diversity. These findings could serve as a foundation for further population and medical genetic studies.
…As shown in Figure 1, we focus on 2 primary goals. First, we describe the selection pressure on each trait at 4 different time scales (Figures 2–5). This is achieved using various metrics derived from different statistical models (Mendelian Randomization (MR), singleton density score, ancient genome analysis and so on), each fitting a specific timeframe or form of selection. Second, we integrate these metrics to explore the association among selection pressures, trait characteristics and functional genomic patterns (Figures 6–8), using linear regression and unsupervised clustering.
…Body measurements and contemporary reproductive success: Our analysis started by exploring natural selection pressure at the present time. We hypothesized that the current natural selection of a trait is relevant to whether it could causally impact human reproductive success (that is, number of offspring) and mating success (for which we used the overall number of sexual partners as a proxy). To quantify these causal effects, we applied MR to GWAS summary statistics between tested traits and reproductive success, as well as between tested traits and mating success. At the statistical-significance cutoff of |zMR| > 4 (Methods), we found that 7.4% (40⁄539) of traits with valid MR results (that is, traits passing sensitivity analysis) had a causal effect on the number of offspring of males, whereas 5.9% (32⁄542) of traits with valid MR results impacted the number of offspring of females (Supplementary Table 2). Separating the traits into 15 categories (Figure 2a,b), we observed that 52% (23⁄44) of anthropometric body measurement traits such as height (zMR = 8.09, p = 3.33 × 10−16 in males; zMR = 4.91, p = 4.55 × 10−7 in females) were causally related to the number of offspring of males. By contrast, only 30% (14⁄47) of body measurement traits were causally related to the number of offspring of females. In addition, the effect of another type of body measurement (dermatology traits such as skin color) on reproductive success also exhibited sex specificity: 38% (5⁄13) of dermatology traits influenced the number of offspring of males, but none affected the number of offspring of females. However, when testing 112 complex conditions such as schizophrenia (ref. 11) and stroke (ref. 15), polygenic risks showed no statistically-significant causal effect on the number of offspring for either males or females (nominal p > 0.05/112). The distribution of effect direction was also similar between disease and non-disease traits (Fisher p = 0.40 for males, p = 0.71 for females).
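The MR step can be illustrated with a minimal sketch. The text does not specify which MR estimator was used, so the fixed-effect inverse-variance-weighted (IVW) form below is an assumption, as are the function name `ivw_mr` and the toy per-SNP effect sizes; only the |zMR| > 4 cutoff is taken from the analysis above.

```python
import math

Z_CUTOFF = 4.0  # significance cutoff on |zMR| quoted in the text


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the exposure -> outcome effect.

    Assumed estimator (not confirmed by the source). Inputs are per-SNP
    GWAS effects on the exposure trait, effects on the outcome (for
    example, number of offspring) and outcome standard errors.
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    beta = num / den
    se = math.sqrt(1.0 / den)
    return beta, se, beta / se


# Hypothetical toy instruments: outcome effects are exactly half the
# exposure effects, so the IVW estimate should be 0.5.
bx = [0.10, 0.20, 0.15]
by = [0.05, 0.10, 0.075]
se_y = [0.01, 0.01, 0.01]

beta, se, z = ivw_mr(bx, by, se_y)
passes_cutoff = abs(z) > Z_CUTOFF
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} significant={passes_cutoff}")
```

A real analysis would add the sensitivity checks mentioned above (traits "passing sensitivity analysis"), which this sketch omits.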
For mating success (Supplementary Figure 2), body measurement traits also had an impact: 44% of body measurement traits impacted the number of sexual partners of males, compared with 12% affecting the number of sexual partners of females. Interestingly, among all 112 tested polygenic disease traits, schizophrenia (zMR = 7.37, p = 8.53 × 10−14) and attention-deficit hyperactivity disorder (zMR = 4.62, p = 1.92 × 10−6) increased the number of sexual partners of males, in line with previous findings that increased genetic liability for schizophrenia does not confer a fitness advantage but does increase mating success (ref. 16). For males, a trait's impact on reproductive success was positively correlated with its impact on mating success (Supplementary Figure 2; Pearson correlation coefficient (PCC) 0.47, 95% CI 0.39 to 0.55, p = 9.30 × 10−31). However, this was not true for females, for whom a trait's impact on reproductive success was negatively correlated with its impact on mating success (Supplementary Figure 2; PCC −0.10, 95% CI −0.20 to 0, p = 0.02). This discrepancy is consistent with the evolutionary psychology theory that males and females adopt distinct sexual strategies that shape assortative selection (ref. 17).
Next, we investigated whether trait impacts on reproductive success and mating success differed between the sexes. In general, trait impact on human reproductive success was similar for males and females (Figure 2c; PCC 0.38, 95% CI 0.32 to 0.44, p = 6.85 × 10−31). Trait impacts on mating success were also similar between the sexes (Supplementary Figure 2; PCC 0.64, 95% CI 0.58 to 0.70, p = 9.18 × 10−106). Notably, the high intelligence trait significantly reduced the number of offspring in both females and males (zMR = −7.55, p = 2.18 × 10−14 in females, zMR = −5.13, p = 1.45 × 10−7 in males), and increased the expected number of sexual partners for females (zMR = 7.05, p = 8.97 × 10−13) (Supplementary Figure 1).
In addition, we applied causal analysis using summary effect estimates (ref. 18) to all MR results to analyse the role of genetic correlation. We found that most of the results were explained mainly by causal effects rather than genetic correlation. Using another GWAS dataset (ref. 19) and applying MR bias estimation (ref. 20), we again showed that our results were not explained by GWAS sample overlap (‘MR analysis details’ in Supplementary Information).
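The PCC values and 95% CIs reported above have the form produced by the sketch below. The text does not state how the intervals were computed; the Fisher z-transformation is assumed here, and the function name `pearson_with_ci` and the toy z scores are hypothetical.

```python
import math


def pearson_with_ci(x, y, z_crit=1.959964):
    """Pearson correlation with an approximate 95% CI.

    The CI uses the Fisher z-transformation (an assumption; the source
    does not say which interval method was used). Requires len(x) > 3.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    half_width = z_crit / math.sqrt(n - 3)
    z = math.atanh(r)
    return r, math.tanh(z - half_width), math.tanh(z + half_width)


# Hypothetical per-trait MR z scores for two outcomes
# (for example, number of offspring and number of sexual partners).
z_offspring = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z_partners = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

r, ci_low, ci_high = pearson_with_ci(z_offspring, z_partners)
print(f"PCC {r:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```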
Figure 2: Selection pressure in the present day and in recent history. a,b, Proportion of traits showing MR causal effects on the number of offspring of males (a) and females (b) for each category. c, Comparison of MR z scores between males (x-axis) and females (y-axis). Dashed lines indicate the statistical-significance threshold (|z| > 4). The text indicates selected traits with results of special interest. DER, dermatology; NUT, nutrition; REP, reproduction; GI, gastrointestinal; PSY, psychiatry; RES, respiratory; MED, medication; COG, social cognition; MUSC, musculoskeletal; MET, metabolism; CIRC, circulation; NEU, neurology.
…Widespread polygenic adaptation in the past 2,000–3,000 years: At the statistical-significance threshold of p < 0.05/870 = 5.7 × 10−5, we found that 88% (761⁄870) of polygenic traits had a statistically-significant correlation between the GWAS p-value and tSDS (ρSDS; Supplementary Table 3). Previous analysis has found that population stratification in the UK Biobank might bias the estimated polygenic adaptation (ref. 22). Thus, to exclude this potential confounder in our analyses, we applied another method with a different statistical model, which involves reconstructing the history of polygenic scores (RHPS; ref. 23), based on RELATE (ref. 24) (RHPS-RELATE, Methods). We set the reference panel as all European participants of 1000 Genomes to avoid population stratification. As shown in Supplementary Table 3, the polygenic risk score (PRS) alteration in the past 100 generations (roughly equivalent to 2,800 years (ref. 24)) was mostly in accordance with ρSDS (PCC 0.25, 95% CI 0.18 to 0.32, p = 3.96 × 10−13). Among 755 traits with statistically-significant non-zero ρSDS, 13.8% (104⁄755) showed a consistent statistically-significant alteration of PRS (p for ‘Tx test’ from RHPS < 0.05/870, Methods), and 26.1% (197⁄755) showed a nominally statistically-significant alteration (p for Tx test < 0.05). Notably, our RHPS-RELATE results also highlighted those traits with the highest ρSDS, such as ease of skin tanning (p for ρSDS < 10−100; p for Tx test < 10−100) and raw vegetable intake (p for ρSDS < 10−100; p for Tx test 2.69 × 10−51) (Supplementary Table 3). In general, the results of RHPS-RELATE were consistent with the ρSDS analysis, albeit at lower statistical power. Thus, we conclude that the ρSDS results are credible and reflect the prevalence of recent adaptation…by using simulations of genetic drift and demographic isolation strategies, our results suggest that population stratification did not drive a systematic bias on ρSDS.
We consequently propose that the observed bias on height might not represent the majority of traits.
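As a rough illustration of how a statistic such as ρSDS and its Bonferroni threshold could be computed, the sketch below ranks SNPs by GWAS p-value, takes the median tSDS within each bin of SNPs, and computes a Spearman correlation across bins. The sign convention (bins ordered from least to most significant, so that a positive value means tSDS rises with GWAS significance), the function names and the toy values are assumptions, not the authors' implementation.

```python
import statistics

N_TRAITS = 870
ALPHA = 0.05 / N_TRAITS  # Bonferroni threshold, about 5.7e-5


def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rho_sds(gwas_p, tsds, bin_size=1000):
    """Spearman correlation between significance-ordered bins and median tSDS."""
    # Least significant SNPs first (assumed sign convention).
    order = sorted(range(len(gwas_p)), key=lambda i: gwas_p[i], reverse=True)
    bin_index, bin_median = [], []
    for b, start in enumerate(range(0, len(order), bin_size)):
        chunk = order[start:start + bin_size]
        bin_index.append(b)
        bin_median.append(statistics.median([tsds[i] for i in chunk]))
    return spearman(bin_index, bin_median)


# Hypothetical toy data: six SNPs, bins of two for readability.
toy_p = [0.9, 0.8, 0.5, 0.4, 0.01, 0.001]
toy_tsds = [-1.0, -0.5, 0.0, 0.2, 1.0, 1.5]
rho = rho_sds(toy_p, toy_tsds, bin_size=2)
print(f"alpha={ALPHA:.2e} rho_SDS={rho:.2f}")
```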
…When analysing all traits, we observed that dermatology traits generally showed the strongest selection signals (median |ρSDS| = 0.69, Figure 3a,b), followed by nutrition intake (median |ρSDS| = 0.48; Supplementary Figure 4) and reproduction-related traits (median |ρSDS| = 0.30; Supplementary Figure 4). Ease of skin tanning was the trait with the most statistically-significant adaptation (ρSDS = 0.96, p < 10−100; Figure 3c). Ever been drinkers (ρSDS = −0.82, p < 10−100) and sitting height (ρSDS = 0.84, p < 10−100) were also among traits with an extreme adaptation signal (|ρSDS| > 0.8), which made up 3.3% of all traits (Supplementary Figure 4). Neurological traits such as brain structures exhibited the least polygenic adaptation (median |ρSDS| = 0.05).
In contrast to non-disease traits, the adaptive pressure on polygenic disease traits was generally negative (median ρSDS = −0.08; permutation p = 3.22 × 10−6), especially for early-onset conditions such as autism spectrum disorder (median ρSDS = −0.12; Supplementary Figure 5). The greatest evidence of negative adaptation was found for high cholesterol (ρSDS = −0.66, p < 10−100; Supplementary Figure 5). Still, we found evidence of positive adaptation for a few diseases such as skin cancer and inflammatory bowel disease (ρSDS > 0.2, p < 10−100; Supplementary Figure 5), and even some early-onset conditions such as attention-deficit hyperactivity disorder (ρSDS = 0.20, p < 2.16 × 10−24) and anorexia nervosa (ρSDS = 0.16, p = 1.24 × 10−19) (Supplementary Table 3). This result suggests that some of the disease traits might be by-products of other positive selection events.
…As shown in Figure 4a and Supplementary Table 5, after controlling for covariates (for example, latitude, longitude and genotyping coverage) and multiple testing, the polygenic burden of 78 traits was statistically-significantly associated with the percentage of hunter-gatherer ancestry (HG%). By contrast, another 6 traits, such as denture usage, were associated with time in at least one of 3 datasets. Dermatology traits were the category most predominantly associated with HG% (7⁄13 traits; Figure 4a), with ‘ease of skin tanning’ as the most statistically-significant example (regression tHG = 20.3, p = 1.74 × 10−38; Figure 4b). In the Near East dataset, we observed that signals of selection on skin tanning varied by latitude (Figure 4c), with signals of positive selection observed in regions of low latitude (latitude < 50°; t = 4.12, p = 1.91 × 10−5), but signals of negative selection observed at high latitudes (t = 4.95, p = 3.80 × 10−7). After controlling for the impact of latitude, we observed a general ascending trend for ‘ease of skin tanning’ for the Near East dataset, suggesting overall positive selection (regression tNear East = 5.81, p = 2.29 × 10−8; Figure 4c). We also found a nominally statistically-significant increment for ease of skin tanning in the pre-Neolithic period (regression tpre-Neolithic = 4.25, p = 1.11 × 10−5), but not in the Neolithic period (regression tNeolithic = 0.92, p = 0.18; Supplementary Figure 7).
Figure 3: Selection pressure in recent history. a, Distribution of absolute Spearman correlation (|ρSDS|) between the tSDS and GWAS p-value for each category. The lower and upper margins of the box indicate the first and third quartiles of |ρSDS|, and the thickened line its median. b, ρSDS for all dermatology traits. The diagonal of the rhombus indicates ρSDS, and the width its 95% CI. c, Scatter plot showing the correlation between tSDS and GWAS p-value bin for the trait ‘ease of skin tanning’. Each point represents a bin of 1,000 SNPs ranked by their GWAS p-value. The y-axis indicates the bin median tSDS. Abbreviations as in Figure 2.
Figure 7: Population-average polygenic risk score trajectory for 765 traits. Trajectories are grouped into 4 clusters according to their time-series similarity by hierarchical clustering. The y-axis shows the z scores of PRS. Colours mark different traits with overlapping trajectories, and the dashed line marks the median trajectory of each cluster.
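The grouping of trajectories described in the Figure 7 caption can be sketched as follows. The caption states only that hierarchical clustering on time-series similarity was used; the Euclidean distance on z-scored trajectories, the average linkage, the function names and the toy trajectories are all assumptions made for illustration.

```python
import math


def zscore(series):
    """Standardize one trajectory to mean 0 and unit variance."""
    mean = sum(series) / len(series)
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    return [(v - mean) / sd for v in series]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, data):
    """Mean pairwise distance between members of two clusters."""
    total = sum(euclidean(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clusters(data, n_clusters):
    """Agglomerative clustering: merge the closest pair until n_clusters remain."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], data)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical PRS trajectories over four time points:
# two rising traits and two falling traits.
trajectories = [
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 2.0, 4.0, 6.0],
    [3.0, 2.0, 1.0, 0.0],
    [6.0, 4.0, 2.0, 0.0],
]
standardized = [zscore(t) for t in trajectories]
groups = hierarchical_clusters(standardized, n_clusters=2)
print(groups)
```

Standardizing first means that trajectories are grouped by shape rather than by scale, which matches the z-scored y-axis in the caption; with 765 traits a library implementation would be preferable to this quadratic-per-merge loop.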
…although all the above possibilities explain a proportion of disease heritability, there is still room for another ‘trivial explanation’: natural selection was indeed eliminating the risk alleles but simply not fast enough, due to the small effect of each allele and the small effective population size at the risk loci (refs. 8, 46).