Examining the role of common variants in rare neurodevelopmental conditions

Huang, Qin Qin; Wigdor, Emilie M.; Malawsky, Daniel S.; Campbell, Patrick; Samocha, Kaitlin E.; Chundru, V. Kartik; Danecek, Petr; Lindsay, Sarah; Marchant, Thomas; Koko, Mahmoud; Amanat, Sana; Bonfanti, Davide; Sheridan, Eamonn; Radford, Elizabeth J.; Barrett, Jeffrey C.; Wright, Caroline F.; Firth, Helen V.; Warrier, Varun; Strudwick Young, Alexander; Hurles, Matthew E.; Martin, Hilary C.

doi:10.1038/s41586-024-08217-y

Article
Open access
Published: 20 November 2024

Nature volume 636, pages 404–411 (2024)

Abstract

Although rare neurodevelopmental conditions have a large Mendelian component¹, common genetic variants also contribute to risk^2,3. However, little is known about how this polygenic risk is distributed among patients with these conditions and their parents, or about its interplay with rare variants. It is also unclear whether polygenic background affects risk directly through alleles transmitted from parents to children, or whether indirect genetic effects mediated through the family environment⁴ also play a role. Here we addressed these questions using genetic data from 11,573 patients with rare neurodevelopmental conditions, 9,128 of their parents and 26,869 controls. Common variants explained around 10% of variance in risk. Patients with a monogenic diagnosis had significantly less polygenic risk than those without, supporting a liability threshold model⁵. A polygenic score for neurodevelopmental conditions showed only a direct genetic effect. By contrast, polygenic scores for educational attainment and cognitive performance showed no direct genetic effect, but the non-transmitted alleles in the parents were correlated with the child’s risk, potentially due to indirect genetic effects and/or parental assortment for these traits⁴. Indeed, as expected under parental assortment, we show that common variant predisposition for neurodevelopmental conditions is correlated with the rare variant component of risk. These findings indicate that future studies should investigate the possible role and nature of indirect genetic effects on rare neurodevelopmental conditions, and consider the contribution of common and rare variants simultaneously when studying cognition-related phenotypes.

Main

Rare conditions collectively affect 3.5–6% of the global population⁶, and most of these involve the central nervous system⁷. Although genomic sequencing has revolutionized the diagnosis of rare neurodevelopmental conditions, which typically include intellectual disability and/or developmental delay, a monogenic diagnosis is identified for only about 30–40% of patients^1,8. Common variants also contribute to risk for rare neurodevelopmental conditions^2,3. In particular, this common variant contribution overlaps with polygenic risk for schizophrenia and for predisposition to reduced educational attainment and cognitive performance². Accordingly, rare damaging variants in constrained genes, which play a major role in risk of rare neurodevelopmental conditions, are also associated with increased risk of mental health conditions and reduced educational attainment and cognitive performance in UK Biobank^9,10,11. In this work, we seek to address three fundamental questions (Extended Data Fig. 1). First, we aim to better understand the nature of common variant risk for rare neurodevelopmental conditions, particularly its overlap with common variant risk for mental health and cognitive phenotypes. Second, we aim to explore the interplay between common and rare variants in the context of these conditions. Third, we aim to test whether there is an effect of common variants in the parents on their child’s risk of these conditions, above and beyond the child’s own genetics.

We begin by leveraging new, larger genome-wide association studies (GWASs) than were previously available² to explore the extent to which common variant effects on rare neurodevelopmental conditions are correlated with their effects on a broad range of mental health conditions. This is motivated by findings that some psychiatric conditions have a partial neurodevelopmental origin^12,13,14, and that people with rare neurodevelopmental conditions¹⁵, as well as their relatives^16,17, are more likely to have psychiatric conditions. Some of this overlap seems to be driven by certain rare copy number variants with variable expressivity^18,19, suggesting some shared aetiology between psychiatric and rare neurodevelopmental conditions. Here, to address our first aim, we explore whether shared common variant effects may also contribute to the overlap between these conditions, and whether this is independent of the genetic overlap between these conditions and cognitive traits.

Little is known about the interplay between rare and common variants in the context of rare neurodevelopmental conditions, and dissecting this will be key to fully understanding their genetic architecture and improving genetic diagnosis and risk prediction. As the second aim of our study, we set out to address two hypotheses in this space, testing the liability threshold model and whether common variants modify the penetrance of inherited rare variants. The liability threshold model predicts that an individual will develop a condition once the sum of independent genetic and environmental risk factors exceeds some threshold^5,20. Under this model, one might expect that patients with neurodevelopmental conditions who have a highly penetrant damaging variant (constituting a monogenic diagnosis) would require, on average, less polygenic load to cross a diagnostic threshold than those without such variants (Extended Data Fig. 2a). We previously saw no significant difference in polygenic scores between patients with versus without a monogenic diagnosis², but in this work, we anticipated that increased sample size and improved diagnostic rate^1,21 might improve power to detect a difference. As rare variants associated with neurodevelopmental conditions seem to act additively with polygenic scores in affecting cognitive ability in UK Biobank^10,11, we hypothesized that polygenic background would modify the penetrance of these inherited rare variants in families with neurodevelopmental conditions, as it does, for example, in the context of BRCA1/2 variants predisposing to breast cancer²².
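The liability threshold prediction can be illustrated with a toy simulation. This is a minimal sketch under assumed parameters, not the authors' model: liability is the sum of a polygenic score (assumed to explain about 10% of liability variance), a hypothetical rare variant carried by 0.3% of people that adds 3 s.d. of liability, and normally distributed residual factors; the top 1% of liability counts as affected. Carriers stand in for probands with a monogenic diagnosis, and all names and numbers are illustrative.

```python
import random
import statistics

def simulate_liability_threshold(n=200_000, h2_common=0.1, carrier_freq=0.003,
                                 rare_effect=3.0, prevalence=0.01, seed=1):
    """Toy liability threshold model with one polygenic score and one rare large-effect variant.

    Returns (mean score in affected carriers, mean score in affected non-carriers,
    fraction of affected individuals who are carriers).
    """
    rng = random.Random(seed)
    sd_common = h2_common ** 0.5            # score explains ~h2_common of liability variance
    sd_residual = (1.0 - h2_common) ** 0.5  # everything else: other genetics, environment
    people = []
    for _ in range(n):
        score = rng.gauss(0.0, sd_common)
        carrier = rng.random() < carrier_freq
        liability = score + (rare_effect if carrier else 0.0) + rng.gauss(0.0, sd_residual)
        people.append((liability, score, carrier))
    # Threshold chosen so that the top `prevalence` fraction of liabilities are affected
    threshold = sorted(p[0] for p in people)[int((1.0 - prevalence) * n)]
    affected = [p for p in people if p[0] > threshold]
    carrier_scores = [s for _, s, c in affected if c]
    noncarrier_scores = [s for _, s, c in affected if not c]
    return (statistics.mean(carrier_scores), statistics.mean(noncarrier_scores),
            len(carrier_scores) / len(affected))

mean_dx, mean_undx, frac_dx = simulate_liability_threshold()
print(f"affected carriers: {mean_dx:.3f}, affected non-carriers: {mean_undx:.3f}, "
      f"carrier fraction: {frac_dx:.2f}")
```

With these assumed settings, affected carriers should show a clearly lower mean polygenic score than affected non-carriers, mirroring Extended Data Fig. 2a; the size of the gap depends entirely on the chosen parameters.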

Finally, as our third aim, we explore whether common variants predisposing to rare neurodevelopmental conditions act directly on the affected individuals carrying them (‘direct genetic effects’). Many studies have shown that genetic associations between common genetic variants and cognition-related phenotypes estimated in population-based samples shrink when estimated within families^4,23,24,25,26. One possible explanation for this is that variants associated with these traits have indirect genetic effects, that is, they have some effect on the parents, and this then affects the offspring through the family or prenatal environment^4,26,27,28. However, confounding factors may also contribute to population-based genetic effect estimates^4,29,30. Studies of rare diseases have typically assumed implicitly that variants affecting risk have direct genetic effects on the affected individual. Given the genetic overlap with educational attainment and cognition, we hypothesized that the common variants associated with risk of rare neurodevelopmental conditions might not reflect only direct genetic effects.

We address these questions using two large UK-based cohorts of individuals with rare neurodevelopmental conditions, the Deciphering Developmental Disorders (DDD) study (N = 7,955 patients with genotype array and exome sequence data) and the Genomics England 100,000 Genomes project (GEL; N = 3,618 patients with genome sequence data), combined with several control cohorts (Supplementary Table 1). We have included a Frequently Asked Questions document in less technical language to explain the study, and to address some possible misunderstandings (Supplementary Note 1).

GWAS and genetic correlations

We first sought to validate the role of common genetic variation in neurodevelopmental conditions by replicating the key findings from our previous work in DDD in a large independent cohort. We identified a subset of GEL rare disease families with neurodevelopmental conditions and removed families overlapping with the DDD study (Methods). Almost all probands with neurodevelopmental conditions in GEL (97%) had intellectual disability or global developmental delay, versus 88% of those in DDD. The cohorts were broadly phenotypically similar (Extended Data Fig. 3 and Supplementary Note 2).

When comparing 3,618 unrelated patients with neurodevelopmental conditions to 13,667 unrelated controls within GEL, polygenic scores for educational attainment (PGS_EA)³¹, cognitive performance (PGS_CP)³¹ and schizophrenia (PGS_SCZ)³² each explained a significant but small amount of variance on the liability scale (R² < 1%; logistic regression P < 3.9 × 10⁻⁴). This was similar to that observed when comparing 6,397 unrelated patients from DDD with 9,270 independent unrelated controls (Supplementary Table 2). The polygenic score for neurodevelopmental conditions derived from our GWAS in DDD² (PGS_NDC,DDD) was also associated with neurodevelopmental conditions within GEL (P = 1.1 × 10⁻⁶, R² = 0.11%; Supplementary Table 2).

These results indicated that the polygenic contribution to rare neurodevelopmental conditions was similar between these two cohorts. Thus, to increase power to study common variant effects on these conditions, we conducted a GWAS in GEL, then meta-analysed the results with the DDD GWAS (Extended Data Fig. 4 and Supplementary Data 1–3). This meta-analysis revealed two genome-wide significant loci (Supplementary Note 3). Variants at one of these loci are associated with cognitive traits^31,33. The fraction of phenotypic variance explained by genome-wide common variants (that is, the single-nucleotide polymorphism (SNP) heritability on the liability scale assuming a population prevalence of 1%) was estimated at 11.2% (8.5–13.8%) (Supplementary Table 3).
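The liability-scale SNP heritability above depends on the assumed population prevalence (1%). A standard way to convert an observed-scale estimate from a case-control sample to the liability scale (as described, for example, by Lee et al., 2011) is h²_l = h²_o · K²(1−K)² / (P(1−P)·z²), where K is the population prevalence, P is the proportion of cases in the sample and z is the standard normal density at the liability threshold. The sketch below implements that formula; the function name and example inputs are hypothetical, and we do not claim it reproduces the exact pipeline behind Supplementary Table 3.

```python
from statistics import NormalDist

def observed_to_liability_h2(h2_observed: float, pop_prevalence: float,
                             sample_case_fraction: float) -> float:
    """Convert observed-scale SNP heritability to the liability scale."""
    std_normal = NormalDist()
    # Liability threshold t above which a fraction K of the population is affected
    threshold = std_normal.inv_cdf(1.0 - pop_prevalence)
    # Height of the standard normal density at the threshold
    z = std_normal.pdf(threshold)
    k, p = pop_prevalence, sample_case_fraction
    scale = (k * (1.0 - k)) ** 2 / (p * (1.0 - p) * z ** 2)
    return h2_observed * scale

# Hypothetical inputs: prevalence 1% (as in the text), 30% cases in the sample
print(observed_to_liability_h2(0.2, pop_prevalence=0.01, sample_case_fraction=0.3))
```

The conversion factor shrinks as the assumed prevalence falls, which is why liability-scale estimates should always be quoted together with the prevalence used.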

In pursuit of our first main aim, to test for possible shared genetic contributors between rare neurodevelopmental conditions and other brain-related traits and conditions, we calculated genetic correlations (r_g) between them using our own and published GWAS meta-analyses. We observed the expected negative genetic correlations between neurodevelopmental conditions and educational attainment³¹ (r_g = −0.65 (−0.84, −0.47), P = 4.9 × 10⁻¹²) and cognitive performance³¹ (r_g = −0.56 (−0.73, −0.39), P = 1.6 × 10⁻¹⁰), stronger in magnitude than those observed with the DDD GWAS alone, and a positive genetic correlation with schizophrenia³² (r_g = 0.27 (0.13, 0.40), P = 9.7 × 10⁻⁵) (Fig. 1a and Supplementary Table 4). Furthermore, we detected significant genetic correlations (P < 0.0038 = 0.05/13; Bonferroni correction for 13 traits) with several other mental health and neurodevelopmental conditions, including attention-deficit hyperactivity disorder (ADHD)³⁴ (r_g = 0.46 (0.28, 0.64), P = 5.2 × 10⁻⁷), and with the ‘non-cognitive component of educational attainment’ derived from GWAS-by-subtraction (NonCogEA)³⁵ (r_g = −0.37 (−0.52, −0.22), P = 1.2 × 10⁻⁶) (Fig. 1a). We hypothesized that the genetic correlations with brain-related conditions could be explained at least in part by their relationship with educational attainment^35,36, given the strong negative genetic correlation between educational attainment and neurodevelopmental conditions. To explore this, we estimated the genetic correlations conditioning on the educational attainment GWAS summary statistics (Fig. 1b). The genetic correlations with ADHD and depression were no longer significant after conditioning on educational attainment, whereas those with schizophrenia and Tourette’s syndrome remained significant. The latent genetic component of neurodevelopmental conditions that was correlated with educational attainment explained 77% of the genetic correlation with ADHD, the highest among all tested conditions (Supplementary Fig. 1 and Supplementary Methods). These results confirmed that common variants are collectively associated with rare neurodevelopmental conditions in two independent cohorts, and that these common variant effects are shared with other brain-related conditions and cognitive traits.
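As a rough consistency check on the reported genetic correlations, one can back out a standard error from each interval and recompute an approximate two-sided P, then compare it with the Bonferroni threshold of 0.05/13. This sketch assumes the intervals are symmetric 95% confidence intervals from a normal approximation (our assumption; the reported P-values come from the genetic correlation software itself), and the helper name is hypothetical.

```python
import math
from statistics import NormalDist

def z_and_p_from_ci(estimate, lower, upper, level=0.95):
    """Approximate Wald z and two-sided P from an estimate and its symmetric CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    standard_error = (upper - lower) / (2 * z_crit)
    z = estimate / standard_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided

BONFERRONI_ALPHA = 0.05 / 13  # 13 traits tested

# ADHD: r_g = 0.46 (0.28, 0.64) as reported in the text
z, p = z_and_p_from_ci(0.46, 0.28, 0.64)
print(f"z = {z:.2f}, approximate P = {p:.1e}, passes Bonferroni: {p < BONFERRONI_ALPHA}")
```

For the ADHD correlation this gives z ≈ 5.0 and a P-value of the same order as the reported 5.2 × 10⁻⁷. Because estimates and bounds are rounded to two decimals, exact agreement should not be expected.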

**Fig. 1: Genetic correlations between neurodevelopmental conditions and other brain-related traits and conditions.**

Below, we explore the extent and nature of the contribution of polygenic background to neurodevelopmental condition risk using PGS_NDC,DDD and polygenic scores for the most significantly genetically correlated traits (PGS_EA, PGS_CP, PGS_NonCogEA, PGS_SCZ) from much larger published GWASs. Several of these polygenic scores are significantly correlated with each other (Supplementary Fig. 2), so our Bonferroni correction for five tests is conservative. Throughout, we use the term ‘more polygenic risk’ for neurodevelopmental conditions as shorthand for having higher PGS_NDC,DDD and/or PGS_SCZ, and/or lower PGS_EA, PGS_CP and/or PGS_NonCogEA.

Less polygenic risk in diagnosed probands

Thirty-six percent of patients in these cohorts have a molecular monogenic diagnosis, including de novo, recessive, X-linked or inherited dominant diagnoses that involve rare (or novel) variants¹. To address our second aim of investigating the interplay between common and rare genetic variants in these conditions, we tested whether these diagnosed patients differed from undiagnosed patients in terms of their polygenic risk. Consistent with the liability threshold model (Extended Data Fig. 2a), we observed significantly higher PGS_EA (DDD and GEL combined; average difference Δ = 0.12 standard deviations (s.d.), two-sided t-test P = 3.0 × 10⁻⁹), PGS_CP (Δ = 0.068 s.d., P = 1.2 × 10⁻³) and PGS_NonCogEA (Δ = 0.085 s.d., P = 3.7 × 10⁻⁵) in probands with versus without a monogenic diagnosis (Fig. 2a). Nevertheless, for all scores except PGS_NonCogEA, the diagnosed probands still had significantly more polygenic risk than the controls (P < 0.01 = 0.05/5; Fig. 2a and Supplementary Table 5). Sensitivity analyses suggest that this observation is not driven by ascertainment bias in the controls, although the effect size is sensitive to the choice of control cohort, particularly for PGS_EA (Supplementary Note 4 and Supplementary Table 6). To mitigate this, we developed a set of statistical weights adjusting for sampling and non-response bias in the Millennium Cohort Study (MCS), an additional control cohort, to calculate weighted average polygenic scores that should be representative of the full UK population (Supplementary Note 4 and Extended Data Figs. 5 and 6). Both undiagnosed and diagnosed probands had a significantly lower average PGS_EA than weighted MCS controls (by 0.17 and 0.049 s.d., respectively; Supplementary Table 7).
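The weighted MCS comparison amounts to expressing a group's mean polygenic score relative to a weighted reference distribution. The sketch below assumes per-individual weights have already been derived (the construction of the sampling and non-response weights in Supplementary Note 4 is not reproduced here) and reports a group's mean shift in weighted control s.d. units; the function names and toy numbers are hypothetical.

```python
import statistics
from typing import Sequence

def weighted_mean_sd(values: Sequence[float], weights: Sequence[float]) -> tuple[float, float]:
    """Weighted mean and weighted (population-style) standard deviation."""
    total_weight = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total_weight
    variance = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / total_weight
    return mean, variance ** 0.5

def shift_in_reference_sd(group_scores, ref_scores, ref_weights) -> float:
    """Group mean minus weighted reference mean, in weighted reference s.d. units."""
    ref_mean, ref_sd = weighted_mean_sd(ref_scores, ref_weights)
    return (statistics.mean(group_scores) - ref_mean) / ref_sd

# Toy example: under-represented controls receive larger weights
controls = [0.3, -0.1, 0.8, -0.6, 0.2]
weights = [1.0, 1.5, 0.7, 2.0, 1.0]
probands = [-0.4, 0.1, -0.2]
print(round(shift_in_reference_sd(probands, controls, weights), 3))
```

A negative shift for PGS_EA corresponds to the ‘lower average PGS_EA than weighted MCS controls’ reported above; with all weights equal, this reduces to the ordinary unweighted comparison.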

**Fig. 2: Disentangling polygenic score associations with diagnostic status.**

The difference between the diagnosed probands and controls is driven by those with affected parents (those reported by clinicians to show a similar phenotype to their child), who had significantly lower polygenic scores for educational attainment and cognitive performance than those with unaffected parents (for example, PGS_EA Δ = 0.26 s.d., P = 3.4 × 10⁻³) (light blue points and diamonds in Extended Data Fig. 5). Diagnosed probands with unaffected parents did not show significantly different polygenic scores from the weighted MCS controls.

We next explored whether the difference in polygenic risk between diagnosed and undiagnosed probands was related to various technical, clinical and prenatal factors that are associated with receiving a monogenic diagnosis in DDD¹. For example, diagnosed probands were more likely than undiagnosed probands to be in a trio (probably because trio data make it possible to distinguish de novo from inherited variants) and to have severe intellectual disability, and less likely to have been born prematurely (a known risk factor for neurodevelopmental conditions^37,38) (Fig. 2b and Supplementary Table 8). We hypothesized that some of these associations might confound, or be confounded by, the association between PGS_EA and diagnostic status, as, for example, single-parent households and premature birth are associated with higher levels of deprivation and/or lower parental educational attainment³⁹. Indeed, we observed that the probands’ PGS_EA was significantly associated with several of these factors (Fig. 2c): a higher chance of being in a trio and having more severe intellectual disability, and a lower chance of being born prematurely and having any affected first-degree relatives (Extended Data Fig. 7a). However, it was not associated with sex (Supplementary Note 5 and Extended Data Fig. 8a) or maternal diabetes (Fig. 2c and Supplementary Table 8). Controlling for PGS_EA minimally altered the association between these factors and diagnostic status (Fig. 2b). Similarly, after controlling for these factors, the association between PGS_EA and diagnostic status remained significant with negligible change in effect size (Extended Data Fig. 7b). Thus, the observation that diagnosed patients tend to have lower polygenic risk than undiagnosed patients probably largely reflects the liability threshold model, under which both common and rare variants contribute to risk (Extended Data Fig. 2a).

Assessing transmission of polygenic risk

Most of the parents in our sample are reported by clinicians to be clinically unaffected (89.2% in DDD and 95.4% in GEL, although the clinical annotation of parental affected status may be imperfect). Given this, and results in autism⁴⁰, we hypothesized that probands without monogenic diagnoses might inherit more common variant risk for neurodevelopmental conditions from unaffected parents than one would expect given their parents’ mean risk, a phenomenon termed ‘polygenic transmission disequilibrium’⁴⁰. Applying the polygenic transmission disequilibrium test (pTDT)⁴⁰ to undiagnosed trios with unaffected parents (Fig. 3a), we saw nominally significant over-transmission of PGS_NDC,DDD in 1,567 families not included in the original GWAS (pTDT deviation 0.062, paired t-test P = 0.014). This over-transmission was significant in females (pTDT deviation 0.10, P = 0.0078 in 589 trios) but not in males (pTDT deviation 0.036, P = 0.27 in 978 trios) (Extended Data Fig. 8c and Supplementary Note 5). However, we saw no significant transmission disequilibrium for the other polygenic scores (paired t-test P > 0.05) in either sex (Extended Data Fig. 8c) or in both sexes combined (Fig. 3a). Given the known over-transmission of PGS_EA to autistic individuals⁴⁰, we excluded autistic individuals from our sample and repeated the pTDT, but still only saw significant transmission disequilibrium for PGS_NDC,DDD (Supplementary Fig. 3a). Among probands with a monogenic genetic diagnosis, we saw no significant transmission disequilibrium for any polygenic score tested (Supplementary Fig. 3b).
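The pTDT compares each child's polygenic score with the average of their parents' scores (the mid-parent score). As we understand the original definition⁴⁰, each trio's deviation is scaled by the standard deviation of mid-parent scores across the sample, and the mean deviation is tested against zero, which is equivalent to a paired t-test of child versus mid-parent scores. The sketch below follows that description with a hypothetical function name; for the large samples here it substitutes a normal approximation for the t distribution when computing P.

```python
import math
import statistics

def ptdt(child, mother, father):
    """Polygenic transmission disequilibrium test on paired trio scores.

    Returns (mean pTDT deviation, t statistic, approximate two-sided P).
    """
    midparent = [(m + f) / 2 for m, f in zip(mother, father)]
    sd_midparent = statistics.stdev(midparent)
    # Positive deviations: child carries more of the score than expected from the parents
    deviations = [(c - mp) / sd_midparent for c, mp in zip(child, midparent)]
    mean_dev = statistics.mean(deviations)
    se = statistics.stdev(deviations) / math.sqrt(len(deviations))
    t_stat = mean_dev / se
    # Normal approximation to the t distribution; reasonable for ~1,000+ trios
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))
    return mean_dev, t_stat, p_value

# Toy trios with hypothetical scores (far too few for the normal approximation)
dev, t, p = ptdt(child=[1.0, 1.5, 3.5, 3.0],
                 mother=[0.0, 1.0, 2.0, 3.0],
                 father=[0.0, 1.0, 2.0, 3.0])
print(f"pTDT deviation = {dev:.3f}, t = {t:.2f}")
```

Scaling by the mid-parent s.d. makes the deviation comparable across scores but does not change the t statistic, because the same constant divides both the mean and its standard error.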

**Fig. 3: Polygenic background in parents of patients with neurodevelopmental conditions.**

To put the pTDT results in context, we compared average polygenic scores between unaffected parents of undiagnosed patients and controls. For all five scores tested, the parents had more polygenic risk than the weighted MCS controls (P < 0.026) (Fig. 3b and Supplementary Table 7). Given this observation and the results from the pTDT, we conclude that risk for neurodevelopmental conditions is affected both by familial polygenic background, or factors correlated with it, and by polygenic risk (specifically, PGS_NDC,DDD) that is over-transmitted from unaffected parents to affected children.

Association with non-transmitted alleles

Given these findings, and to address our third aim, we next tested whether parental alleles are correlated with their child’s risk of neurodevelopmental conditions independently of the alleles transmitted to the child: in other words, whether there is an effect of parental alleles that are not transmitted to the child (‘non-transmitted alleles’) on the child’s phenotype. This could potentially indicate indirect genetic effects; that is, effects of alleles in parents on parental phenotypes that affect their offspring’s risk through the family environment (otherwise known as ‘genetic nurture’), as opposed to the direct genetic effects of alleles transmitted to the child. Indirect genetic effects have been argued to explain around 30–45% of the association between polygenic predictors of educational attainment and school grades^26,30 and educational attainment⁴, although these inferences have been contested as confounded by parental assortment and population stratification^29,30. To investigate the possible role of non-transmitted parental alleles in risk of neurodevelopmental conditions, we compared 2,866 affected trio probands from DDD + GEL whose parents are unaffected with 4,804 control trios from two UK birth cohorts (N = 3,932 trios) and from GEL (N = 872 trios without neurodevelopmental conditions). We first tested whether the child’s polygenic scores for traits related to neurodevelopmental conditions were significantly associated with case status (‘proband-only’ model), and then whether this held after conditioning on the parents’ polygenic scores (‘trio model’) (Fig. 4). By conditioning on the parents’ scores, the trio model separates the direct genetic effects of alleles transmitted to the child from any environmentally mediated effects of polygenic risk in the parents. We refer to the coefficients on the parental scores in the trio model as the ‘non-transmitted coefficients’ as they represent the association with non-transmitted parental alleles²⁴. For more explanation and a formal mathematical definition of this model, see the Methods section on ‘Association with non-transmitted alleles’ and the legend of Fig. 4.
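Why the parental coefficients index non-transmitted alleles can be seen from a short decomposition. This is a sketch under stated assumptions (a logistic link, and a polygenic score that is additive across the two haplotypes each parent carries); the paper's exact specification is given in its Methods. Write each parent's score as the sum of the score on the haplotype transmitted to the child, T, and on the non-transmitted haplotype, N, so that the child's score is the sum of the two transmitted components.

```latex
\begin{aligned}
\text{proband-only:}\quad & \operatorname{logit} \Pr(\text{case}) = \alpha + \beta\,\mathrm{PGS}_c \\
\text{trio:}\quad & \operatorname{logit} \Pr(\text{case}) = \alpha + \delta\,\mathrm{PGS}_c + \gamma_m\,\mathrm{PGS}_m + \gamma_f\,\mathrm{PGS}_f \\
\text{with}\quad & \mathrm{PGS}_m = T_m + N_m,\quad \mathrm{PGS}_f = T_f + N_f,\quad \mathrm{PGS}_c = T_m + T_f \\
\Rightarrow\quad & \delta\,\mathrm{PGS}_c + \gamma_m\,\mathrm{PGS}_m + \gamma_f\,\mathrm{PGS}_f
  = (\delta + \gamma_m)\,T_m + (\delta + \gamma_f)\,T_f + \gamma_m\,N_m + \gamma_f\,N_f
\end{aligned}
```

Under this parameterization, δ is the association with the child's own alleles over and above the parents' scores (the direct genetic effect), and γ_m and γ_f are the associations with maternal and paternal non-transmitted alleles. A non-zero γ does not by itself distinguish indirect genetic effects from parental assortment or population stratification.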

**Fig. 4: Assessing direct genetic effects and associations with non-transmitted parental alleles.**

For PGS_EA, PGS_CP and PGS_NonCogEA, we found that undiagnosed probands’ polygenic scores were no longer significantly associated with having a neurodevelopmental condition after conditioning on their parents’ scores in the trio model. This implies limited or no direct genetic effects, whereas the non-transmitted coefficients were highly significant (Fig. 4 and Supplementary Table 10). This result held for PGS_EA and PGS_NonCogEA in sensitivity analyses of subsets of trios; PGS_CP showed more equivocal results, but the estimate of direct genetic effects was never significantly different from zero (Supplementary Fig. 4). We also observed a significant non-transmitted coefficient for the mother when using a polygenic score derived from a within-family GWAS for educational attainment²⁵ (Supplementary Note 6). This finding could imply that aspects of the environment (including the prenatal environment), such as genetically influenced parental phenotypes, are correlated with these non-transmitted alleles and affect risk of neurodevelopmental conditions. However, our observations could also be due to the effects of parental assortment (that is, phenotypic correlation between partners), which we discuss further below.

For PGS_NDC,DDD, we found that the probands’ polygenic scores were still nominally significantly associated with having a neurodevelopmental condition after controlling for their parents’ scores in the trio model (Fig. 4). This implies that there is a direct genetic effect of PGS_NDC,DDD on the probands’ risk of neurodevelopmental conditions, consistent with the over-transmission observed in Fig. 3a. For PGS_SCZ, we saw no significant effect of the probands’ score (P = 0.089) in the trio model, whereas the mothers’ score was significant (P = 8.6 × 10⁻³) (Fig. 4). In summary, there is evidence for direct genetic effects of the polygenic score for rare neurodevelopmental conditions, but not for polygenic scores for related traits.

Exploring the role of prenatal factors

We explored whether prenatal factors might mediate the effects of non-transmitted parental alleles on risk of neurodevelopmental conditions (Supplementary Note 7). These included preterm delivery, smoking, alcohol use, gestational hypertension and sleep apnoea. Among them, preterm delivery (that is, giving birth prematurely)⁴¹, a risk factor for neurodevelopmental conditions in the offspring^37,38, showed the strongest genetic correlation with neurodevelopmental conditions (r_g = 0.58 (0.18, 0.97), P = 0.004) (Extended Data Fig. 9a and Supplementary Table 11), and was significantly genetically correlated with lower educational attainment (r_g = −0.30 (−0.39, −0.21), P = 2.3 × 10⁻¹⁰), mirroring the epidemiological association⁴². Premature birth was also associated with lower PGS_EA in DDD (P = 0.0125; Extended Data Fig. 9d). However, controlling for prematurity or removing premature probands did not significantly change the non-transmitted coefficients in the trio model (Supplementary Note 7 and Supplementary Fig. 5). Thus, there is no significant evidence at present that prematurity explains the association between neurodevelopmental conditions and non-transmitted common variants in the parents that are associated with educational attainment.

Correlated common and rare variant risk

Another factor that may contribute to the significant correlation between non-transmitted alleles in parents and neurodevelopmental conditions in their children is parental assortment, the phenomenon whereby people are more likely to choose partners with similar traits to themselves. Parental assortment is particularly strong for educational attainment and cognitive ability⁴³. It is also observed for psychiatric conditions^43,44,45, including in parents of autistic individuals and of individuals with neurodevelopmental conditions due to the 16p12.1 deletion⁴⁶. Parental assortment induces a correlation between alleles that act in the same direction on a trait, both between parents and, in their descendants, within and between loci⁴⁷ (Extended Data Fig. 2b). Thus, parental assortment on cognitive ability or correlated traits (for example, educational attainment) would be expected to lead to individuals with inherited rare variants associated with reduced cognitive ability^9,10,48,49 also having a polygenic background of common variants associated with reduced cognitive ability^46,47. In the proband-only model in Fig. 4, the proband’s polygenic score would statistically capture (‘tag’) the correlated effects of these rare variants (which causally affect neurodevelopmental conditions⁵⁰). However, in the trio model, the proband’s polygenic score would no longer capture effects of the rare variant component after conditioning on the parents’ scores (Extended Data Fig. 10a). Instead, this correlation with the rare variant component would be reflected by the non-transmitted coefficients on the parents’ polygenic scores²⁹.

To explore this potential genetic consequence of parental assortment in our cohorts, we tested whether the common and rare variant components contributing to risk of neurodevelopmental conditions are indeed correlated. From the sequencing data in DDD and GEL, we extracted rare (minor allele frequency (MAF) <1 × 10⁻⁴) protein-truncating variants (PTVs) and damaging missense variants in genes intolerant of loss-of-function variation (‘constrained genes’), which are associated with reduced cognitive ability¹⁰ and risk of neurodevelopmental conditions^49,50. Consistent with the effects of parental assortment, among unaffected parents of probands with neurodevelopmental conditions, we observed that the number of rare damaging coding variants in constrained genes (the ‘rare variant burden score’) in one parent was significantly negatively correlated with the other parent’s PGS_EA (r = −0.065, P = 5.5 × 10⁻⁹), PGS_CP (r = −0.036, P = 1.4 × 10⁻³), and PGS_NonCogEA (r = −0.046, P = 4.3 × 10⁻⁵) (orange points in Fig. 5 and Supplementary Table 12). As expected, a similar correlation was seen within the probands themselves, whether we included all probands, undiagnosed probands or probands with de novo diagnoses (blue lines in Fig. 5 and Supplementary Fig. 6b,c, respectively), and when the rare variant burden score was restricted to haploinsufficient genes associated with developmental disorders (three leftmost columns in Supplementary Fig. 6). We also saw a similar result among control children from the MCS (pale blue points in Supplementary Fig. 7), indicating that this correlation is not limited to patients with neurodevelopmental conditions. We saw no significant correlation between any of the polygenic scores and the burden of rare synonymous variants in the tested gene sets (right-hand panel in Fig. 5, third and sixth columns in Supplementary Fig. 6), confirming that the result observed for deleterious variants is unlikely to be due to population structure artefacts. The correlations between polygenic scores and rare damaging variants may explain why we saw very limited evidence that these scores modify the penetrance of such variants in families with neurodevelopmental conditions (Supplementary Note 8 and Supplementary Fig. 8).

**Fig. 5: Correlation between rare variant burden scores and polygenic scores.**

We next explored whether the correlation between common and rare variants associated with neurodevelopmental conditions could be driving the association between non-transmitted common alleles and children’s risk shown in Fig. 4. We extended the trio model to control for the probands’, mothers’ and fathers’ rare variant burden scores as well as polygenic scores when comparing trio probands with (N = 1,343) versus without (N = 872) neurodevelopmental conditions in GEL (red boxes in Extended Data Fig. 10b). Correcting for rare variant burden scores did not change our original conclusion from the trio regression analysis of common variants. However, we cannot rule out that the association between neurodevelopmental conditions and non-transmitted common alleles is primarily driven by the assortment-induced correlation between common and rare variants, because the rare variant burden score we have used probably only captures a small proportion of the total rare variant component (just as the polygenic score only captures a small fraction of SNP heritability). Thus, further work and new datasets are needed to confirm whether the association between risk of neurodevelopmental conditions and the non-transmitted alleles is due to true indirect genetic effects and/or parental assortment.

Discussion

Here we combined two large cohorts of patients with rare neurodevelopmental conditions to explore the contribution of common variants to risk. After first demonstrating that polygenic scores for neurodevelopmental conditions and several related traits were significantly associated with risk for neurodevelopmental conditions within both DDD and GEL (Supplementary Table 2), we conducted a GWAS meta-analysis of patients with neurodevelopmental conditions from the two cohorts and revealed significant genetic correlations with several psychiatric conditions that had not been previously reported² (Fig. 1a). Conditional genetic correlations show that these are only partially driven by the component of polygenic risk for neurodevelopmental conditions that is shared with educational attainment (for example, between 22% for Tourette’s and 77% for ADHD; Supplementary Fig. 1). This suggests that these brain-related conditions share underlying biology with neurodevelopmental conditions that is partly independent of that captured by effects of common variants on educational attainment, although we acknowledge that estimates of genetic correlations can be biased by cross-trait parental assortment and other confounding factors⁵¹. Furthermore, although we observe a significant negative genetic correlation with what has been termed the non-cognitive component of educational attainment, we note that this could also contain elements of cognitive ability not captured in the GWAS for cognitive performance³¹ that was used in the paper that derived it³⁵.

We showed that polygenic scores for several traits that are genetically correlated with neurodevelopmental conditions were significantly associated with having a monogenic diagnosis, with the strongest effect observed for educational attainment (Fig. 2a). Our previous work had found no such difference in polygenic background between diagnosed and undiagnosed probands in DDD², and it is likely that power has been improved here by our larger sample size and better definition of which probands truly have a monogenic diagnosis^1,21. Our result is consistent with a liability threshold model for rare neurodevelopmental conditions and with recent findings in a population-based cohort, UK Biobank¹¹, and a rare disease cohort⁵². Children without a large-effect monogenic variant may require a higher polygenic load (or a major environmental contribution such as a teratogenic infection, for example, Zika virus⁵³) to move their phenotype over the threshold required to be clinically diagnosed with a neurodevelopmental condition (Extended Data Fig. 2a). Our findings suggest that we can rule out a model whereby liability for neurodevelopmental conditions is conferred only by fully penetrant monogenic causes and environmental factors. Of importance in clinical settings, we found that probands with more affected first-degree relatives had both a lower PGS_EA (hence, more polygenic risk for neurodevelopmental conditions) and a lower chance of receiving a monogenic diagnosis in DDD than probands with no affected relatives (Extended Data Fig. 7a). This emphasizes that the presence of several first-degree relatives with neurodevelopmental conditions in a family does not necessarily indicate a monogenic cause. Our observation that diagnosed patients with affected parents (most of whom have inherited dominant diagnoses), and their parents, have lower average PGS_EA than those with unaffected parents (Extended Data Fig. 5) is consistent with the effects of parental assortment (Fig. 5).

As most parents of the patients we studied are annotated as clinically unaffected, we hypothesized that they might be over-transmitting polygenic risk to their affected offspring. We saw nominally significant over-transmission of PGS_NDC,DDD from unaffected parents to undiagnosed probands, but saw no significant transmission disequilibrium for PGS_EA or PGS_CP (Fig. 3a), despite these polygenic scores explaining far more variance in risk than PGS_NDC,DDD (Supplementary Table 2). Consistent with this, in a trio model (Fig. 4), we found evidence for a direct genetic effect of PGS_NDC,DDD on risk of neurodevelopmental conditions, but not for other scores tested. Instead, we observed that the parents’ PGS_EA, PGS_CP and PGS_NonCogEA were significantly associated with their children’s risk even after controlling for the children’s polygenic score, indicating a correlation between non-transmitted alleles and the children’s phenotype. Thus, a key conclusion from this work is that the association between common variants and neurodevelopmental conditions is not entirely due to their having direct genetic effects on risk.

The correlation between non-transmitted alleles in the parents and neurodevelopmental conditions in the children may be due to indirect genetic effects, population stratification and/or the consequences of parental assortment^4,29,30,54. Parental assortment induces a correlation between the polygenic score associated with the trait under assortment and the remaining genetic component of the phenotype. This includes the component due to rare variants, which could have a much stronger effect on risk of neurodevelopmental conditions than the common variant component. We demonstrated a correlation between the rare and common variant components that affect cognitive and educational outcomes, both between partners (one parent's rare variant burden score and the other parent's polygenic score) and within individuals (an individual's rare variant burden score and their own polygenic score, in both offspring and parents) (Fig. 5 and Supplementary Figs. 6 and 7). This supports the hypothesis that the association of PGS_EA with lower risk of neurodevelopmental conditions is at least partly due to the assortment-induced correlation of PGS_EA with rare variants affecting both neurodevelopmental conditions and educational attainment. Given that polygenic scores and our rare variant burden scores capture only small fractions of the total common and rare variant components of risk, respectively, the true correlation is probably substantially higher than the observed estimates. Very large whole-genome sequencing (WGS) datasets will be required to better characterize the total rare variant component of these traits and to estimate this correlation more accurately.
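Why an observed correlation between two incomplete proxies understates the correlation between the components they proxy can be illustrated with the classical attenuation relationship. The Python sketch below assumes, purely for illustration and not as the model fitted in this study, that each score equals its true genetic component plus independent noise, with `p_common` and `p_rare` (hypothetical names) denoting the squared correlations of the polygenic score and the rare variant burden score with their true components. Under that assumption the observed correlation is r_true × sqrt(p_common × p_rare); the simulation checks this numerically.

```python
import math
import random


def disattenuate(r_obs: float, p_common: float, p_rare: float) -> float:
    """Correct an observed score-score correlation for incomplete capture.

    Illustrative assumption: score = scaled true component + independent noise,
    where p_common / p_rare are the squared correlations between each score and
    its true component.
    """
    if not (0 < p_common <= 1 and 0 < p_rare <= 1):
        raise ValueError("captured fractions must lie in (0, 1]")
    return r_obs / math.sqrt(p_common * p_rare)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_observed_r(r_true, p_common, p_rare, n=100_000, seed=1):
    """Monte Carlo check that r_obs is close to r_true * sqrt(p_common * p_rare)."""
    rng = random.Random(seed)
    pgs, burden = [], []
    for _ in range(n):
        g_common = rng.gauss(0, 1)
        # Rare component correlated with the common component at r_true.
        g_rare = r_true * g_common + math.sqrt(1 - r_true**2) * rng.gauss(0, 1)
        # Each unit-variance score captures only a fraction of its component.
        pgs.append(math.sqrt(p_common) * g_common + math.sqrt(1 - p_common) * rng.gauss(0, 1))
        burden.append(math.sqrt(p_rare) * g_rare + math.sqrt(1 - p_rare) * rng.gauss(0, 1))
    return pearson(pgs, burden)
```

With made-up values r_true = 0.5, p_common = 0.1 and p_rare = 0.05, the expected observed correlation is only about 0.035, so a small observed value is compatible with a much larger underlying correlation.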

With the current study design, we were unable to demonstrate the presence of indirect genetic effects on risk of neurodevelopmental conditions unambiguously, nor could we test whether, if present, these are mediated by parenting behaviours. However, we did explore whether common genetic variants might influence risk by affecting prenatal risk factors (a form of indirect genetic effect). We found that educational attainment showed a significant negative genetic correlation with preterm delivery, whereas neurodevelopmental conditions showed a significant positive genetic correlation with it, of which only 35% was due to the educational attainment component (Extended Data Fig. 9b). This is consistent with epidemiological studies that found an association between prematurity and poorer cognitive outcomes even after controlling for socioeconomic confounders^37,55. We saw no significant evidence that prematurity mediates the effects of non-transmitted common parental alleles associated with educational attainment (Supplementary Note 7). However, our analysis may simply have been underpowered at this sample size, as we did see some attenuation (albeit not significant) of the non-transmitted coefficients for PGS_EA when removing premature probands (Supplementary Fig. 7). Nonetheless, our results emphasize how genetics may confound epidemiological associations between risk factors and neurodevelopmental conditions^56,57, and also suggest that studies seeking to characterize indirect genetic effects on educational outcomes should consider the contribution of prenatal factors.

Our study has several limitations. First, the overall variance in risk of neurodevelopmental conditions explained by common variants is low (roughly 10%), and the polygenic scores tested here explain only a fraction of this. However, these polygenic scores are statistically significant predictors of neurodevelopmental conditions (Supplementary Table 2) and are likely to explain more variance as GWAS sample sizes grow. Second, the reported significance of detected polygenic score effects reflects not only the strength of the true associations but also the power of the original GWAS from which SNP effect sizes were derived. Thus, one must be cautious when comparing effects between polygenic scores for different traits. We explored combining the different polygenic scores into a composite polygenic score to try to improve power; although this explained slightly more variance on the liability scale than PGS_EA (Supplementary Table 2), results from the main analyses were very concordant between this composite polygenic score and PGS_EA (which had the highest weight). Third, the phenotypic heterogeneity of the cohorts probably limits our power and may confound results. For example, missed diagnoses of autism among DDD and GEL participants with neurodevelopmental conditions (perhaps due to the young average age; Supplementary Note 2) could be confounding our finding of no apparent under-transmission of PGS_EA (Fig. 3a and Supplementary Fig. 3a), as PGS_EA may be over-transmitted to autistic individuals^20,40 but under-transmitted to patients with intellectual disability who are not autistic. In future, larger cohorts with quantitative phenotype data (for example, on IQ or social responsiveness) may allow us to revisit these questions while subsetting to reduce phenotypic heterogeneity. Fourth, the fact that probands in trios tend to have higher PGS_EA than those not in trios (Extended Data Fig. 6b) suggests that the trio probands are a non-random sample, which could bias trio-based analyses; for example, the undiagnosed trio probands may be enriched for monogenic causes in as-yet-undiscovered genes, which could reduce power when assessing over-transmission of polygenic risk (Fig. 3a). Furthermore, many of our analyses are predicated on the assumption that the 'unaffected parents' (those reported by the clinician not to have a similar phenotype to the proband) do not have phenotypes related to neurodevelopmental conditions. However, some of them may have (or may have had, earlier in life) relevant phenotypic features (for example, learning difficulty, speech delay) that were not detected or recorded by clinicians. The inclusion of these parents could be reducing power or confounding results in several analyses. Finally, the correlation between the rare and common variant components of neurodevelopmental conditions (Fig. 5), which is probably due to parental assortment, may have confounded several of these analyses.

In future, as GWAS discovery cohorts for both rare neurodevelopmental conditions and related traits increase in size, we will have more power to explore common variant effects on risk, penetrance and phenotypic expressivity of these conditions. These studies should seek to confirm whether common variants influencing educational attainment and cognitive performance truly have no direct genetic effects on risk of neurodevelopmental conditions, or whether such effects exist but are small. To disentangle the contribution of indirect genetic effects and parental assortment to common variant associations with neurodevelopmental conditions, future studies will need to use extended genealogies and/or more sophisticated modelling of the influence of parental assortment on common and rare variants than is possible at present^29,30,54. If these studies also had measures of epidemiological and prenatal risk factors such as prematurity, and of parental phenotypes and nurturing behaviours, one could explore how indirect genetic effects (if present) are mediated, which has potential implications for assessing the modifiability of risk. Larger GWASs for neurodevelopmental conditions will also give us more power to explore the extent to which the common variants affecting these conditions target different pathways and cell types from the rare variants (Supplementary Note 9). Finally, it will be important for future studies to explore the role of polygenic background in neurodevelopmental conditions in families with non-European genetic ancestries.

Methods

Cohort descriptions and phenotypes

DDD

The aim of the DDD study is to find molecular diagnoses for families and patients affected by previously genetically undiagnosed, severe developmental conditions. Recruitment was conducted from 2011 to 2015 across 24 clinical genetics services in the United Kingdom and Ireland⁵⁸. The clinical inclusion criteria included neurodevelopmental conditions, congenital, growth or behavioural abnormalities and dysmorphic features. Probands were systematically phenotyped through DECIPHER⁵⁹ using Human Phenotype Ontology (HPO)⁶⁰ terms and a bespoke online questionnaire that collected information on developmental milestones, growth measurements, number of affected relatives, prematurity, maternal diabetes, and other clinically relevant parameters. The cohort has been described extensively^1,50,58,61.

We focused on probands in the DDD cohort who had neurodevelopmental conditions, as defined previously by Niemi et al.² Briefly, these were probands who had at least one of the following neurodevelopmental HPO terms or their descendant terms: abnormality of higher mental function (HP:0011446), neurodevelopmental abnormality (HP:0012759), abnormality of nervous system morphology (HP:0012639), behavioural abnormality (HP:0000708), seizures (HP:0001250), encephalopathy (HP:0001298), abnormal synaptic transmission (HP:0012535), or abnormal nervous system electrophysiology (HP:0001311).
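The "term or its descendant terms" rule amounts to walking up the ontology from each annotated term and checking whether any listed root is reached. A minimal Python sketch, assuming a hypothetical child-to-parents dictionary built from an HPO release (parsing the ontology file is not shown); the `TOY:` identifiers in the usage note are made up.

```python
from collections import deque

# Root terms from the neurodevelopmental definition above
# (HP:0001298 is encephalopathy).
NDC_ROOTS = frozenset({
    "HP:0011446", "HP:0012759", "HP:0012639", "HP:0000708",
    "HP:0001250", "HP:0001298", "HP:0012535", "HP:0001311",
})


def ancestors_and_self(term, parents):
    """Breadth-first walk up a child -> parents map (HPO is a DAG)."""
    seen = {term}
    queue = deque([term])
    while queue:
        for parent in parents.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def has_ndc_term(proband_terms, parents, roots=NDC_ROOTS):
    """True if any annotated term is a root term or descends from one."""
    return any(ancestors_and_self(t, parents) & roots for t in proband_terms)
```

For instance, with `parents = {"TOY:grandchild": ["TOY:child"], "TOY:child": ["HP:0012759"]}`, `has_ndc_term(["TOY:grandchild"], parents)` returns `True` because the walk reaches HP:0012759.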

GEL project

The 100,000 Genomes project is an initiative by the UK Department of Health and Social Care to sequence the whole genomes of National Health Service (NHS) patients with rare conditions or cancer^62,63. The rare disease branch of the project consists of sequencing data from roughly 72,000 patients with rare conditions and their relatives, in roughly 34,000 families with a variety of structures. More than 190 rare conditions are represented in the cohort, and about 23% of the patients have neurodevelopmental conditions. The cohort was sequenced to around 35× coverage, and variant calling and quality control (QC) were performed by Genomics England^63,64.

Patients from GEL with neurodevelopmental conditions were defined as those recruited under the ‘Neurodevelopmental disorders’ disease subcategory, or with more than one HPO term that was a descendant of ‘Neurodevelopmental Abnormality’ (HP:0012759). We removed probands whose age of onset was above 16 years or who had neurodegenerative conditions.

The set of unrelated GEL controls included patients with cancer aged over 30 years (N = 10,469) and unaffected relatives (N = 3,198) of probands with rare conditions who were not in the neurodevelopmental condition set and did not have phenotypes similar to probands from DDD ('DDD-like'). The DDD-like probands were defined as those who:

1. were recruited into a disease model that was also used to recruit probands who had previously been recruited into DDD (see the section below on identifying probands overlapping between the two cohorts), or
2. had one of the top five HPO terms used in DDD or their descendants, namely HP:0000729 (autistic behaviour), HP:0001250 (seizure), HP:0000252 (microcephaly), HP:0000750 (delayed speech and language development), and HP:0001263 (global developmental delay).

Probands recruited into the neurodegenerative disorders subcategory or with an age of onset greater than 16 years were removed from the DDD-like set, as were probands recruited into a disease subcategory for which the average age of probands was older than 16 years.
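The DDD-like definition combines two inclusion criteria with three exclusions, which can be easier to audit when written as a single predicate. The Python sketch below is hypothetical throughout: the `GelProband` record layout, the subcategory label string and the per-subcategory mean-age lookup are assumptions, not GEL data structures. The default `matches_top5` only matches the five terms exactly; a caller would pass a function that also accepts their descendant terms.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set

# Top five DDD HPO terms listed above.
TOP5_DDD_TERMS = frozenset({
    "HP:0000729", "HP:0001250", "HP:0000252", "HP:0000750", "HP:0001263",
})


@dataclass
class GelProband:  # hypothetical record layout
    disease_model: str
    disease_subcategory: str
    hpo_terms: Set[str] = field(default_factory=set)
    age_of_onset: Optional[float] = None  # years; None if not recorded


def is_ddd_like(
    p: GelProband,
    ddd_disease_models: Set[str],
    subcategory_mean_age: Dict[str, float],
    matches_top5: Callable[[str], bool] = lambda t: t in TOP5_DDD_TERMS,
    neurodegenerative_label: str = "Neurodegenerative disorders",  # assumed label
) -> bool:
    # Exclusions: neurodegenerative subcategory, onset after 16 years,
    # or a subcategory whose probands are older than 16 on average.
    if p.disease_subcategory == neurodegenerative_label:
        return False
    if p.age_of_onset is not None and p.age_of_onset > 16:
        return False
    if subcategory_mean_age.get(p.disease_subcategory, 0.0) > 16:
        return False
    # Criterion 1 (shared disease model) or criterion 2 (top-five HPO term).
    return p.disease_model in ddd_disease_models or any(
        matches_top5(t) for t in p.hpo_terms
    )
```

Applying the exclusions before the inclusion criteria mirrors the order of the text and makes it explicit that a matching HPO term cannot rescue a proband with adult onset.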

To define relatedness, we used a file generated by GEL containing a pairwise kinship matrix produced with the PLINK2 (refs. 65,66) implementation of the KING robust algorithm⁶⁷ and a --king-cutoff of 0.0442 (that is, 1/2^4.5).
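The cutoff of 0.0442 is the lowest of KING's conventional relationship boundaries, which sit at 2^-1.5, 2^-2.5, 2^-3.5 and 2^-4.5, so applying it removes pairs that are third-degree relatives or closer. A small Python sketch of that mapping from kinship coefficient to inferred degree; the function name `king_degree` is hypothetical and is not part of KING or PLINK2.

```python
# Conventional KING kinship boundaries, halving at each degree of relatedness.
KING_DEGREE_BOUNDS = (
    (2 ** -1.5, "duplicate or monozygotic twin"),  # ~0.354
    (2 ** -2.5, "first-degree"),                   # ~0.177
    (2 ** -3.5, "second-degree"),                  # ~0.088
    (2 ** -4.5, "third-degree"),                   # ~0.0442, the cutoff used here
)


def king_degree(kinship: float) -> str:
    """Map a KING kinship coefficient to an inferred relationship degree."""
    for lower_bound, label in KING_DEGREE_BOUNDS:
        if kinship > lower_bound:
            return label
    return "unrelated (more distant than third-degree)"
```

Parent-offspring pairs and full siblings both have an expected kinship of 0.25, so `king_degree(0.25)` returns `"first-degree"`.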

Control cohorts

The UK Household Longitudinal Study (UKHLS) cohort is a continuation of the British Household Panel Survey of individuals living in the United Kingdom^68,69. The Avon Longitudinal Study of Parents and Children (ALSPAC) is a birth cohort study of children born in Avon, England with expected dates of delivery between 1 April 1991 and 31 December 1992 (ref. ⁷⁰). Eligible pregnant women (N = 13,761) were recruited, and their children have been phenotyped extensively over the past 30 years. Please note that the study website (http://www.bristol.ac.uk/alspac/researchers/our-data/) contains details of all the data that are available through a fully searchable data dictionary and variable search tool. The Millennium Cohort Study (MCS) is a birth cohort study of children from 18,552 families, born across the UK in 2000 and 2001^71,72. Further information about recruitment of these cohorts is given in Supplementary Note 4.

Ethics

The DDD study has UK Research Ethics Committee approval (10/H0305/83, granted by the Cambridge South Research Ethics Committee; and GEN/284/12, granted by the Republic of Ireland Research Ethics Committee). The 100,000 Genomes project was approved by the East of England—Cambridge Central Research Ethics Committee (REF 20/EE/0035). Ethical approval for ALSPAC was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Ethical approval for each sweep of MCS was obtained from NHS Research Ethics Committees (MREC). Ethical approval for the sixth MCS sweep, which included the collection of saliva samples from children and biological resident parents, was obtained from London-Central REC (MREC; 13/LO/1786).

Preparation of genetic data

Individuals from DDD, UKHLS, ALSPAC and MCS were genotyped on various arrays, whereas GEL individuals were whole-genome sequenced. The available data are summarized here briefly:

A subset of the DDD cohort (all children and several thousand parents) was genotyped on three arrays: the Illumina HumanCoreExome chip (CoreExome), the Illumina OmniChipExpress (OmniChip) and the Illumina Infinium Global Screening Array (GSA). Some probands were genotyped on more than one array, as shown in Supplementary Fig. 9. In downstream analysis, we used the CoreExome and OmniChip data for analyses of probands, and the GSA and OmniChip data for analyses of trios. QC of the CoreExome data (including DDD patients and 9,270 UKHLS controls genotyped on the same array) and of the OmniChip data was performed by Niemi et al.², and we performed QC of the GSA data specifically for this paper (Supplementary Tables 13 and 14). The DDD cohort was also exome sequenced, and those data were used for the analyses involving rare variants.

GEL individuals were whole-genome sequenced with 150 bp paired-end reads using Illumina HiSeqX. Variant calling and QC were performed by Genomics England. We used 78,195 post-QC germline genomes from the Aggregated Variant Calls (aggV2) prepared by the GEL team. We kept variants that passed the QC filters shown in Supplementary Table 15.

Data we received from ALSPAC were processed in two batches⁶⁹. In the first batch, we received post-QC array data for G0 mothers (N = 8,884) genotyped on the Illumina Human 660W chip and G1 children (N = 8,932) genotyped on the HumanHap550 quad chip. In the second batch, we received data for a further 2,198 parents (G0 mothers and G0 partners⁷³) genotyped on the CoreExome array.

We received data for 21,181 MCS participants genotyped on the GSA⁷⁴.

We applied standard QC filters in each dataset separately, described further in Supplementary Methods. We used the maximum subset of unrelated individuals that passed QC. We did not use any statistical methods to predetermine sample sizes.

Genetically predicted ancestry

To avoid spurious results due to population stratification, all genetic analyses were conducted in a genetically homogeneous subset of individuals with genetic similarity to British individuals from the 1000 Genomes Project⁷⁵, henceforth referred to as having GBR ancestry. The Supplementary Methods provide detailed information on ancestry inference, which we summarize briefly here. The identification of GBR-ancestry samples from the DDD CoreExome and OmniChip data was described previously². To identify individuals of genetically inferred GBR ancestry in DDD GSA samples, we first projected post-QC samples onto 1000 Genomes phase 3 individuals⁷⁵ (Supplementary Fig. 10). We then performed another principal component analysis (PCA) within the loosely defined European ancestry subset and identified a homogeneous subgroup (Supplementary Fig. 11) using uniform manifold approximation and projection (UMAP)⁷⁶. As we merged parent–offspring trios genotyped on the GSA and OmniChip arrays in downstream analysis, we kept GSA individuals who were similar to OmniChip individuals in terms of genetic ancestry in PCA space (Supplementary Fig. 12). In GEL, we used individuals with genetically inferred European ancestry, who were identified by the GEL bioinformatics team. We further restricted to a homogeneous subset (N = 56,249) that represents White British individuals (Supplementary Fig. 13). Array data received from ALSPAC were all from individuals with genetically predicted European ancestry, so we did not perform any filtering based on genetic ancestry. We performed similar PCA and UMAP clustering to identify individuals of GBR ancestry in MCS (Supplementary Figs. 14 and 15), and further filtered to individuals who self-reported as being of White ethnicity.
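Projecting study samples onto reference principal components, the first step described above, can be sketched with NumPy: PCs are fitted on reference genotypes standardized by reference allele frequencies, and target samples are standardized with the same frequencies before being multiplied by the SNP loadings. This is a generic illustration, assuming complete 0/1/2 dosage matrices on a shared, aligned SNP set; it is not the pipeline used in the paper (see Supplementary Methods), and the UMAP clustering step is not shown.

```python
import numpy as np


def fit_reference_pca(ref_geno: np.ndarray, n_pcs: int = 10):
    """PCA on reference genotypes (samples x SNPs, dosages 0/1/2, no missing)."""
    freq = ref_geno.mean(axis=0) / 2.0
    keep = (freq > 0) & (freq < 1)  # drop SNPs monomorphic in the reference
    freq = freq[keep]
    scale = np.sqrt(2.0 * freq * (1.0 - freq))
    z = (ref_geno[:, keep] - 2.0 * freq) / scale
    # Rows of vt are SNP loadings for successive principal components.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return keep, freq, scale, vt[:n_pcs].T


def project(geno: np.ndarray, keep, freq, scale, loadings) -> np.ndarray:
    """Standardize with reference frequencies, then project onto reference PCs."""
    z = (geno[:, keep] - 2.0 * freq) / scale
    return z @ loadings
```

Using reference allele frequencies for standardization keeps projected samples on the same coordinate system as the reference, which is what allows them to be placed within reference population clusters.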

Relatives within and across cohorts

Within each dataset, we identified relatives up to third degree (kinship coefficient > 0.0442, estimated with KING v.2.2.4 (ref. ⁶⁷)) using post-QC genotyped array data or WGS data. We always used a subset of unrelated individuals (that is, more distant than third-degree relatives) in downstream analysis. In analyses using trios, we made sure that probands in trios were unrelated to each other and that parents were unrelated to parents from other families.
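Selecting a maximal set of unrelated individuals from a list of related pairs is a vertex-removal problem on the relatedness graph. The Python sketch below uses a simple greedy heuristic that repeatedly drops the sample with the most remaining relatives; it is a swapped-in illustration, not the procedure implemented by PLINK2's --king-cutoff or the one used in this study, and `related_pairs` is assumed to contain pairs with kinship above 0.0442.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def greedy_unrelated(samples: Iterable[str], related_pairs: Iterable[Tuple[str, str]]) -> Set[str]:
    """Repeatedly drop the sample with the most relatives still in the set."""
    relatives = defaultdict(set)
    for a, b in related_pairs:
        relatives[a].add(b)
        relatives[b].add(a)
    keep = set(samples)
    while keep:
        # Ties are broken by sample ID so the result is deterministic.
        worst = max(keep, key=lambda s: (len(relatives[s] & keep), s))
        if not relatives[worst] & keep:
            break  # no related pairs remain among kept samples
        keep.discard(worst)
    return keep
```

Removing the most-connected sample first tends to retain more individuals than dropping one member of each pair at random, for example when one person is related to several others who are unrelated to each other.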

In analyses combining DDD and GEL, we removed from GEL any participants who were also recruited into DDD and/or who were related to DDD participants, and also removed Scottish samples from DDD as we were unable to check whether GEL samples were related to them (Supplementary Methods). We removed individuals from the two birth cohorts who were related to each other or to DDD participants, which left 1,434 and 2,498 trios from ALSPAC and MCS, respectively (Supplementary Methods).

Imputation and post-imputation QC

Imputation of array data was performed in each genotyped cohort separately using the maximum number of variants available after QC. Before imputation, we removed palindromic SNPs, SNPs that were not in the imputation reference panel, and SNPs with mismatched alleles. DDD samples and UKHLS controls who were genotyped on the CoreExome array were imputed with the HRC r1.1 reference panel by Niemi et al.² DDD GSA and OmniChip samples and ALSPAC samples were imputed to the TOPMed r2 reference panel using the TOPMed imputation server, and the MCS samples to the HRC r1.1 reference panel^77,78,79. We kept well-imputed common variants with Minimac4 R² > 0.8 and MAF > 1%. For polygenic score analyses, we subsequently restricted to common variants that passed these QC filters in all genotyped cohorts and also passed QC in the GEL WGS data.
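The variant set for polygenic scores is the intersection of per-cohort imputation filters with the variants passing QC in the GEL WGS data. A minimal Python sketch, assuming each array cohort is summarized as a hypothetical dictionary from variant ID to (Minimac4 R², ALT allele frequency); the thresholds are those given above.

```python
from typing import Dict, Iterable, Set, Tuple


def well_imputed(variants: Dict[str, Tuple[float, float]],
                 min_r2: float = 0.8, min_maf: float = 0.01) -> Set[str]:
    """Variants with Minimac4 R2 > min_r2 and MAF > min_maf in one cohort."""
    return {
        vid for vid, (r2, af) in variants.items()
        if r2 > min_r2 and min(af, 1.0 - af) > min_maf
    }


def pgs_variant_set(array_cohorts: Iterable[Dict[str, Tuple[float, float]]],
                    gel_passing: Set[str]) -> Set[str]:
    """Variants passing imputation QC in every array cohort and QC in the WGS data."""
    per_cohort = [well_imputed(c) for c in array_cohorts]
    if not per_cohort:
        return set()
    return set.intersection(*per_cohort) & gel_passing
```

Requiring the filters in every cohort, rather than pooling, ensures that each polygenic score is built from the same SNPs in all cohorts, which keeps scores comparable across arrays and sequencing.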

Extraction and QC of rare variants

QC of the DDD exome sequencing data and extraction of rare single-nucleotide variants and insertions and deletions (indels) are summarized in Supplementary Table 16. Where a sample carried more than one indel in the same gene, those indels were removed (4% of indels with MAF < 1%), as these were often part of complex mutational events that would require haplotype-aware annotation.

For GEL, details of the QC of single-nucleotide variants and indels in the WGS data are provided by the GEL team^63,64, and variant QC is summarized in Supplementary Table 15. We used a custom Python script to extract rare variants from the GEL aggregated WGS variant call format files (aggV2). We filtered genotypes to those with genotype quality (GQ) ≥ 20 and read depth (DP) ≥ 10. We removed heterozygous genotypes that showed significant allelic imbalance in a binomial test of REF versus ALT read counts (P < 1 × 10⁻³) or for which the allele balance, ALT/(REF + ALT) (AB ratio), was not between 0.2 and 0.8. We further removed variants with missing high-quality genotypes in more than 5% of all samples in aggV2. We removed indels in the same gene and sample for the same reason described above for DDD.
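The per-genotype and per-variant filters above can be expressed compactly. The Python sketch below is not the custom script used for GEL; it assumes inputs already parsed from the VCF (a heterozygosity flag, GQ, DP and REF/ALT read counts), treats the AB bounds as inclusive, and uses an exact two-sided binomial test against p = 0.5 written with the standard library.

```python
import math


def binom_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k ALT reads out of n against p = 0.5."""
    if n == 0:
        return 1.0
    m = min(k, n - k)
    # With p = 0.5 the distribution is symmetric, so double the smaller tail.
    tail = sum(math.comb(n, i) for i in range(m + 1)) / 2**n
    return min(1.0, 2.0 * tail)


def keep_genotype(is_het, gq, dp, ref_reads, alt_reads,
                  min_gq=20, min_dp=10, alpha=1e-3, ab_bounds=(0.2, 0.8)):
    """Apply the genotype filters described above (AB bounds inclusive)."""
    if gq < min_gq or dp < min_dp:
        return False
    if is_het:
        n = ref_reads + alt_reads
        if n == 0:
            return False
        ab = alt_reads / n  # allele balance, ALT / (REF + ALT)
        if not (ab_bounds[0] <= ab <= ab_bounds[1]):
            return False
        if binom_two_sided_p(alt_reads, n) < alpha:
            return False
    return True


def passes_missingness(n_high_quality, n_samples, max_missing=0.05):
    """Keep a variant unless more than 5% of samples lack a high-quality genotype."""
    return (n_samples - n_high_quality) / n_samples <= max_missing
```

The two heterozygote checks are complementary: at high depth the binomial test can reject genotypes whose AB lies inside 0.2 to 0.8 (for example 70 ALT reads out of 200), whereas at low depth the AB bounds catch skews the test is underpowered to detect.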

For MCS, details of the QC of exome sequencing data are in Supplementary Methods.

Defining monogenic diagnoses in patients

DDD

The DDD study identified clinically relevant rare variants from exome sequencing and microarray data using a filtering procedure described in ref. ⁵⁸. The procedure focuses on identifying rare damaging variants that fit an appropriate inheritance mode in a set of genes that cause developmental disorders (DDG2P, https://www.deciphergenomics.org/ddd/ddgenes). Variants that pass clinical filtering are uploaded to DECIPHER⁵⁹, where the patients' clinicians are asked to classify them as definitely pathogenic, likely pathogenic, uncertain, likely benign or benign. We defined 'diagnosed' probands as those with one or more variants either annotated as pathogenic or likely pathogenic in DECIPHER by their referring clinician, or predicted to be pathogenic or likely pathogenic by automated coding following the American College of Medical Genetics and Genomics guidelines, as described in ref. ¹. All remaining probands were classed as 'undiagnosed'. Probands with a de novo diagnosis are those with a de novo mutation in a monoallelic or X-linked DDG2P gene that was either annotated or predicted as pathogenic or likely pathogenic.

GEL

The probands assigned diagnostic status were those included in the Genomic Medicine Service exit questionnaire, in which a clinician evaluated the pathogenicity of variants of interest identified through GEL's custom pipeline. We defined diagnosed probands as those that had a pathogenic or likely pathogenic variant annotated as partially or fully explaining their phenotype in this exit questionnaire. Probands with a de novo diagnosis are those whose pathogenic or likely pathogenic variants from the exit questionnaire were annotated as de novo protein-truncating or missense variants in DDG2P monoallelic or X-linked genes. We defined undiagnosed probands as those that were present in the exit questionnaire but not annotated as having a pathogenic or likely pathogenic variant and not annotated as 'yes' or 'partially' in the 'case_solved_family' column. We further removed from this undiagnosed set any probands with potential diagnoses in the GEL Diagnostic Discovery data, a list of researcher-submitted variants that the GEL clinical team considers probably pathogenic.
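Because the undiagnosed set is defined by several exclusions, some probands fall into neither group. The Python sketch below spells out that three-way outcome; the boolean inputs are hypothetical summaries of the exit questionnaire and Diagnostic Discovery data, and only the 'case_solved_family' values 'yes' and 'partially' come from the text.

```python
from typing import Optional

SOLVED_VALUES = {"yes", "partially"}


def gel_diagnostic_status(
    in_exit_questionnaire: bool,
    plp_explains_phenotype: bool,
    any_plp_variant: bool,
    case_solved_family: str,
    in_diagnostic_discovery: bool,
) -> Optional[str]:
    """Return 'diagnosed', 'undiagnosed' or None (excluded from both sets)."""
    if not in_exit_questionnaire:
        return None  # no diagnostic status assigned
    if plp_explains_phenotype:
        return "diagnosed"
    # Undiagnosed only if nothing points to an actual or potential diagnosis.
    if any_plp_variant or case_solved_family.strip().lower() in SOLVED_VALUES:
        return None
    if in_diagnostic_discovery:
        return None
    return "undiagnosed"
```

Returning `None` for ambiguous cases, rather than forcing them into the undiagnosed group, reflects the aim of keeping the undiagnosed set as free as possible of probands with a likely monogenic cause.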

Defining trio sample sets in DDD and GEL

The procedure used for filtering trios used in DDD and GEL is shown in Supplementary Fig. 16. Briefly, in DDD, we combined data across GSA and OmniChip arrays and kept trios in which all three members had GBR ancestry and the proband had a neurodevelopmental condition. We excluded trios recruited from Scottish centres and kept unrelated trios. We then split trios into those with both parents unaffected and those with one or both parents affected. These were then categorized as genetically diagnosed or undiagnosed. We applied similar filtering in GEL trios. See Supplementary Methods for more information.

GWAS of neurodevelopmental conditions

We used PLINK v.1.9 to conduct a GWAS comparing individuals with neurodevelopmental conditions (N = 3,618) to controls (N = 13,667) in GEL, controlling for 20 genetic principal components, age and sex. Before running the GWAS, we removed variants with MAF < 1%, missingness > 2% or Hardy–Weinberg equilibrium P < 1 × 10⁻⁵, and performed a differential missingness test between the patients with neurodevelopmental conditions and controls and removed variants with P < 1 × 10⁻⁵. We repeated the GWAS comparing DDD patients with neurodevelopmental conditions on the CoreExome array (N = 6,397) to UKHLS controls (N = 9,270) using PLINK v.1.9, after excluding DDD patients recruited from Scottish centres.

We used METAL⁸⁰ to conduct an inverse-variance-weighted GWAS meta-analysis between the DDD-UKHLS and GEL GWASs. We removed palindromic SNPs with MAF > 0.4 as the strand could not be easily inferred using MAF. We also excluded SNPs with discordant allele frequency (difference > 0.05) between the two cohorts. This left 5,451,801 overlapping SNPs in the meta-analysis.
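The meta-analysis combines per-cohort effect estimates by inverse-variance weighting after two SNP-level filters. A minimal Python sketch, not METAL itself: it assumes both cohorts' estimates and allele frequencies refer to the same aligned effect allele, and it drops a palindromic SNP if its MAF exceeds 0.4 in either cohort (the text does not say which cohort's MAF was used).

```python
import math

PALINDROMIC_PAIRS = {frozenset("AT"), frozenset("CG")}


def keep_for_meta(a1, a2, af_cohort1, af_cohort2,
                  max_palindromic_maf=0.4, max_af_diff=0.05):
    """Apply the palindromic-SNP and allele-frequency-discordance filters."""
    if frozenset((a1.upper(), a2.upper())) in PALINDROMIC_PAIRS:
        mafs = (min(af_cohort1, 1 - af_cohort1), min(af_cohort2, 1 - af_cohort2))
        if max(mafs) > max_palindromic_maf:
            return False  # strand cannot be resolved from frequency
    return abs(af_cohort1 - af_cohort2) <= max_af_diff


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted estimate, standard error and two-sided P."""
    weights = [1.0 / se**2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return beta, se, p
```

For A/T and C/G SNPs, a frequency near 0.5 looks the same on either strand, which is why frequency cannot be used to resolve strand for them.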

Heritability

We used several methods to estimate the SNP heritability (the fraction of phenotypic variance explained by genome-wide common variants) on the liability scale, assuming a cumulative population prevalence of 1% for rare neurodevelopmental conditions². First, we applied two methods to estimate SNP heritability using individual-level data in DDD and GEL separately. We performed GREML-LDMS⁸¹ stratified by linkage disequilibrium (LD; two bins of equal size) and MAF (three bins: 1–5%, 5–10%, >10%). We also ran phenotype correlation–genotype correlation (PCGC) regression⁸², computing the kinship matrix under the LDAK-Thin model with the direct method. In both methods, we included sex and ten genetic principal components as covariates. We then meta-analysed the SNP heritability estimates from DDD and GEL using an inverse-variance-weighted method. We also used LD score regression (LDSC)⁸³ to estimate SNP heritability from summary statistics of the GWAS of neurodevelopmental conditions in DDD, in GEL and in the meta-analysis of the two cohorts. We used roughly 1 million common SNPs from HapMap3 with precomputed LD scores. As recommended previously⁸⁴, we supplied LDSC with the effective sample size, 4/(1/N_cases + 1/N_controls) (or, for the meta-analysis, the sum of the two cohorts' effective sample sizes), and a sample prevalence of 50%. We present the GREML-LDMS estimate in the Results because it was similar to the PCGC estimate and because LDSC is known to underestimate SNP heritability, especially at small sample sizes. All estimates are reported in Supplementary Table 3.
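The two quantities used with LDSC can be written out directly: the effective sample size given above, and the standard conversion of observed-scale SNP heritability to the liability scale (Lee and colleagues, 2011), which ldsc applies when given `--samp-prev` and `--pop-prev`. A minimal Python sketch using only the standard library; the function names are hypothetical.

```python
from statistics import NormalDist


def effective_n(n_cases: int, n_controls: int) -> float:
    """Effective sample size 4 / (1/N_cases + 1/N_controls)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def h2_liability(h2_observed: float, pop_prev: float, sample_prev: float) -> float:
    """Convert observed-scale SNP heritability to the liability scale."""
    t = NormalDist().inv_cdf(1.0 - pop_prev)  # liability threshold
    z = NormalDist().pdf(t)                   # normal density at the threshold
    k, p = pop_prev, sample_prev
    return h2_observed * k**2 * (1 - k) ** 2 / (p * (1 - p) * z**2)
```

For the meta-analysis, the effective sample size would be `effective_n(6397, 9270) + effective_n(3618, 13667)` using the case and control counts reported in this section; with a population prevalence of 1% and a sample prevalence of 50%, the conversion multiplies the observed-scale estimate by roughly 0.55.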

Genetic correlations

We used LDSC to estimate genetic correlations between the DDD GWAS or the meta-analysed GWAS for neurodevelopmental conditions and various brain-related traits and conditions listed in Supplementary Table 17. We did not use the GEL GWAS to calculate genetic correlations as the SNP heritability was not significantly different from zero according to LDSC.

To estimate the genetic correlations between neurodevelopmental conditions and various brain-related traits or conditions independent of cognitive performance or educational attainment signals, we used genomic structural equation modelling (GenomicSEM)^35,85. We estimated the genetic correlation between the target trait and a latent variable representing the non-cognitive component of neurodevelopmental conditions, that is, genetic influences on neurodevelopmental conditions not captured by the GWAS for cognitive performance³¹. We applied the GenomicSEM model without SNP effects. We also estimated the genetic correlation with the 'non-educational attainment' latent variable, which represented genetic influences on neurodevelopmental conditions not accounted for by the educational attainment latent variable. Finally, we used GenomicSEM to estimate the percentage of the genetic correlation between neurodevelopmental conditions and the target trait that was explained by each latent variable, namely the cognitive and non-cognitive components of neurodevelopmental conditions when conditioning on the cognitive performance GWAS, or the EA and non-EA components of neurodevelopmental conditions when conditioning on the educational attainment GWAS (Supplementary Fig. 1 and Extended Data Fig. 9b,c). The GenomicSEM model and formulae used to estimate these percentages can be found in Supplementary Fig. 17 and Supplementary Methods.

Calculating polygenic scores

For calculating polygenic scores, we used the set of SNPs that were well imputed in all array cohorts (Minimac4 R² > 0.8), passed QC in GEL aggV2 samples, and had MAF > 1% in all cohorts. We used LDPred⁸⁶ to estimate weights for calculating polygenic scores, with an LD reference panel composed of HapMap3 (ref. ⁸⁷) common variants based on 5,000 unrelated individuals of genetically inferred White British ancestry from the UK Biobank⁸⁸ (Supplementary Methods). GWAS summary statistics for years of schooling (a measure of EA)³¹, the non-cognitive component of educational attainment (NonCogEA)³⁵, cognitive performance (CP)³¹, schizophrenia (SCZ)³² and neurodevelopmental conditions² were matched with the list of overlapping SNPs (Supplementary Table 17). PGS_NDC,DDD was evaluated in the DDD OmniChip samples and the GEL samples that were not in the DDD GWAS. To make polygenic scores comparable across cohorts (DDD, GEL, UKHLS, MCS and ALSPAC), we performed a joint PCA across all cohorts and adjusted the raw scores for 20 principal components. Unless otherwise indicated, the residuals were then scaled so that the combined set of unrelated control samples from GEL and UKHLS (or GEL controls only for PGS_NDC,DDD) had a mean of 0 and an s.d. of 1, and these scaled scores were used in all analyses. In Fig. 3b and Extended Data Fig. 5, we instead show principal component-adjusted polygenic scores standardized using weighted MCS average polygenic scores, which should provide an unbiased estimate for the background population (Supplementary Methods). We also constructed composite polygenic scores combining individual polygenic scores (Supplementary Methods).
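The principal component adjustment and control-based scaling described above can be sketched with NumPy: regress the raw scores on the PCs (plus an intercept), keep the residuals, and standardize them using the mean and s.d. of the control samples only. This assumes, as an illustration, that the regression is fitted jointly on all samples; the LDPred weighting step and the MCS-based standardization are not shown.

```python
import numpy as np


def adjust_and_standardize(raw: np.ndarray, pcs: np.ndarray,
                           is_control: np.ndarray) -> np.ndarray:
    """Regress raw scores on PCs (plus intercept) and rescale residuals to controls."""
    design = np.column_stack([np.ones(len(raw)), pcs])
    coef, *_ = np.linalg.lstsq(design, raw, rcond=None)
    resid = raw - design @ coef  # score component not explained by ancestry PCs
    ref = resid[is_control]      # boolean mask selecting control samples
    return (resid - ref.mean()) / ref.std(ddof=1)
```

Scaling to controls rather than to all samples means that a case group's mean score can be read directly as a difference from controls in control s.d. units.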

Analyses of polygenic scores

Evaluating variance explained by polygenic score

We evaluated how much of the variance in risk of neurodevelopmental conditions was explained by the polygenic score on the liability scale^82,89,90. We compared 6,397 probands with neurodevelopmental conditions from DDD with 9,270 controls from UKHLS, and 3,618 probands with neurodevelopmental conditions from GEL with 13,667 GEL controls (defined as described above). We assumed a population prevalence of neurodevelopmental conditions of 1% (ref. ²).
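The exact estimator is described in the cited references. As an illustration only, one widely used conversion is the Lee et al. (2012) transformation of an observed-scale R² (from a linear regression of 0/1 case status on the score in an ascertained case-control sample) to the liability scale. The sketch below assumes that transformation; the function name `liability_r2` is hypothetical. K is the assumed population prevalence (1% above) and P the fraction of cases in the analysed sample.

```python
from statistics import NormalDist

def liability_r2(r2_observed, K, P):
    """Convert observed-scale R^2 in a case-control sample to the liability
    scale (Lee et al. 2012 form).

    K: population prevalence; P: proportion of cases in the sample.
    """
    nd = NormalDist()                  # standard normal liability
    t = nd.inv_cdf(1 - K)              # liability threshold
    z = nd.pdf(t)                      # normal density at the threshold
    i = z / K                          # mean liability of cases
    C = (K * (1 - K) / z**2) * (K * (1 - K) / (P * (1 - P)))
    theta = i * (P - K) / (1 - K) * (i * (P - K) / (1 - K) - t)
    return r2_observed * C / (1 + r2_observed * theta * C)

# Case fraction for DDD probands versus UKHLS controls, from the counts above.
P_ddd = 6397 / (6397 + 9270)
print(round(P_ddd, 3))  # → 0.408
```

The ascertainment correction matters here because cases make up about 40% of the analysed sample but only about 1% of the population; without it, the observed-scale R² would not be comparable across the DDD and GEL comparisons.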

Comparing polygenic scores between different subsets

We used two-sided t-tests to compare polygenic scores between the groups of probands, parents and controls shown in Figs. 2a and 3b, Extended Data Figs. 5 and 6 and Supplementary Tables 5–7. We report the mean difference in principal component-corrected polygenic scores between groups. The groups compared with each other include:

Combined set of controls from GEL and UKHLS
Control individuals from UK birth cohorts, ALSPAC and MCS
Undiagnosed neurodevelopmental condition (NDC) probands regardless of trio status
Diagnosed NDC probands regardless of trio status
Undiagnosed NDC probands for whom both parents are unaffected
Unaffected parents of undiagnosed NDC probands
Undiagnosed NDC probands with one or both parents affected
Affected parents of undiagnosed NDC probands
Diagnosed NDC probands for whom both parents are unaffected
Unaffected parents of diagnosed NDC probands
NDC probands with de novo diagnoses for whom both parents are unaffected
Unaffected parents of NDC probands with de novo diagnoses
Diagnosed NDC probands with one or both parents affected
Affected parents of diagnosed NDC probands.

Note that ‘undiagnosed’ and ‘diagnosed’ here indicate whether the patient has a monogenic diagnosis. The sample size of each subset is listed in Supplementary Table 1. We excluded controls from UKHLS as well as DDD CoreExome and GSA probands when testing the DDD-derived polygenic score for neurodevelopmental conditions (as these had been included in the original GWAS, whereas the individuals genotyped on the OmniChip had not). All the t-tests involving probands with a neurodevelopmental condition or their parents were performed in samples from DDD and GEL combined.
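The group comparisons above can be sketched as follows. The text specifies two-sided t-tests but not whether variances were pooled, so this sketch assumes Welch's unequal-variance test (the default of R's `t.test`). The function name `welch_t` is hypothetical, and the p-value step (the t distribution with the returned degrees of freedom, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`) is not performed here.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Mean difference (a - b), Welch t statistic and Welch-Satterthwaite df."""
    ma, mb = mean(group_a), mean(group_b)
    se2_a = variance(group_a) / len(group_a)   # squared standard error of mean a
    se2_b = variance(group_b) / len(group_b)   # squared standard error of mean b
    t = (ma - mb) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a**2 / (len(group_a) - 1) + se2_b**2 / (len(group_b) - 1)
    )
    return ma - mb, t, df
```

Because the scores are already scaled to the control s.d., the returned mean difference can be read directly as an effect size in control s.d. units.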

We also compared female probands versus male probands without a monogenic diagnosis regardless of trio status (2,427 and 1,574 male probands from DDD and GEL, and 1,426 and 918 female probands from DDD and GEL), and unaffected mothers versus unaffected fathers (1,523 trios from DDD and 1,343 trios from GEL), using two-sided t-tests (Extended Data Fig. 8a,b).

Polygenic score and diagnostic status

We compared average polygenic scores in probands with a neurodevelopmental condition with and without a monogenic diagnosis using two-sided t-tests, combining probands from DDD and GEL regardless of whether they were in a trio or not. We compared subgroups from families affected by neurodevelopmental conditions to the combined control set from UKHLS and GEL, as well as to unrelated children from the MCS cohort who were reweighted using available sociodemographic data to make them more representative of the general UK population (Supplementary Note 4).

Within DDD (N = 7,549, without excluding Scottish samples or samples related to GEL participants), we tested whether the proband’s PGS_EA was associated with factors that could affect whether a diagnosis was made, using linear regression models:

$\mathrm{PGS} \sim \mathrm{factor}$

Note that we use the tilde symbol to indicate that the variable before the tilde was regressed on the variable(s) after the tilde. We investigated the following binary factors: trio status (N = 5,507 with both parents exome sequenced but not necessarily genotyped), proband sex (N = 4,421 male probands), whether the proband had any affected first-degree relatives (N = 1,623), whether the proband was born preterm (N = 1,098 with gestation <37 weeks), whether the mother had diabetes (N = 242) and whether the proband had severe intellectual disability or developmental delay (ID/DD; N = 941) versus mild or moderate ID/DD (N = 1,887). We compared probands with the above-mentioned characteristics to all other probands, except when comparing probands with severe versus mild or moderate ID/DD for which we excluded probands without ID/DD or with ID/DD of unknown severity. We also investigated a continuous factor, the degree of consanguinity, quantified by the fraction of the genome in runs of homozygosity (F_ROH) divided by 0.0625, which is the expected fraction given a first-cousin marriage.
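As a worked example of the consanguinity measure, F_ROH is the summed length of runs of homozygosity divided by the length of the genome considered, and dividing by 0.0625 (1/16, the expected autozygosity of offspring of first cousins) expresses it in first-cousin units. ROH calling itself is not shown. The segment lengths, genome length and function name below are hypothetical inputs.

```python
FIRST_COUSIN_F = 0.0625  # 1/16: expected autozygosity for offspring of first cousins

def consanguinity_degree(roh_lengths_bp, genome_length_bp):
    """F_ROH (fraction of the genome in runs of homozygosity) in first-cousin units."""
    f_roh = sum(roh_lengths_bp) / genome_length_bp
    return f_roh / FIRST_COUSIN_F

# A genome with 1/16 of its length in ROH scores 1.0 (first-cousin level);
# 1/8 (for example, uncle-niece or double first cousins) would score 2.0.
print(consanguinity_degree([100, 80], 2880))  # → 1.0
```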

We also tested whether the mother’s or father’s PGS_EA was associated with the above factors, in a total of 2,497 samples. We did not test for association with trio status, because parental genotype data were only available for full trios.

To assess how the association between the above-mentioned factors and diagnostic status changed after correcting for proband’s PGS_EA, as well as how the association between proband’s PGS_EA and diagnostic status changed after controlling for these factors, we fitted the following logistic regression models:

$\mathrm{Diagnostic\ status} \sim \mathrm{factor}$

$\mathrm{Diagnostic\ status} \sim \mathrm{PGS}$

$\mathrm{Diagnostic\ status} \sim \mathrm{PGS} + \mathrm{factor}$

We also fitted a joint model to assess the effect of PGS_EA on diagnostic status while controlling for both trio status and prematurity, the two factors that showed significant associations with both PGS_EA and diagnostic status. We excluded from this joint model the factors that were not associated with PGS_EA or diagnostic status within the DDD samples of European ancestry (sex, maternal diabetes and F_ROH), as well as factors that are likely to be consequences of having or not having a monogenic diagnosis rather than causes of receiving one (severity of ID/DD and having affected family members).

See the Supplementary Methods for a description of estimation of the odds ratio of diagnosis for different configurations of affected relatives shown in Extended Data Fig. 7a.

Evaluating over-transmission of polygenic scores

We conducted polygenic transmission disequilibrium tests (pTDTs) in undiagnosed and diagnosed probands from DDD (N = 1,523 undiagnosed, 443 diagnosed) and GEL (N = 1,343 undiagnosed, 507 diagnosed) combined. We also conducted pTDTs in these trios excluding autistic probands.

The pTDT is a two-sided one-sample t-test of the deviation of the probands’ polygenic scores from their expected value, the mean of their parents’ polygenic scores. The pTDT deviation is defined as:

$\mathrm{pTDT\ deviation} = \mathrm{PGS}_{\mathrm{child}} - (\mathrm{PGS}_{\mathrm{mother}} + \mathrm{PGS}_{\mathrm{father}})/2$

To evaluate whether the pTDT deviation is significantly different from 0, the pTDT test statistic (t_pTDT) is defined as:

$t_{\mathrm{pTDT}} = \dfrac{\mathrm{mean}(\mathrm{pTDT\ deviation})}{\mathrm{s.d.}(\mathrm{pTDT\ deviation})/\sqrt{n}}$
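A minimal sketch of the pTDT as defined above, in plain Python. The function name `ptdt` is hypothetical, and the p-value is left to a t distribution with n-1 degrees of freedom (for instance via `scipy.stats.ttest_1samp(deviations, 0)`), which this sketch does not call.

```python
from math import sqrt
from statistics import mean, stdev

def ptdt(pgs_child, pgs_mother, pgs_father):
    """Return (mean pTDT deviation, t_pTDT, degrees of freedom) over n trios."""
    # Deviation of each child's score from the mid-parent expectation.
    deviations = [c - (m + f) / 2 for c, m, f in zip(pgs_child, pgs_mother, pgs_father)]
    n = len(deviations)
    t = mean(deviations) / (stdev(deviations) / sqrt(n))
    return mean(deviations), t, n - 1
```

Because the expectation is the mid-parent score within each family, population structure shared by parents and child cancels out of the deviation, which is why over-transmission is read as a within-family signal.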

Association with non-transmitted alleles

Alleles in parents that are not transmitted to the child can still influence the child’s phenotype by affecting the parents’ behaviour. This phenomenon is called genetic nurture or indirect genetic effects^4,26,30. Alleles that are transmitted to the child can influence the child’s phenotype both directly (direct genetic effects) and indirectly through other relatives who carry the same alleles (indirect genetic effects) and whose behaviour is influenced by those alleles. Kong et al. proposed to estimate the direct genetic effect as δ = θ_T − θ_NT, where θ_T indicates the effect of parental transmitted alleles and θ_NT indicates the effect of parental non-transmitted alleles, which capture both the indirect genetic effects and potential confounding factors^4,91. We can estimate θ_T and θ_NT of a given polygenic score in the following regression model:

$\text{child's phenotype} \sim \hat{\theta}_{\mathrm{T}} \times \mathrm{PGS}_{\mathrm{T}} + \hat{\theta}_{\mathrm{NT}} \times \mathrm{PGS}_{\mathrm{NT}}$

where PGS_T is a polygenic score calculated using transmitted alleles (which is the child’s polygenic score), and PGS_NT is a polygenic score calculated using parental non-transmitted alleles, which is equivalent to the difference between the sum of parents’ polygenic scores and the child’s polygenic score. This model can also be rewritten as:

$\text{child's phenotype} \sim (\hat{\theta}_{\mathrm{T}} - \hat{\theta}_{\mathrm{NT}}) \times \mathrm{PGS}_{\mathrm{child}} + \hat{\theta}_{\mathrm{NT}} \times (\mathrm{PGS}_{\mathrm{mother}} + \mathrm{PGS}_{\mathrm{father}})$
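The step between the two forms of the model can be made explicit. Assuming additive scores, each parent's polygenic score is the sum of the scores of the haplotype they transmitted and the one they did not, so the two parents' scores together equal PGS_T plus PGS_NT, with PGS_T equal to the child's score. Substituting into the first model gives the second. For raw additive scores the identity is exact; after per-individual adjustments such as the principal-component correction it should be taken as approximate (our assumption).

$$
\begin{aligned}
\mathrm{PGS}_{\mathrm{mother}} + \mathrm{PGS}_{\mathrm{father}} &= \mathrm{PGS}_{\mathrm{T}} + \mathrm{PGS}_{\mathrm{NT}}, \qquad \mathrm{PGS}_{\mathrm{T}} = \mathrm{PGS}_{\mathrm{child}} \\
\hat{\theta}_{\mathrm{T}}\,\mathrm{PGS}_{\mathrm{T}} + \hat{\theta}_{\mathrm{NT}}\,\mathrm{PGS}_{\mathrm{NT}}
&= \hat{\theta}_{\mathrm{T}}\,\mathrm{PGS}_{\mathrm{child}} + \hat{\theta}_{\mathrm{NT}}\left(\mathrm{PGS}_{\mathrm{mother}} + \mathrm{PGS}_{\mathrm{father}} - \mathrm{PGS}_{\mathrm{child}}\right) \\
&= (\hat{\theta}_{\mathrm{T}} - \hat{\theta}_{\mathrm{NT}})\,\mathrm{PGS}_{\mathrm{child}} + \hat{\theta}_{\mathrm{NT}}\left(\mathrm{PGS}_{\mathrm{mother}} + \mathrm{PGS}_{\mathrm{father}}\right)
\end{aligned}
$$

The coefficient on the child's score is therefore $\hat{\delta} = \hat{\theta}_{\mathrm{T}} - \hat{\theta}_{\mathrm{NT}}$, the direct-effect estimator of Kong et al., and the coefficient on the parental scores is $\hat{\theta}_{\mathrm{NT}}$.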

Therefore, in a regression model in which the child’s polygenic score and parents’ polygenic scores are both fitted, the coefficient on the child’s polygenic score captures the direct genetic effect, and the coefficient on parents’ polygenic scores captures the association between non-transmitted alleles and the child’s phenotype. The latter may reflect true indirect genetic effects as well as confounding effects such as uncorrected population stratification and parental assortment²⁹. Thus, we refer to the coefficients on parents’ polygenic scores in this model as ‘non-transmitted coefficients’ rather than simply ‘indirect genetic effects’, following Young et al.²⁴, as they are mathematically equivalent to the coefficients on the polygenic score constructed from the non-transmitted alleles in a joint regression with the proband’s polygenic score.

We evaluated direct genetic effects ($\hat{\delta}$) and effects of maternal and paternal non-transmitted common alleles ($\hat{\theta}_{m,\mathrm{NT}}$ and $\hat{\theta}_{f,\mathrm{NT}}$) on case status in the following trio model using logistic regression on polygenic scores:

$1_{\mathrm{NDC\ status}} \sim \hat{\delta} \times \mathrm{PGS}_{\mathrm{child}} + \hat{\theta}_{m,\mathrm{NT}} \times \mathrm{PGS}_{\mathrm{mother}} + \hat{\theta}_{f,\mathrm{NT}} \times \mathrm{PGS}_{\mathrm{father}}$

where 1_{NDC status} is an indicator variable for whether the individual is a case with a neurodevelopmental condition (1) or control (0). We also ran the regression without correcting for parents’ polygenic scores (proband-only model) in the same samples for comparison:

$1_{\mathrm{NDC\ status}} \sim \hat{\theta}_{\mathrm{T}} \times \mathrm{PGS}_{\mathrm{child}}$

Probands with a neurodevelopmental condition were from DDD and GEL trios where the proband was undiagnosed and both parents were unaffected (N = 2,866 trios). Control samples were trios from the two birth cohorts (ALSPAC and MCS, N = 1,434 and N = 2,498, respectively) as well as trios from GEL where the proband did not have DDD-like developmental disorders or neurodevelopmental conditions (N = 872).
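The text does not state which software fitted the trio model. As a sketch, the logistic regression can be fitted by Newton-Raphson (iteratively reweighted least squares) in NumPy. The helper names and the simulated data are hypothetical. Parental scores are drawn as standard normal, the child's score is the parental mean plus segregation noise, and case status depends on the child's score only, so the child coefficient should come out close to the simulated δ = 0.5 and the parental (non-transmitted) coefficients close to 0. The real case and control cohorts are not reproduced.

```python
import numpy as np

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).
    X must already contain an intercept column. Returns (coefficients, SEs)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))   # Newton update
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se

def trio_model(case_status, pgs_child, pgs_mother, pgs_father):
    """Coefficients: [intercept, delta (child), theta_m,NT (mother), theta_f,NT (father)]."""
    y = np.asarray(case_status, dtype=float)
    X = np.column_stack([np.ones(len(y)), pgs_child, pgs_mother, pgs_father])
    return fit_logistic(X, y)

def proband_only_model(case_status, pgs_child):
    """Coefficients: [intercept, theta_T (child)]."""
    y = np.asarray(case_status, dtype=float)
    X = np.column_stack([np.ones(len(y)), pgs_child])
    return fit_logistic(X, y)

def simulate_trios(n, delta, seed=1):
    """Hypothetical trios: parental PGS ~ N(0, 1); child PGS = parental mean plus
    segregation noise (variance 0.5), so child PGS also has variance 1;
    case risk depends on the child's PGS only (pure direct effect)."""
    rng = np.random.default_rng(seed)
    mother = rng.normal(size=n)
    father = rng.normal(size=n)
    child = (mother + father) / 2 + rng.normal(scale=np.sqrt(0.5), size=n)
    risk = 1.0 / (1.0 + np.exp(-(-1.0 + delta * child)))
    case = rng.random(n) < risk
    return case, child, mother, father

case, child, mother, father = simulate_trios(20000, delta=0.5)
beta, se = trio_model(case, child, mother, father)
print(np.round(beta, 2), np.round(se, 3))
```

Fitting the parental scores alongside the child's score means that a non-zero parental coefficient can only arise from non-transmitted alleles, which is why these coefficients are interpreted as described above rather than as direct effects.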

We verified that the polygenic scores in the trio model did not show excessive collinearity (Supplementary Methods).

We performed various sensitivity analyses in the following subsets (Supplementary Fig. 4): patients versus controls from GEL trios only, and patients from GEL and DDD versus each of the three control cohorts separately (GEL, MCS or ALSPAC). We also conducted the analysis while controlling for the rare variant burden score (RVBS) in GEL trios (Extended Data Fig. 10b; see the section ‘Analyses of polygenic scores and rare coding variants’ below):

$1_{\mathrm{NDC\ status}} \sim \mathrm{PGS}_{\mathrm{child}} + \mathrm{RVBS}_{\mathrm{child}} + \mathrm{PGS}_{\mathrm{mother}} + \mathrm{RVBS}_{\mathrm{mother}} + \mathrm{PGS}_{\mathrm{father}} + \mathrm{RVBS}_{\mathrm{father}}$

We restricted this latter analysis to GEL trios to minimize artefactual differences in rare variant calling and QC between cases and controls, which could otherwise create spurious associations.

See the Supplementary Methods for a description of how we modified the running of this trio model to investigate the hypothesis that the effects of non-transmitted alleles associated with educational attainment and cognition might be mediated by prematurity.

Analyses of polygenic scores and rare coding variants

Sequence data from DDD, GEL and MCS were annotated with the Ensembl Variant Effect Predictor (VEP)⁹². We kept the ‘worst consequence’ annotation across transcripts. From parents and probands, we extracted autosomal heterozygous PTVs (transcript ablation, frameshift, splice acceptor, splice donor and stop gained) annotated as high-confidence by LOFTEE⁹³ (HC PTVs), as well as variants in the following classes that we grouped as ‘missense’: missense, stop lost, start lost, inframe insertion, inframe deletion and loss-of-function variants annotated as low-confidence by LOFTEE⁹³. We retained rare variants with MAF < 1 × 10⁻⁵ in each gnomAD super-population and MAF < 1 × 10⁻⁴ in the respective cohorts.

We considered four (non-mutually exclusive) groups of damaging rare variants:

1. HC PTVs in constrained genes (pLI > 0.9)⁹⁴
2. HC PTVs and missense variants (MPC ≥ 2)⁹⁵ in constrained genes (pLI > 0.9)
3. HC PTVs in monoallelic DDG2P genes with a loss-of-function mechanism (that is, ‘absent gene product’)
4. HC PTVs and missense variants (MPC ≥ 2) in monoallelic DDG2P genes with a loss-of-function mechanism.

We retained probands and parents who were heterozygous for these variants. We required the variants in the children to have been inherited from a parent.
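The variant filters and the four groups above can be sketched as simple rules on per-variant annotations. This is a hypothetical illustration, not the study's pipeline. The consequence terms are the VEP/Sequence Ontology terms named in the text and `loftee_call` stands for LOFTEE's "HC"/"LC" label. The input layout, the function names and two choices are assumptions: variants absent from gnomAD are treated as frequency 0, and variants without an MPC score are excluded from the "missense (MPC ≥ 2)" class.

```python
PTV_TERMS = {
    "transcript_ablation", "frameshift_variant", "splice_acceptor_variant",
    "splice_donor_variant", "stop_gained",
}
MISSENSE_LIKE_TERMS = {
    "missense_variant", "stop_lost", "start_lost", "inframe_insertion", "inframe_deletion",
}

def is_rare(gnomad_af_by_pop, cohort_af):
    """Frequency < 1e-5 in every gnomAD super-population and < 1e-4 in the cohort.
    gnomad_af_by_pop: dict of population -> frequency (empty if absent from gnomAD)."""
    return max(gnomad_af_by_pop.values(), default=0.0) < 1e-5 and cohort_af < 1e-4

def variant_class(worst_consequence, loftee_call):
    """'HC_PTV', 'missense' (the broader grouping defined in the text) or None."""
    if worst_consequence in PTV_TERMS and loftee_call == "HC":
        return "HC_PTV"
    if worst_consequence in MISSENSE_LIKE_TERMS or loftee_call == "LC":
        return "missense"   # includes low-confidence LoF variants
    return None

def damaging_groups(vclass, mpc, gene_pli, gene_in_lof_ddg2p):
    """Set of (non-mutually exclusive) group numbers 1-4 that a variant falls in."""
    hc_ptv = vclass == "HC_PTV"
    damaging_missense = vclass == "missense" and mpc is not None and mpc >= 2
    constrained = gene_pli > 0.9
    groups = set()
    if hc_ptv and constrained:
        groups.add(1)
    if (hc_ptv or damaging_missense) and constrained:
        groups.add(2)
    if hc_ptv and gene_in_lof_ddg2p:
        groups.add(3)
    if (hc_ptv or damaging_missense) and gene_in_lof_ddg2p:
        groups.add(4)
    return groups
```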

To investigate whether parental assortment leads to correlated rare and common variant burden, we calculated rare variant burden scores as the number of rare variants in the classes described above, and then calculated Pearson’s correlation coefficients between rare variant burden scores and polygenic scores using the ‘cor’ function in R. We used trios in which both parents were unaffected in this analysis. Rare variant burden scores were corrected for 20 genetic principal components using linear regression models. We then calculated the correlation coefficients between the principal component-adjusted rare variant burden scores in parents and the principal component-adjusted polygenic scores in their partners. We also assessed the correlation within individuals, separately among children and among parents. We repeated the analysis in the subset of trios in which the proband was undiagnosed and in the subset in which the proband had a monogenic de novo diagnosis (Supplementary Fig. 6). The main analysis in Fig. 5 and the sensitivity analysis in Extended Data Fig. 10b are based on group 2 above, whereas Supplementary Figs. 6–8 show the results for all four groups of variants. To investigate whether the results were affected by uncorrected population structure, we also calculated rare variant burden scores using rare synonymous variants in either monoallelic DDG2P genes with a loss-of-function mechanism or constrained genes, and assessed their correlation with polygenic scores.
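The cross-partner correlation can be sketched in plain Python (the study used R's `cor`). The function names are hypothetical, the inputs are assumed to be already principal component-adjusted, and pooling mothers and fathers into one correlation is an assumption: the text does not say whether the two directions were pooled or reported separately. Applying `pearson` to one person's RVBS and PGS gives the within-person correlation.

```python
from math import sqrt

def burden_score(variant_groups, group=2):
    """Count of a person's rare variants falling in the chosen damaging group
    (each entry is the set of group numbers for one variant)."""
    return sum(1 for groups in variant_groups if group in groups)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cross_partner_correlation(rvbs_mother, pgs_mother, rvbs_father, pgs_father):
    """Correlate each parent's RVBS with their partner's PGS, pooling both directions."""
    rvbs = list(rvbs_mother) + list(rvbs_father)
    partner_pgs = list(pgs_father) + list(pgs_mother)
    return pearson(rvbs, partner_pgs)
```

Under random mating, one partner's rare variants should be uncorrelated with the other partner's polygenic score, so a non-zero cross-partner correlation is the signature of assortment that this analysis looks for.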

To assess whether polygenic scores modify the penetrance of rare inherited variants, we conducted one-sided paired t-tests comparing the polygenic scores of unaffected parents who transmitted a damaging variant with those of their affected offspring who inherited it (Supplementary Fig. 8). We hypothesized that the unaffected parents would have a more protective polygenic background than their affected offspring (that is, higher PGS_EA, PGS_CP and PGS_NonCogEA, and lower PGS_SCZ and PGS_NDC,DDD). If both parents transmitted a variant to a proband, one parent–child pair was chosen at random from the trio. We used trios in which the proband was undiagnosed and both parents were unaffected in this analysis.
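The paired test can be sketched as a one-sample t statistic on parent-minus-child differences, with the sign flipped for scores where lower values are protective (PGS_SCZ, PGS_NDC,DDD). Under this convention a large positive t supports the hypothesis. The function name and input layout are hypothetical. The seeded random choice reproduces the "one pair at random when both parents transmitted" rule, and the one-sided p-value (upper tail of a t distribution with n-1 df, for example `scipy.stats.ttest_rel(..., alternative="greater")`) is not computed here.

```python
import random
from math import sqrt
from statistics import mean, stdev

def penetrance_paired_t(transmissions_by_trio, higher_is_protective=True, seed=0):
    """One-sided paired t statistic for 'transmitting parent is more protected than child'.

    transmissions_by_trio: one list per trio of (parent_pgs, child_pgs) tuples,
    one tuple per unaffected parent who transmitted a damaging variant.
    Returns (t, degrees of freedom).
    """
    rng = random.Random(seed)  # reproducible choice when both parents transmitted
    diffs = []
    for pairs in transmissions_by_trio:
        parent_pgs, child_pgs = rng.choice(pairs)
        d = parent_pgs - child_pgs
        diffs.append(d if higher_is_protective else -d)
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1
```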

Construction and use of weights for MCS

We were concerned that control cohorts might not be random samples of the population with respect to educational attainment, and that this might bias our estimates of the difference in polygenic scores between cases and controls (Supplementary Note 4). We therefore used MCS, for which extensive sociodemographic data are available, to calculate a mean polygenic score representative of the general population using inverse-probability weighting. MCS deliberately oversampled minority ethnic and disadvantaged individuals in the United Kingdom⁹⁶ (sampling bias) and provides sampling weights to account for this. Furthermore, missingness in each wave of data collection, including the collection of DNA for genotyping, was non-random (non-response bias). To correct for non-response bias, we produced a non-response weight for each individual as the inverse of the probability of being genotyped, estimated from a logistic regression on covariates collected at the first study sweep, as previously described^96,97 (Supplementary Methods). We fitted this model to predict who was in the sample of unrelated children of GBR ancestry with genotype data (N = 5,884 of 6,036 children who had complete data for these covariates), and separately to predict who was in the subset of these that also had genotype data on both parents (N = 2,445 of 2,498 trio children who had no missing data). To produce weights that account for both sampling bias and non-response bias, we multiplied the non-response weights from the regression models by the sampling weights provided by MCS. These weights were then used to calculate the adjusted polygenic scores shown in Fig. 3b and Extended Data Figs. 5 and 6c, and the adjusted correlation between polygenic scores and rare variant burden scores shown in Supplementary Fig. 7.
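The weighting step can be sketched as follows. It assumes that the fitted probabilities of being genotyped (from the sweep-1 logistic regression, not shown) and the MCS sampling weights are already available per child. The function names are hypothetical. The weighted Pearson correlation illustrates one way to obtain a weighted correlation of the kind reported for Supplementary Fig. 7; the exact weighted estimator used is not stated in the text.

```python
from math import sqrt

def combined_weights(p_genotyped, sampling_weights):
    """Non-response weight (1 / predicted probability of being genotyped)
    multiplied by the survey sampling weight."""
    return [w / p for p, w in zip(p_genotyped, sampling_weights)]

def weighted_mean(values, weights):
    """Inverse-probability-weighted mean, e.g. a population-representative mean PGS."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def weighted_pearson(x, y, weights):
    """Pearson correlation with each observation weighted by its combined weight."""
    sw = sum(weights)
    mx = sum(w * a for w, a in zip(weights, x)) / sw
    my = sum(w * b for w, b in zip(weights, y)) / sw
    cov = sum(w * (a - mx) * (b - my) for w, a, b in zip(weights, x, y))
    vx = sum(w * (a - mx) ** 2 for w, a in zip(weights, x))
    vy = sum(w * (b - my) ** 2 for w, b in zip(weights, y))
    return cov / sqrt(vx * vy)
```

Children who were unlikely to be genotyped, or who belong to under-sampled strata, receive larger weights, so the weighted mean approximates what an unselected UK birth cohort would show.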

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The raw and post-QC genotype array data and exome sequence data from DDD are available through European Genome-phenome Archive, under EGAS00001000775. WGS data and phenotypic data from the 100,000 Genomes project can be accessed by application to Genomics England (https://www.genomicsengland.co.uk/research/academic/join-gecip). GWAS summary statistics of neurodevelopmental conditions generated in this study are available at Figshare (https://doi.org/10.6084/m9.figshare.27060895)⁹⁸. Researchers can apply to access genotype array data from UKHLS (https://www.understandingsociety.ac.uk/documentation/access-data/), ALSPAC (https://www.bristol.ac.uk/alspac/researchers/access/) and MCS (https://cls.ucl.ac.uk/data-access-training/data-access/). Publicly available GWAS summary statistics can be accessed at various resources: http://www.thessgac.org/data, https://pgc.unc.edu/for-researchers/download-results/ and https://egg-consortium.org/Gestational-duration-2023.html. DDG2P genes can be downloaded at https://www.deciphergenomics.org/ddd/ddgenes.

Code availability

We used publicly available software: LDpred (https://github.com/bvilhjal/ldpred), LDSC (https://github.com/bulik/ldsc), GCTA-LDMS (https://yanglab.westlake.edu.cn/software/gcta/#GREMLinWGSorimputeddata), PCGC regression (https://dougspeed.com/pcgc-regression/) and GenomicSEM (https://github.com/PerlineDemange/non-cognitive/blob/master/GenomicSEM/Genetic%20correlations/Without%20using%20SNP%20effects/function_rG_woSNP.R). Custom code is available on GitHub (https://github.com/QinqinHuang/NDC_polygenic).

References

Wright, C. F. et al. Genomic diagnosis of rare pediatric disease in the United Kingdom and Ireland. N. Engl. J. Med. 388, 1559–1571 (2023).
Niemi, M. E. K. et al. Common genetic variants contribute to risk of rare severe neurodevelopmental disorders. Nature 562, 268–271 (2018).
Kurki, M. I. et al. Contribution of rare and common variants to intellectual disability in a sub-isolate of Northern Finland. Nat. Commun. 10, 410 (2019).
Kong, A. et al. The nature of nurture: effects of parental genotypes. Science 359, 424–428 (2018).
Falconer, D. S. The inheritance of liability to certain diseases, estimated from the incidence among relatives. Ann. Hum. Genet. 29, 51–76 (1965).
Nguengang Wakap, S. et al. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. Eur. J. Hum. Genet. 28, 165–173 (2020).
Sanders, S. J. et al. A framework for the investigation of rare genetic disorders in neuropsychiatry. Nat. Med. 25, 1477–1487 (2019).
Srivastava, S. et al. Meta-analysis and multidisciplinary consensus statement: exome sequencing is a first-tier clinical diagnostic test for individuals with neurodevelopmental disorders. Genet. Med. 21, 2413–2421 (2019).
Gardner, E. J. et al. Reduced reproductive success is associated with selective constraint on human genes. Nature 603, 858–863 (2022).
Chen, C.-Y. et al. The impact of rare protein coding genetic variation on adult cognitive function. Nat. Genet. 55, 927–938 (2023).
Kingdom, R., Beaumont, R. N., Wood, A. R., Weedon, M. N. & Wright, C. F. Genetic modifiers of rare variants in monogenic developmental disorder loci. Nat. Genet. 56, 861–868 (2024).
Murray, R. M., Bhavsar, V., Tripoli, G. & Howes, O. 30 Years on: how the neurodevelopmental hypothesis of schizophrenia morphed into the developmental risk factor model of psychosis. Schizophr. Bull. 43, 1190–1196 (2017).
O’Brien, H. E. et al. Expression quantitative trait loci in the developing human brain and their enrichment in neuropsychiatric disorders. Genome Biol. 19, 194 (2018).
Mallard, T. T. et al. Multivariate GWAS of psychiatric disorders and their cardinal symptoms reveal two dimensions of cross-cutting genetic liabilities. Cell Genom. 2, 100140 (2022).
Wolstencroft, J. et al. Neuropsychiatric risk in children with intellectual disability of genetic origin: IMAGINE, a UK national cohort study. Lancet Psychiatry 9, 715–724 (2022).
Marquis, S. M., McGrail, K. & Hayes, M. V. A population-level study of the mental health of siblings of children who have a developmental disability. SSM Popul. Health 8, 100441 (2019).
Baker, K. et al. Childhood intellectual disability and parents’ mental health: integrating social, psychological and genetic influences. Br. J. Psychiatry 218, 315–322 (2021).
Alexander-Bloch, A. et al. Copy number variant risk scores associated with cognition, psychopathology, and brain structure in youths in the Philadelphia Neurodevelopmental Cohort. JAMA Psychiatry 79, 699–709 (2022).
Chawner, S. J. R. A. et al. Genotype-phenotype associations in children with copy number variants associated with high neuropsychiatric risk in the UK (IMAGINE-ID): a case-control cohort study. Lancet Psychiatry 6, 493–505 (2019).
Antaki, D. et al. A phenotypic spectrum of autism is attributable to the combined effects of rare variants, polygenic risk and sex. Nat. Genet. 54, 1284–1292 (2022).
Wright, C. F. et al. Making new genetic diagnoses with old data: iterative reanalysis and reporting from genome-wide data in 1,133 families with developmental disorders. Genet. Med. 20, 1216–1223 (2018).
Kuchenbaecker, K. B. et al. Evaluation of polygenic risk scores for breast and ovarian cancer risk prediction in BRCA1 and BRCA2 mutation carriers. J. Natl Cancer Inst. 109, djw302 (2017).
Okbay, A. et al. Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nat. Genet. 54, 437–449 (2022).
Young, A. I. et al. Mendelian imputation of parental genotypes improves estimates of direct genetic effects. Nat. Genet. 54, 897–905 (2022).
Howe, L. J. et al. Within-sibship genome-wide association analyses decrease bias in estimates of direct genetic effects. Nat. Genet. 54, 581–592 (2022).
Demange, P. A. et al. Estimating effects of parents’ cognitive and non-cognitive skills on offspring education using polygenic scores. Nat. Commun. 13, 4801 (2022).
Bates, T. C. et al. Social competence in parents increases children’s educational attainment: replicable genetically-mediated effects of parenting revealed by non-transmitted DNA. Twin Res. Hum. Genet. 22, 1–3 (2019).
Wang, B. et al. Robust genetic nurture effects on education: a systematic review and meta-analysis based on 38,654 families across 8 cohorts. Am. J. Hum. Genet. 108, 1780–1791 (2021).
Young, A. S. Estimation of indirect genetic effects and heritability under assortative mating. Preprint at bioRxiv https://doi.org/10.1101/2023.07.10.548458 (2023).
Nivard, M. G. et al. More than nature and nurture, indirect genetic effects on children’s academic achievement are consequences of dynastic social processes. Nat. Hum. Behav. 8, 771–778 (2024).
Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121 (2018).
Trubetskoy, V. et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604, 502–508 (2022).
Davies, G. et al. Study of 300,486 individuals identifies 148 independent genetic loci influencing general cognitive function. Nat. Commun. 9, 2098 (2018).
Demontis, D. et al. Genome-wide analyses of ADHD identify 27 risk loci, refine the genetic architecture and implicate several cognitive domains. Nat. Genet. 55, 198–208 (2023).
Demange, P. A. et al. Investigating the genetic architecture of noncognitive skills using GWAS-by-subtraction. Nat. Genet. 53, 35–44 (2021).
The Brainstorm Consortium et al. Analysis of shared heritability in common disorders of the brain. Science 360, eaap8757 (2018).
Joseph, R. M. et al. Neurocognitive and academic outcomes at age 10 years of extremely preterm newborns. Pediatrics 137, e20154343 (2016).
Huang, J., Zhu, T., Qu, Y. & Mu, D. Prenatal, perinatal and neonatal risk factors for intellectual disability: a systemic review and meta-analysis. PLoS ONE 11, e0153655 (2016).
Morelli, S., Nolan, B., Palomino, J. C. & Van Kerm, P. The wealth (disadvantage) of single-parent households. Ann. Am. Acad. Pol. Soc. Sci. 702, 188–204 (2022).
Weiner, D. J. et al. Polygenic transmission disequilibrium confirms that common and rare variation act additively to create risk for autism spectrum disorders. Nat. Genet. 49, 978–985 (2017).
Solé-Navais, P. et al. Genetic effects on the timing of parturition and links to fetal birth weight. Nat. Genet. 55, 559–567 (2023).
Granés, L., Torà-Rocamora, I., Palacio, M., De la Torre, L. & Llupià, A. Maternal educational level and preterm birth: exploring inequalities in a hospital-based cohort study. PLoS ONE 18, e0283901 (2023).
Horwitz, T. B., Balbona, J. V., Paulich, K. N. & Keller, M. C. Evidence of correlations between human partners based on systematic reviews and meta-analyses of 22 traits and UK Biobank analysis of 133 traits. Nat. Hum. Behav. 7, 1568–1583 (2023).
Nordsletten, A. E. et al. Patterns of nonrandom mating within and across 11 major psychiatric disorders. JAMA Psychiatry 73, 354–361 (2016).
Cabrera-Mendoza, B., Wendt, F. R., Pathak, G. A., Yengo, L. & Polimanti, R. The impact of assortative mating, participation bias and socioeconomic status on the polygenic risk of behavioural and psychiatric traits. Nat. Hum. Behav. 8, 976–987 (2024).
Smolen, C. et al. Assortative mating and parental genetic relatedness contribute to the pathogenicity of variably expressive variants. Am. J. Hum. Genet. 110, 2015–2028 (2023).
Yengo, L. et al. Imprint of assortative mating on the human genome. Nat. Hum. Behav. 2, 948–954 (2018).
Fenner, E. et al. Rare coding variants in schizophrenia-associated genes affect generalised cognition in the UK Biobank. Preprint at medRxiv https://doi.org/10.1101/2023.08.14.23294074 (2023).
Kingdom, R. et al. Rare genetic variants in genes and loci linked to dominant monogenic developmental disorders cause milder related phenotypes in the general population. Am. J. Hum. Genet. 109, 1308–1316 (2022).
Deciphering Developmental Disorders Study. Prevalence and architecture of de novo mutations in developmental disorders. Nature 542, 433–438 (2017).
Border, R. et al. Cross-trait assortative mating is widespread and inflates genetic correlation estimates. Science 378, 754–761 (2022).
Smail, C. et al. Complex trait associations in rare diseases and impacts on Mendelian variant interpretation. Nat. Commun. 15, 8196 (2024).
Yates, E. F. & Mulkey, S. B. Viral infections in pregnancy and impact on offspring neurodevelopment: mechanisms and lessons learned. Pediatr. Res. 96, 64–72 (2024).
Balbona, J. V., Kim, Y. & Keller, M. C. Estimation of parental effects using polygenic scores. Behav. Genet. 51, 264–278 (2021).
Beauregard, J. L., Drews-Botsch, C., Sales, J. M., Flanders, W. D. & Kramer, M. R. Does socioeconomic status modify the association between preterm birth and children’s early cognitive ability and kindergarten academic achievement in the United States? Am. J. Epidemiol. 187, 1704–1713 (2018).
Madley-Dowd, P. et al. Maternal smoking during pregnancy and offspring intellectual disability: sibling analysis in an intergenerational Danish cohort. Psychol. Med. 52, 1847–1856 (2022).
Havdahl, A. et al. Associations between pregnancy-related predisposing factors for offspring neurodevelopmental conditions and parental genetic liability to attention-deficit/hyperactivity disorder, autism, and schizophrenia: the Norwegian Mother, Father and Child Cohort Study (MoBa). JAMA Psychiatry 79, 799–810 (2022).
Wright, C. F. et al. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. Lancet 385, 1305–1314 (2015).
Firth, H. V. et al. DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources. Am. J. Hum. Genet. 84, 524–533 (2009).
Köhler, S. et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 42, D966–D974 (2014).
Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature 519, 223–228 (2015).
Turnbull, C. et al. The 100 000 Genomes Project: bringing whole genome sequencing to the NHS. Brit. Med. J. 361, k1687 (2018).
Turro, E. et al. Whole-genome sequencing of patients with rare diseases in a national health system. Nature 583, 96–102 (2020).
Genomics England Trusted Research Environment User Guide: Aggregated variant calls. Genomics England https://re-docs.genomicsengland.co.uk/aggv2/ (2014).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).
McFall, S., Petersen, J., Kaminska, O. & Lynn, P. Understanding Society—The UK Household Longitudinal Study: Waves 2 and 3 Nurse Health Assessment, 2010–2012 Guide to Nurse Health. UK Data Service https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=7251 (2024).
Boyd, A. et al. Cohort profile: the ‘children of the 90s’–the index offspring of the Avon Longitudinal Study of Parents and Children. Int. J. Epidemiol. 42, 111–127 (2013).
Fraser, A. et al. Cohort profile: the Avon Longitudinal Study of Parents and Children: ALSPAC mothers cohort. Int. J. Epidemiol. 42, 97–110 (2013).
Connelly, R. & Platt, L. Cohort profile: UK Millennium Cohort Study (MCS). Int. J. Epidemiol. 43, 1719–1725 (2014).
Joshi, H. & Fitzsimons, E. The Millennium Cohort Study: the making of a multi-purpose resource for social science and policy. Longit. Life Course Stud. 7, 409–430 (2016).
Northstone, K. et al. The Avon Longitudinal Study of Parents and Children. ALSPAC G0 Partners: a cohort profile (Wellcome Open Research, 2023).
Fitzsimons, E. et al. Collection of genetic data at scale for a nationally representative population: the UK Millennium Cohort Study. Longit. Life Course Stud. 13, 169–187 (2021).
1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at arXiv https://arxiv.org/abs/1802.03426 (2018).
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat. Genet. 47, 1114–1120 (2015).
Golan, D., Lander, E. S. & Rosset, S. Measuring missing heritability: inferring the contribution of common variants. Proc. Natl Acad. Sci. USA 111, E5272–E5281 (2014).
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Grotzinger, A. D., de la Fuente, J., Privé, F., Nivard, M. G. & Tucker-Drob, E. M. Pervasive downward bias in estimates of liability-scale heritability in genome-wide association study meta-analysis: a simple solution. Biol. Psychiatry 93, 29–36 (2023).
Grotzinger, A. D. et al. Genomic structural equation modelling provides insights into the multivariate genetic architecture of complex traits. Nat. Hum. Behav. 3, 513–525 (2019).
Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).
The International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Lee, S. H., Goddard, M. E., Wray, N. R. & Visscher, P. M. A better coefficient of determination for genetic profile analysis. Genet. Epidemiol. 36, 214–224 (2012).
Lee, S. H., Wray, N. R., Goddard, M. E. & Visscher, P. M. Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet. 88, 294–305 (2011).
Young, A. I. et al. Relatedness disequilibrium regression estimates heritability without environmental bias. Nat. Genet. 50, 1304–1310 (2018).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Samocha, K. E. et al. Regional missense constraint improves variant deleteriousness prediction. Preprint at bioRxiv https://doi.org/10.1101/148353 (2017).
Plewis, I. The Millennium Cohort Study: Technical Report on Sampling 4th edn (UK Data Service, 2007); http://doc.ukdataservice.ac.uk/doc/4683/mrdoc/pdf/mcs_technical_report_on_sampling_4th_edition.pdf.
Plewis, I. Non‐response in a birth cohort study: the case of the Millennium Cohort Study. Int. J. Soc. Res. Methodol. 10, 325–334 (2007).
Huang, Q. Q., Wigdor, E. M. & Martin, H. Dissecting the contribution of common variants to risk of rare neurodevelopmental conditions. figshare https://doi.org/10.6084/m9.figshare.2706089 (2024).


Acknowledgements

We are grateful to families for their participation and engagement in the DDD study and 100,000 Genomes projects; without them, this research would not be possible. We also thank their clinicians and our colleagues (including E. Delage and the Sanger Human Genetics Informatics team, particularly I. Popov and R. Eberhardt) who assisted in the generation and processing of data. We are grateful to J. Hastings-Ward, H. Podd and H. Humphrey from the Participant Panel for the 100,000 Genomes project, A. L. Taylor Tavares from Genomics England and S. Wynn from the patient organization Unique, for their assistance with writing the frequently asked questions. We thank A. Kousathanas and L. Moutsianas from Genomics England Bioinformatics Research Services for help with data QC, H. Wong for useful discussions on prematurity, M. Nivard for advice on use of GenomicSEM and A. Ronald, N. Wray, N. Martin and O. Wootton for helpful discussions. DDD: the DDD study presents independent research commissioned by the Health Innovation Challenge Fund (grant no. HICF-1009-003). The full acknowledgements can be found at www.ddduk.org/access.html. This study makes use of DECIPHER, which is funded by the Wellcome Trust. GEL: this research was made possible through access to data in the National Genomic Research Library, which is managed by Genomics England Limited (a wholly owned company of the Department of Health and Social Care). The National Genomic Research Library holds data provided by patients and collected by the NHS as part of their care and data collected as part of their participation in research. The National Genomic Research Library is funded by the National Institute for Health Research and NHS England. The Wellcome Trust, Cancer Research UK and the Medical Research Council have also funded research infrastructure. 
UKHLS: we used data from ‘Understanding Society: The UK Household Longitudinal Study’, which is led by the Institute for Social and Economic Research at the University of Essex and funded by the Economic and Social Research Council (grant no. ES/M008592/1). The data were collected by NatCen and the genome-wide scan data were analysed by the Wellcome Trust Sanger Institute. Data governance was provided by the METADAC data access committee, funded by the ESRC, Wellcome and MRC (grant no. MR/N01104X/1). ALSPAC: we are extremely grateful to all the families who took part in ALSPAC, the midwives for their help in recruiting them and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. The UK Medical Research Council and Wellcome (grant no. 217065/Z/19/Z) and the University of Bristol provide core support for ALSPAC. This publication is the work of the authors, and H.C.M. will serve as a guarantor for the contents of this paper. Genome-wide genotyping data were generated by Sample Logistics and Genotyping Facilities at the Wellcome Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe. MCS: we are grateful to the Centre for Longitudinal Studies (CLS), UCL Social Research Institute, for the use of these data and to the UK Data Service for making them available. However, neither CLS nor the UK Data Service bears any responsibility for the analysis or interpretation of these data. This research was funded in part by Wellcome (grant no. 220540/Z/20/A, ‘Wellcome Sanger Institute Quinquennial Review 2021–2026’). For the purpose of open access, the authors have applied a CC-BY public copyright license to any author accepted manuscript version arising from this submission. D.B. thanks the University of Cambridge Amgen Scholar Program for support. H.V.F. was supported by the NIHR Cambridge Biomedical Research Centre (grant no. NIHR203312). The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.

Author information

These authors contributed equally: Qin Qin Huang, Emilie M. Wigdor

Authors and Affiliations

Wellcome Sanger Institute, Hinxton, UK
Qin Qin Huang, Emilie M. Wigdor, Daniel S. Malawsky, Patrick Campbell, V. Kartik Chundru, Petr Danecek, Sarah Lindsay, Thomas Marchant, Mahmoud Koko, Sana Amanat, Davide Bonfanti, Eamonn Sheridan, Elizabeth J. Radford, Jeffrey C. Barrett, Helen V. Firth, Matthew E. Hurles & Hilary C. Martin
Department of Medical and Molecular Genetics, King’s College London, London, UK
Patrick Campbell
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
Kaitlin E. Samocha
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Kaitlin E. Samocha
Institute of Biomedical and Clinical Science, University of Exeter, Exeter, UK
V. Kartik Chundru & Caroline F. Wright
Leeds Institute of Medical Research, University of Leeds, St. James’s University Hospital, Leeds, UK
Eamonn Sheridan
Yorkshire Regional Genetics Service, Chapel Allerton Hospital, Leeds, UK
Eamonn Sheridan
Department of Paediatrics, University of Cambridge, Cambridge Biomedical Campus, Cambridge, UK
Elizabeth J. Radford
Cambridge University Hospitals Foundation Trust, Addenbrooke’s Hospital, Cambridge, UK
Helen V. Firth
Department of Psychiatry, University of Cambridge, Cambridge, UK
Varun Warrier
Department of Psychology, University of Cambridge, Cambridge, UK
Varun Warrier
University of California Los Angeles Anderson School of Management, Los Angeles, CA, USA
Alexander Strudwick Young
Human Genetics Department, UCLA David Geffen School of Medicine, Los Angeles, CA, USA
Alexander Strudwick Young

Contributions

Q.Q.H. and E.M.W. conducted most of the analyses, with the remainder being conducted by P.C. and D.S.M. Q.Q.H. and E.M.W. carried out data preparation and QC, with assistance from K.E.S., V.K.C., P.D., S.L., T.M., M.K., S.A. and D.B. E.S., C.F.W. and H.V.F. helped supervise the DDD study, together with M.E.H. Q.Q.H., E.M.W., D.S.M., E.J.R., V.W., A.S.Y. and M.E.H. provided key intellectual input. H.C.M. supervised the analyses and directed the study. Q.Q.H., E.M.W. and H.C.M. wrote the first draft of the manuscript, with input from P.C., D.S.M., J.C.B., V.W., A.S.Y. and M.E.H. All authors read and commented on the final manuscript.

Corresponding author

Correspondence to Hilary C. Martin.

Ethics declarations

Competing interests

M.E.H. is a cofounder of, consultant to and holds shares in Congenica, a genetics diagnostic company, and is also a consultant to the AstraZeneca Centre for Genomics Research.

Peer review

Peer review information

Nature thanks Santhosh Girirajan, Gerome Breen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Outline of main questions and analyses in this paper, and the key findings from these.

We conducted a GWAS of neurodevelopmental conditions in GEL and meta-analysed the results with the DDD-derived GWAS. We calculated genetic correlations between neurodevelopmental conditions and various brain-related conditions and traits using published GWAS summary statistics (Fig. 1), then estimated the fraction of each genetic correlation that was explained by genetic effects shared with educational attainment (Supplementary Fig. 1). Next, we constructed polygenic scores for neurodevelopmental conditions and relevant traits using the DDD-derived GWAS and external GWASs. We tested for differences in average polygenic scores between patients with versus without a monogenic diagnosis (Fig. 2). Given that clinically unaffected parents and probands showed similar polygenic background (Fig. 3), we tested whether non-transmitted common alleles in the parents were correlated with their child’s risk of neurodevelopmental conditions (Fig. 4), and explored two potential explanations. In the figure in the bottom right, T and NT indicate transmitted and non-transmitted alleles in the parents, respectively. We indicate two possible reasons (left and right) why parental non-transmitted alleles may be associated with the child’s phenotype; each can apply to maternal and/or paternal non-transmitted alleles. The first is that prenatal risk factors, specifically prematurity, might mediate the correlation between parental non-transmitted alleles and the child’s risk (Extended Data Fig. 9). This would be a type of indirect genetic effect, which has a causal interpretation (hence the arrow); we did not find significant evidence for it (Supplementary Fig. 5). A second possible explanation we explored (blue box) is that the non-transmitted common alleles may simply tag rare variant effects due to parental assortment (hence, the association may simply reflect correlation with the causal factor, as indicated by the dotted line).
We show a correlation between common and rare variant components of risk for neurodevelopmental conditions in Fig. 5.

Extended Data Fig. 2 Schematic illustrating key concepts in the paper.

(A) Illustration of the liability threshold model for rare neurodevelopmental conditions. The figure shows why one might expect patients with a monogenic diagnosis to have less polygenic (common variant) risk than those without a monogenic diagnosis. The normal distribution represents the underlying distribution of liability in the population, which is assumed to be Gaussian. Genetic and environmental factors with different effect sizes contribute to this total liability. Each panel represents one hypothetical individual who is either unaffected, affected with a monogenic diagnosis, or affected without a monogenic diagnosis. The red line indicates the threshold for being diagnosed with neurodevelopmental conditions (NDCs). Circles represent genetic factors and diamonds represent environmental factors; the size of each circle or diamond represents its impact on disease risk. The second patient, who has a monogenic diagnosis, has fewer green circles (fewer NDC risk-increasing common variants) than the undiagnosed patient on the right, since the orange circle (diagnostic large-effect variant) is sufficient on its own to push the diagnosed patient over the diagnostic threshold. (B) Illustration of how parental assortment leads to correlation between the common and rare variant components of risk for NDCs. The figure shows three hypothetical families in which the mother in each pair has a similar level of cognitive ability/educational attainment to the father (a phenomenon called parental assortment). The mother and father from the same family also have similar genetic predispositions towards these traits and hence also towards risk of NDCs. Numbers at the bottom of each jar represent the simulated count of risk alleles from NDC-associated common variants, shown as green circles (PGS), and from NDC-associated rare variants, shown as blue circles (RVBS).
In the two lefthand families, both parents have a low risk for NDCs, as shown by the total height of the blue and green circles being well below the liability threshold indicated by the red line. Children in these two families have inherited about the expected number of parental common and rare variant risk alleles (the average of their parents) and also have low risk of developing NDCs. In the third family, neither parent is clinically affected by NDCs, but both have subclinical phenotypes (for example, mild learning difficulties) due to carrying more risk alleles at rare (lefthand parent) or common (righthand parent) variants that contribute to reduced cognitive performance. Their child’s risk is above the diagnostic threshold indicated by the red line. In the parents’ generation, when parental assortment starts, there is no significant correlation between PGS and RVBS (two-sided P = 0.87, Pearson correlation r = 0.08 using the simulated counts). In their children, those who have more polygenic risk also tend to have more rare variant risk (correlation between PGS and RVBS is significant with P = 0.023, r = 0.999). Note that the values for PGS and RVBS have been chosen deliberately to emphasize the point for illustrative purposes; in reality, the correlation in children is much weaker than this (Fig. 5). Also note that when analysing the real data, we regressed out principal components from PGS and RVBS before calculating the correlations.
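The reasoning in panel A can be checked with a small Monte Carlo sketch. Everything here is an illustrative assumption rather than the authors’ model: liability is the sum of a standard-normal polygenic score, a standard-normal environmental term and, for rare carriers, a fixed large effect; individuals above an arbitrary threshold are affected; and every affected carrier is treated as having a monogenic diagnosis. The function name `liability_threshold_sim` and all parameter values are hypothetical.

```python
import random

def liability_threshold_sim(n=100_000, threshold=2.5, carrier_freq=0.002,
                            carrier_effect=4.0, seed=1):
    """Toy liability-threshold model (all parameter values are illustrative assumptions).

    Liability = polygenic score + environmental term + a large fixed effect for
    carriers of a rare 'diagnostic' variant. Individuals whose liability exceeds
    `threshold` are affected. Returns the mean polygenic score among affected
    carriers and among affected non-carriers.
    """
    rng = random.Random(seed)
    pgs_diagnosed, pgs_undiagnosed = [], []
    for _ in range(n):
        pgs = rng.gauss(0.0, 1.0)              # common-variant (polygenic) liability
        env = rng.gauss(0.0, 1.0)              # environmental and other liability
        carrier = rng.random() < carrier_freq  # carries a large-effect variant
        liability = pgs + env + (carrier_effect if carrier else 0.0)
        if liability > threshold:
            # Assumption: every affected carrier receives a monogenic diagnosis.
            (pgs_diagnosed if carrier else pgs_undiagnosed).append(pgs)
    return (sum(pgs_diagnosed) / len(pgs_diagnosed),
            sum(pgs_undiagnosed) / len(pgs_undiagnosed))

diag, undiag = liability_threshold_sim()
print(f"mean PGS, affected carriers (monogenic diagnosis): {diag:.2f}")
print(f"mean PGS, affected non-carriers (undiagnosed):     {undiag:.2f}")
```

Because a carrier needs much less additional liability to cross the threshold, affected carriers should show a clearly lower mean polygenic score than affected non-carriers, which is the direction of difference the liability threshold model predicts.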

Extended Data Fig. 3 Phenotypic comparisons between DDD and GEL.

Distribution of age at assessment (A) and number of HPO terms (B) in both DDD and GEL probands with neurodevelopmental conditions who have GBR ancestry. The vertical lines indicate the means. A small number of probands in each program were aged over 50 and had more than 30 HPO terms, and these have been omitted from the plot due to data sharing restrictions. (C) Proportion of probands from each cohort with at least one HPO term within the indicated chapter (black text) or specific phenotype (green text), ordered by the prevalence in DDD. The asterisks indicate results from a logistic regression testing whether there was a significant difference in phenotype prevalence between cohorts after controlling for sex and age (** indicates two-sided P < 0.05/43; * indicates two-sided P < 0.05; exact P values are annotated beside the asterisks). (D) Proportion of probands recruited to both DDD and GEL (N = 789) with at least one HPO term within the indicated chapter (black text) or specific phenotype (green text), based on the phenotype information recorded by each program, ordered by the prevalence in DDD. The same logistic regression was used as in (C).

Extended Data Fig. 4 GWAS meta-analysis of neurodevelopmental conditions.

We meta-analysed the GWASs derived from DDD-UKHLS (6,397 cases with neurodevelopmental conditions and 9,270 controls from UKHLS) and GEL (3,618 cases and 13,667 controls), using SNPs present in both cohorts with MAF > 1% in each. (A) Manhattan plot. The red line indicates the genome-wide significance threshold (5×10⁻⁸). (B) Quantile-quantile plot. GWAS summary statistics including exact P values are available in Supplementary Data 3.
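METAL, which appears in the reference list, supports both a sample-size-weighted Z-score scheme and an inverse-variance-weighted scheme, and this legend does not say which was used. As a minimal sketch assuming the inverse-variance-weighted fixed-effect scheme, with effect estimates on the log-odds scale and already aligned to the same effect allele, the per-SNP combination and the MAF > 1% overlap filter could look like this. The dictionary layout, function names, SNP IDs and numbers are hypothetical.

```python
import math
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance threshold shown in panel A

def ivw_fixed_effect(b1, se1, b2, se2):
    """Inverse-variance-weighted fixed-effect combination of two effect estimates."""
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2
    beta = (w1 * b1 + w2 * b2) / (w1 + w2)
    se = math.sqrt(1.0 / (w1 + w2))
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))  # two-sided P from the standard normal
    return beta, se, z, p

def meta_analyse(gwas_a, gwas_b, maf_min=0.01):
    """Meta-analyse SNPs present in both GWASs with MAF > maf_min in each.

    Each GWAS is a dict: snp_id -> (beta, se, maf). Alleles are assumed to be
    harmonized already (an assumption of this sketch).
    """
    results = {}
    for snp in gwas_a.keys() & gwas_b.keys():  # overlapping SNPs only
        b1, se1, maf1 = gwas_a[snp]
        b2, se2, maf2 = gwas_b[snp]
        if maf1 > maf_min and maf2 > maf_min:
            results[snp] = ivw_fixed_effect(b1, se1, b2, se2)
    return results

# Hypothetical summary statistics, for illustration only.
gwas_ddd = {"rs1": (0.10, 0.05, 0.30), "rs2": (0.02, 0.04, 0.005)}
gwas_gel = {"rs1": (0.20, 0.10, 0.28), "rs2": (0.03, 0.05, 0.02), "rs3": (0.10, 0.10, 0.20)}
meta = meta_analyse(gwas_ddd, gwas_gel)
for snp, (beta, se, z, p) in sorted(meta.items()):
    print(snp, round(beta, 3), round(se, 4), f"{p:.2e}", p < GENOME_WIDE_ALPHA)
```

Here `rs2` is dropped because its MAF is below 1% in one cohort, and `rs3` is dropped because it is not present in both.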

Extended Data Fig. 5 Average polygenic scores in undiagnosed (red) and diagnosed (blue) probands with neurodevelopmental conditions from DDD and GEL combined.

PGSs were standardized so that, after reweighting to adjust for sampling and non-response bias, MCS children had a mean of 0 and an s.d. of 1 (see Methods and Supplementary Note 4). Subsets of probands with neurodevelopmental conditions and their parents from trios are shown in light red (undiagnosed subsets) and light blue (diagnosed subsets). PGS_NDC,DDD was tested in GEL and in a held-out set of DDD patients who were not included in the original GWAS. Error bars show 95% confidence intervals. Asterisks in blue or red indicate subgroups that showed significantly different PGS compared to weighted MCS control children, indicated by the horizontal line. Black asterisks indicate significant differences in average PGS between two subgroups highlighted by brackets, which are specifically mentioned in the main text. One asterisk indicates nominally significant differences (P < 0.05) and a double asterisk indicates significant differences that passed Bonferroni correction for five PGSs (P < 0.01). See also Supplementary Table 7 for results of two-sided t-tests comparing groups.
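A minimal sketch of standardizing scores against a weighted reference sample, so that the reweighted reference children have mean 0 and s.d. 1. It assumes each reference child carries a single combined sampling/non-response weight and uses the weight-normalized (population) variance; the actual weighting procedure is described in the paper’s Methods and Supplementary Note 4. The function name, weights and scores are hypothetical.

```python
import math

def weighted_standardize(scores, ref_scores, ref_weights):
    """Standardize `scores` so that the weighted reference sample has mean 0 and s.d. 1."""
    total = sum(ref_weights)
    mean = sum(w * s for w, s in zip(ref_weights, ref_scores)) / total
    var = sum(w * (s - mean) ** 2 for w, s in zip(ref_weights, ref_scores)) / total
    sd = math.sqrt(var)
    return [(s - mean) / sd for s in scores]

# Hypothetical raw PGS values and survey weights for reference (MCS-like) children.
ref_pgs = [0.8, 1.1, 1.5, 0.9, 1.3]
ref_w = [1.2, 0.7, 2.0, 1.0, 0.5]
ref_z = weighted_standardize(ref_pgs, ref_pgs, ref_w)          # reference on the new scale
patient_z = weighted_standardize([1.6, 0.7], ref_pgs, ref_w)  # patients on the same scale
print([round(z, 3) for z in patient_z])
```

Because patients are placed on the reference scale rather than their own, a patient group mean different from 0 can be read directly as a shift in s.d. units relative to the reweighted population sample.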

Extended Data Fig. 6 Average polygenic scores in various subgroups.

(A) Average polygenic score for educational attainment (PGS_EA) in different control cohorts and subsets thereof, subsets of probands with neurodevelopmental conditions, and their unaffected parents. (B) Comparison of average PGS_EA in trio probands and probands who did not have genetic data on both parents in ALSPAC, MCS, and affected patients from DDD and GEL. Note that in the case of DDD, “in trios” refers to those who had exome sequence data on both parents (only a subset of whom also had genotype array data, since we prioritized genotyping full trios for which the child was undiagnosed), whereas in the rest of the manuscript (except for Fig. 2b, which uses the same definition as here), “trio proband” refers to those who had genotype data on both parents. (C) Average polygenic scores for all five traits in MCS before and after reweighting to adjust for sampling bias and attrition. Note that the PGSs are corrected for 20 PCs and then normalized so that a combined set of unrelated controls from UKHLS and GEL has a mean of 0 and an s.d. of 1. Error bars show 95% confidence intervals. See Supplementary Table 6 for results of two-sided t-tests comparing the various groups.
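The two-step normalization in panel C (correct for PCs, then scale to the controls) can be sketched as follows. Assumptions: the PC regression is an ordinary least-squares fit with an intercept across all samples, and controls are scaled with the sample s.d.; the legend does not specify these details. Only two PCs are used here for brevity, and all function names and data are hypothetical.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def residualize(y, pcs):
    """Residuals of y after OLS regression on an intercept plus the given PC columns."""
    X = [[1.0] + [pc[i] for pc in pcs] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]

def standardize_to_controls(values, is_control):
    """Shift and scale so that the control subset has mean 0 and sample s.d. 1."""
    ctrl = [v for v, c in zip(values, is_control) if c]
    mean = sum(ctrl) / len(ctrl)
    sd = (sum((v - mean) ** 2 for v in ctrl) / (len(ctrl) - 1)) ** 0.5
    return [(v - mean) / sd for v in values]

# Hypothetical data: 20 individuals, two PCs (the paper used 20); the first 12 are controls.
pc1 = [i / 10 for i in range(20)]
pc2 = [(i * 7) % 5 - 2 for i in range(20)]
raw_pgs = [2 + 3 * a - b + (-1) ** i * 0.1 for i, (a, b) in enumerate(zip(pc1, pc2))]
resid = residualize(raw_pgs, [pc1, pc2])
pgs = standardize_to_controls(resid, [i < 12 for i in range(20)])
```

After the first step the residuals are orthogonal to the intercept and to every PC, so ancestry-related shifts captured by the PCs no longer move group means; the second step only changes the units.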

Extended Data Fig. 7 Factors associated with having a monogenic diagnosis in DDD.

(A) Association between different configurations of affected relatives and the child’s PGS_EA (left) or the odds of getting a monogenic diagnosis (right). Left: average proband PGS_EA in subgroups with different configurations of affected relatives, based on the number of affected parents, siblings and more distant relatives. Right: odds ratio for having a monogenic diagnosis, compared to probands with no affected relatives, estimated from logistic regression. See Supplementary Methods for a description of how this was calculated. (B) Association between the proband’s PGS_EA and diagnostic status, with or without correcting for technical, clinical and prenatal factors that are associated with receiving a monogenic diagnosis in DDD, assessed via logistic regression. We corrected for each factor individually (light purple), and also corrected for both trio status and prematurity in a joint model (dark purple). In the joint model, we did not include factors that were not associated with PGS_EA (sex and maternal diabetes) or with diagnostic status (F_ROH) (Fig. 2), nor factors that are likely a consequence of having or not having a monogenic diagnosis rather than a cause of getting one (severity of ID/DD or having any affected family members). One asterisk indicates nominally significant results (P < 0.05) and a double asterisk indicates significant results that passed Bonferroni correction for seven factors. See Supplementary Table 8 for exact estimates and two-sided P values. Error bars show 95% confidence intervals in both panels.
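For a logistic regression whose only predictor is an indicator for one subgroup versus the reference group (for example, one configuration of affected relatives versus no affected relatives, with no other covariates), the fitted odds ratio and its Wald interval have a closed form in the 2×2 counts. The sketch below shows that special case together with the Bonferroni threshold for seven factors used in panel B. The counts and the function name are made up, and the paper’s models may include covariates (see its Supplementary Methods).

```python
import math
from statistics import NormalDist

def odds_ratio_wald(a, b, c, d, alpha=0.05):
    """Odds ratio, Wald (1 - alpha) CI and two-sided P from a 2x2 table.

    a: group of interest, diagnosed     b: group of interest, undiagnosed
    c: reference group, diagnosed       d: reference group, undiagnosed

    For a logistic regression with only this binary predictor, the maximum-likelihood
    log odds ratio is log(a*d / (b*c)) and its Wald s.e. is sqrt(1/a + 1/b + 1/c + 1/d).
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p = 2 * NormalDist().cdf(-abs(log_or / se))
    ci = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    return math.exp(log_or), ci, p

# Hypothetical counts, for illustration only.
or_, ci, p = odds_ratio_wald(a=20, b=80, c=30, d=270)
bonferroni_threshold = 0.05 / 7  # seven factors, as in panel B
print(f"OR = {or_:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), P = {p:.4f}")
print("nominal:", p < 0.05, "| Bonferroni:", p < bonferroni_threshold)
```

With these made-up counts the odds ratio is 2.25 and P is about 0.01: nominally significant (one asterisk) but above 0.05/7 ≈ 0.0071, so it would not earn a double asterisk.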

Extended Data Fig. 8 Exploring sex differences in polygenic risk.

(A) Comparison of polygenic scores between undiagnosed male and female probands in DDD and GEL combined. We used all undiagnosed probands with neurodevelopmental conditions regardless of trio status in this analysis (N = 1,426 females and N = 2,427 males in DDD; N = 112 females and N = 146 males in DDD excluding GWAS samples; N = 918 females and N = 1,574 males in GEL). Square points show the differences in average polygenic scores between female and male probands; a positive difference indicates that female probands have higher PGS than male probands. (B) Comparison of polygenic scores between unaffected mothers and fathers of undiagnosed probands from a combined sample of 1,523 trios and 1,343 trios from DDD and GEL, respectively. Triangles show the differences in average polygenic scores between mothers and fathers; a positive difference indicates that mothers have higher PGS than fathers. Two-sided t-tests were used to compare average PGSs in (A) and (B). (C) Polygenic transmission disequilibrium test (pTDT) results in undiagnosed female and male probands with unaffected parents (N = 586 females and N = 937 males in DDD; N = 99 females and N = 125 males in DDD excluding GWAS samples; N = 490 females and N = 853 males in GEL). We tested whether probands’ polygenic scores deviated from the mean parental polygenic scores using two-sided one-sample t-tests. Points show the mean pTDT deviation (difference between the child’s polygenic score and the mean parental polygenic score, in units of the s.d. of the latter). Error bars show 95% confidence intervals. The significant result that passes Bonferroni correction for five tests is highlighted by a double asterisk. See Supplementary Table 9 for results of pTDT.
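The pTDT deviation described in panel C can be sketched directly from its definition: for each trio, take the child’s score minus the mid-parent score, divide by the s.d. of the mid-parent scores, and test whether the mean deviation differs from zero. This sketch swaps the t distribution for a normal approximation to the P value (adequate for trio counts in the hundreds, but an approximation nonetheless); the function name and trio values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def ptdt(child, mother, father):
    """Polygenic transmission disequilibrium test (pTDT) sketch.

    Deviation = (child PGS - mid-parent PGS) / s.d. of mid-parent PGS.
    Returns the mean deviation, a one-sample t statistic against 0, and a
    two-sided P value from a normal approximation (reasonable for large N).
    """
    midparent = [(m + f) / 2 for m, f in zip(mother, father)]
    sd_mid = stdev(midparent)
    deviations = [(c - mp) / sd_mid for c, mp in zip(child, midparent)]
    n = len(deviations)
    mean_dev = mean(deviations)
    t = mean_dev / (stdev(deviations) / math.sqrt(n))
    p = 2 * NormalDist().cdf(-abs(t))
    return mean_dev, t, p

# Hypothetical trio PGSs, for illustration only.
mother = [0.0, 1.0, 2.0, 3.0]
father = [0.0, 1.0, 2.0, 3.0]
over = ptdt([1.0, 2.5, 2.0, 4.5], mother, father)      # children above mid-parent on average
balanced = ptdt([1.0, 0.0, 3.0, 2.0], mother, father)  # deviations cancel out
print("over-transmission:", [round(x, 3) for x in over])
print("balanced:", [round(x, 3) for x in balanced])
```

Comparing each child with their own parents is what makes the pTDT robust to population-level confounding: a non-zero mean deviation reflects transmission within families rather than differences between families.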

Extended Data Fig. 9 Exploring prenatal factors that may influence risk of neurodevelopmental conditions.

(A) Points show genetic correlations between neurodevelopmental conditions and prenatal risk factors, before and after conditioning on educational attainment or cognitive performance. Genetic correlations with our GWAS meta-analysis for neurodevelopmental conditions were estimated using Linkage Disequilibrium Score Regression. Those conditioned on the GWAS summary statistics for educational attainment or cognitive performance were estimated using GenomicSEM. See Supplementary Table 11 for exact estimates of genetic correlations and two-sided P values. (B) Percentage of the genetic correlation between neurodevelopmental conditions and prenatal risk factors that is explained by the latent educational attainment (EA) variable estimated using GenomicSEM (red bars and percentage written in text). Green bars indicate the contribution from the non-EA latent variable. The estimates are standardized so that the total height represents the genetic correlation between neurodevelopmental conditions and prenatal risk factors. (C) Percentage of the genetic correlation between neurodevelopmental conditions and prenatal risk factors that is explained by the latent cognitive (Cog) variable (red bars and percentage written in text). Green bars indicate the contribution from the non-cognitive (Non-Cog) variable. In (B) and (C), we focused on prenatal factors that showed significant genetic correlations with neurodevelopmental conditions. (D) Association between PGSs and prematurity, a risk factor for neurodevelopmental conditions. Points show the differences in PGSs between premature and term probands, estimated in DDD using linear regression models. See Supplementary Table 8 for exact two-sided P values and sample sizes. Note that for PGS_NDC,DDD, probands who were included in the GWAS were not tested, which left 703 probands, of which 83 were born prematurely. 
A negative estimate indicates that probands who were born prematurely had a lower polygenic score than term probands, or that their parents had a lower polygenic score than the parents of term probands. Associations that pass Bonferroni correction for five traits in (A) or five polygenic scores in (D) are indicated by a double asterisk and nominally significant (P < 0.05) results by one asterisk. Error bars show 95% confidence intervals.

Extended Data Fig. 10 Exploring how the correlation between rare and common variant components of risk for NDCs affects estimates from the trio model.

(A) Illustration of how assortment-induced correlation between common and rare components of risk for neurodevelopmental conditions affects the non-transmitted coefficients but not the estimate of the direct genetic effect in the trio model. We simulated three NDC trios and three control trios. Each individual has a polygenic score (PGS) and a rare variant burden score (RVBS), representing the measured common and rare variant risk for NDCs, respectively. The child in each trio has inherited about the expected number of risk alleles (the average of their parents); these are the transmitted alleles (T). In these simulated hypothetical families, the child does not show significant deviation from the parental average, which is what we observe for PGS_EA (Fig. 3). We also show the PGS and RVBS derived from the parental non-transmitted risk alleles (NT). An individual’s PGS is correlated with their RVBS (black double arrows) due to parental assortment that started in previous generations (Extended Data Fig. 2b). However, in these hypothetical families, the child’s PGS deviation from their parental average is not significantly correlated with their RVBS deviation (grey double arrows). In the ‘proband-only model’, θ_T captures both the association between the child’s PGS and NDC risk and the association between the child’s RVBS and NDC risk (blue solid arrow), because of the correlation between the child’s PGS and RVBS. In the ‘trio model’, the parental non-transmitted coefficients (θ_m,NT, θ_f,NT) capture the effects of both the parental PGS and RVBS (purple solid arrows) for the same reason. However, the coefficient on the child’s PGS (the estimate of the direct genetic effect, δ) captures the association of the deviation from the parental average PGS due to Mendelian segregation (orange solid arrow), which is uncorrelated with the rare variant effects.
Note that the values for PGS and RVBS were chosen deliberately to emphasize the point for illustrative purposes; real correlations between the measured scores are much weaker (Fig. 5). We used the simulated counts to calculate Pearson correlation coefficients and report two-sided P values.

(B) Effect sizes of PGS and RVBS on case/control status within GEL, estimated from the ‘proband-only’ and ‘trio’ models. Two-sided P values and effect sizes (reported in Supplementary Table 10) were estimated from logistic regression models fitted to 1,343 trios in which the proband with a neurodevelopmental condition is undiagnosed and the parents are unaffected, and 872 trios without neurodevelopmental conditions. Case/control status was regressed on the child’s PGS (proband-only model), the child’s PGS and RVBS (proband-only model + RVBS), all three trio members’ PGSs (trio model), or all three trio members’ PGSs and RVBSs (trio model + RVBS). Results from the last of these are indicated with a red box, since they are the main focus of this figure. A single asterisk indicates nominally significant results (P < 0.05) and a double asterisk indicates results that passed Bonferroni correction for five PGSs. Results from the ‘proband-only’ and ‘trio’ models are also shown in Fig. 4, using additional cases and controls beyond GEL. The RVBS was defined as the number of rare damaging PTVs and missense variants in constrained genes (excluding de novo mutations in the child), corrected for 20 genetic principal components.
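To see why the parental coefficients absorb an omitted RVBS while the child-PGS coefficient does not, the toy simulation below (an illustrative assumption, not the authors’ code or data) generates many trios in which PGS and RVBS are correlated within each parental haplotype but segregate independently, makes risk depend only on the child’s own scores, and fits the four logistic models from panel B. The helpers `fit_logistic` and `residualize`, the random stand-in PCs and all parameter values are hypothetical; the RVBS “correction for 20 genetic principal components” is assumed here to be a linear regression on the PCs, which the caption does not specify.

```python
import math
import numpy as np

rng = np.random.default_rng(7)

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson logistic regression with an intercept.

    Returns slope coefficients and two-sided Wald P values (intercept dropped).
    """
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        hessian = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X1.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    cov = np.linalg.inv(X1.T @ (X1 * (p * (1.0 - p))[:, None]))
    z = beta / np.sqrt(np.diag(cov))
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return beta[1:], pvals[1:]

def residualize(score, pcs):
    """Hypothetical PC correction: remove the linear effect of PCs (plus intercept)."""
    design = np.column_stack([np.ones(len(score)), pcs])
    coef, *_ = np.linalg.lstsq(design, score, rcond=None)
    return score - design @ coef

# Toy trios: each parental haplotype carries a PGS and an RVBS drawn with
# correlation r (a stand-in for assortment); the two scores sit at unlinked
# loci, so the transmitted haplotype is drawn separately for each score.
n, r, delta, gamma = 40_000, 0.5, 0.5, 1.0
haps = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=(n, 2, 2))
rows, par = np.arange(n)[:, None], np.arange(2)[None, :]
pgs_c = haps[rows, par, rng.integers(0, 2, (n, 2)), 0].sum(axis=1)
rvbs_c = haps[rows, par, rng.integers(0, 2, (n, 2)), 1].sum(axis=1)
pgs_m, pgs_f = haps[:, 0, :, 0].sum(axis=1), haps[:, 1, :, 0].sum(axis=1)
rvbs_m, rvbs_f = haps[:, 0, :, 1].sum(axis=1), haps[:, 1, :, 1].sum(axis=1)

# Risk depends only on the child's own scores: direct effects, no indirect effects.
logit = -1.0 + delta * pgs_c + gamma * rvbs_c
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

# Hypothetical PCs (random here) used only to show the correction step.
pcs = rng.normal(size=(n, 20))
rvbs_c, rvbs_m, rvbs_f = (residualize(s, pcs) for s in (rvbs_c, rvbs_m, rvbs_f))

# Panel A check: scores correlate within a person, segregation deviations do not.
r_levels = np.corrcoef(pgs_c, rvbs_c)[0, 1]
r_devs = np.corrcoef(pgs_c - (pgs_m + pgs_f) / 2, rvbs_c - (rvbs_m + rvbs_f) / 2)[0, 1]

models = {
    "proband-only": [pgs_c],
    "proband-only + RVBS": [pgs_c, rvbs_c],
    "trio": [pgs_c, pgs_m, pgs_f],
    "trio + RVBS": [pgs_c, pgs_m, pgs_f, rvbs_c, rvbs_m, rvbs_f],
}
bonferroni_alpha = 0.05 / 5  # Bonferroni threshold for five PGSs
results = {name: fit_logistic(np.column_stack(cols), y) for name, cols in models.items()}
for name, (coefs, pvals) in results.items():
    print(f"{name:20s}", np.round(coefs, 2), pvals < bonferroni_alpha)
```

With these settings the parental PGS coefficients in the trio model should come out positive even though the simulation contains no indirect effects, and should shrink towards zero once the RVBSs are added. The child-PGS coefficient stays near its direct-effect value, somewhat attenuated in the models that omit RVBS, as expected when a predictor is left out of a logistic regression.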

Supplementary information

Supplementary Information

Supplementary Methods, Figs. 1–21, Notes 1–9, Descriptions of Tables 1–21 and Descriptions of Data 1–3.

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Tables 1–21.

Supplementary Data 1

Summary statistics from the GWAS of neurodevelopmental conditions comparing cases to controls within the Genomics England (GEL) 100,000 Genomes Project.

Supplementary Data 2

Summary statistics from the GWAS of neurodevelopmental conditions comparing DDD cases to UKHLS controls, excluding the Scottish samples from DDD.

Supplementary Data 3

Summary statistics from the GWAS meta-analysis of neurodevelopmental conditions combining the DDD and GEL GWASs.
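Supplementary Data 3 combines the DDD and GEL GWASs, but the combination method is not described in this section. A common approach is a fixed-effect inverse-variance-weighted meta-analysis; the sketch below assumes that approach for a single variant, and the function name `ivw_fixed_effect` and the input numbers are hypothetical rather than taken from the supplementary files.

```python
import math

def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one variant.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   their standard errors
    Returns (pooled beta, pooled SE, two-sided P value from a normal Z test).
    """
    weights = [1.0 / se**2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p

# Hypothetical log-odds-ratio estimates for one SNP from two cohorts.
beta, se, p = ivw_fixed_effect(betas=[0.10, 0.06], ses=[0.02, 0.03])
print(round(beta, 4), round(se, 4))  # beta ≈ 0.0877, se ≈ 0.0166
```

Each cohort is weighted by the inverse of its squared standard error, so the pooled estimate leans towards the more precise study; this weighting assumes both cohorts estimate the same underlying effect.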

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Huang, Q.Q., Wigdor, E.M., Malawsky, D.S. et al. Examining the role of common variants in rare neurodevelopmental conditions. Nature 636, 404–411 (2024). https://doi.org/10.1038/s41586-024-08217-y


Received: 12 February 2024
Accepted: 15 October 2024
Published: 20 November 2024
Issue Date: 12 December 2024
DOI: https://doi.org/10.1038/s41586-024-08217-y


This article is cited by

Common genetic variants contribute more to rare diseases than previously thought
- Gerome Breen
Nature (2024)
