“Imputation of Structural Variants Using a Multi-Ancestry Long-Read Sequencing Panel Enables Identification of Disease Associations”, Boris Noyvert, A. Mesut Erzurumluoglu, Dmitriy Drichel, Steffen Omland, Till F. M. Andlauer, Stefanie Mueller, Lau Sennels, Christian Becker, Aleksandr Kantorovich, Boris A. Bartholdy, Ingrid Braenne, Julio Cesar Bolivar-Lopez, Costas Mistrellides, Gillian M. Belbin, Jeremiah H. Li, Joseph K. Pickrell, Johann de Jong, Jatin Arora, Yao Hu, Boehringer Ingelheim, Digital Sciences, Clive R. Wood, Jan M. Kriegl, Nikhil Podduturi, Jan N. Jensen, Jan Stutzki, Zhihao Ding, 2023-12-22:

Advancements in long-read sequencing technology have accelerated the study of large structural variants (SVs). We created a curated, publicly available, multi-ancestry SV imputation panel by long-read sequencing 888 samples from the 1000 Genomes Project. This high-quality panel was used to impute SVs in ~500,000 UK Biobank participants.

We demonstrated the feasibility of conducting genome-wide SV association studies at biobank scale using 32 disease-relevant phenotypes related to respiratory, cardiometabolic and liver diseases, in addition to 1,463 protein levels.

This analysis identified thousands of genome-wide statistically significant SV associations, including hundreds of conditionally independent signals, thereby enabling novel biological insights.

Focusing on genetic association studies of lung function as an example, we demonstrate the added value of SVs for prioritising causal genes at gene-rich loci compared to traditional GWAS using only short variants.

We envision that future post-GWAS gene-prioritization workflows will incorporate SV analyses using this SV imputation panel and framework.