Detecting structural variants (SVs) in the human genome remains a substantial challenge for most sequencing projects. For example, short-read sequencing methods detect only 10% to 70% of SVs, with false positive rates of up to 89% [1]. Yet accurately detecting SVs is critical, not only to building a more comprehensive view of human genetic variation, but also to better understanding the genetics behind complex conditions like schizophrenia, Alzheimer’s disease, and cancer [2-4].
Structural variants, defined as insertions, deletions, duplications, or inversions that span more than 50 bp, represent a prevalent and underexplored source of human genetic variation. Estimates suggest that each person carries between 23,000 and 31,000 structural variants [1]. To identify these complex variants and predict their functional consequences, researchers need to be able to perform large-scale genome-wide association studies (GWAS). However, the limitations of current sequencing technology make this difficult. Most DNA sequencing projects use short-read next-generation sequencing (NGS) platforms, which makes it challenging to accurately resolve long, complex structural mutations. While long-read sequencing platforms are available and well suited to structural variant detection, their significant cost has prevented widespread use.
As a result, the vast majority of genomics research is carried out using short-read sequencing. Large repositories of short-read sequencing data have become available to researchers around the world, enabling statistically well-powered studies that continue to uncover subtle gene-phenotype associations. Yet the presence and effects of structural variants in these data sets remain obscured, greatly limiting our understanding of this common mutation type.
A tool that allows researchers to infer the presence of structural variants from short-read sequencing data would open the floodgates, enabling researchers to scour existing data sets for new information. To this end, a team of researchers from Gencove and Boehringer Ingelheim came together to create a multi-ancestry structural variation imputation panel based on Oxford Nanopore long-read sequencing data [5]. This panel enables the imputation of structural variation from short-read sequencing data. Such a resource could greatly improve our understanding of the human genome and the diseases that arise from it.