Modeling structural variations sequencing information to address missing heritability and enhance risk prediction
Abstract
Missing heritability remains a significant challenge in genome-wide studies that rely on single nucleotide polymorphisms (SNPs) to analyze the genetic basis of complex traits. Structural variations (SVs), which span broader genomic regions and often exert larger functional impacts than SNPs, hold promise for capturing this heritability. Although graph-based pangenomes have advanced SV detection and association analysis, traditional SNP-centric frameworks typically overlook detailed SV sequence information and treat overlapping SVs as independent variants, which reduces statistical power. To overcome this limitation, we developed SVrefiner, an algorithm that aligns overlapping SVs and partitions them into non-overlapping refined SVs (rSVs) based on sequence congruence. By generating precise genotype matrices, SVrefiner enables more accurate association studies. When applied to human, tomato, and pig genomes, SVrefiner delineated 48,712 rSVs from 77,696 human SVs, 6,607 rSVs from 51,561 tomato SVs, and 5,237 rSVs from 142,784 pig SVs. Incorporating rSVs alongside SNPs, indels, and unrefined SVs enhanced heritability estimates by up to 24.47%. Further, refined analysis revealed that gene expression traits are often influenced by specific SV subregions rather than entire SVs. Notably, integration of rSVs in human datasets increased total eQTLs by 71.56%, with cis- and trans-eQTLs rising by 21.3% and 94.4%, respectively. Mean risk prediction accuracy across more than 16,000 traits improved by as much as 16.8%. These findings deepen the understanding of complex trait heritability and demonstrate the utility of rSVs for genetic improvement and disease risk assessment.
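To make the idea of partitioning overlapping SVs into non-overlapping units concrete, the following is a minimal Python sketch. It is not the SVrefiner implementation: it assumes each SV is a half-open coordinate interval on a shared reference and that genotypes are simple presence/absence calls, and it splits purely at breakpoints, whereas the abstract describes partitioning by sequence alignment and congruence. The function names `partition_overlapping` and `refined_genotypes` are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a shared reference (assumption)


def partition_overlapping(svs: Dict[str, Interval]) -> List[Tuple[Interval, List[str]]]:
    """Split intervals at every breakpoint and return each non-overlapping
    segment together with the names of the SVs that cover it."""
    breakpoints = sorted({pos for start, end in svs.values() for pos in (start, end)})
    segments = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        covering = sorted(
            name for name, (start, end) in svs.items() if start <= left and right <= end
        )
        if covering:  # skip gaps that no SV spans
            segments.append(((left, right), covering))
    return segments


def refined_genotypes(
    segments: List[Tuple[Interval, List[str]]],
    sv_genotypes: Dict[str, List[int]],
) -> List[List[int]]:
    """Build a segment-by-sample 0/1 matrix: a sample carries a segment
    if it carries at least one SV covering that segment (assumption)."""
    n_samples = len(next(iter(sv_genotypes.values())))
    return [
        [int(any(sv_genotypes[name][i] for name in covering)) for i in range(n_samples)]
        for _, covering in segments
    ]


# Two overlapping SVs, three samples (toy data, not from the study).
svs = {"sv1": (100, 200), "sv2": (150, 250)}
genotypes = {"sv1": [1, 0, 0], "sv2": [0, 1, 0]}

segments = partition_overlapping(svs)
matrix = refined_genotypes(segments, genotypes)
for (segment, covering), row in zip(segments, matrix):
    print(segment, covering, row)
# (100, 150) ['sv1'] [1, 0, 0]
# (150, 200) ['sv1', 'sv2'] [1, 1, 0]
# (200, 250) ['sv2'] [0, 1, 0]
```

In this toy case the shared subregion (150, 200) is carried by two samples even though each original SV is carried by only one, which illustrates why testing refined units rather than whole overlapping SVs could change association power.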