A Long-read based Haplotype Panel Enhances Imputation and Discovery of Functional Small and Structural Variants
Abstract
Haplotype reference panels are commonly used for genotype imputation in genome-wide association studies (GWAS). Although structural variations (SVs) are recognized as major contributors to human phenotypes, they are often excluded from GWAS analyses. Here, we integrate long-read-based and statistical methods to provide a comprehensive haplotype reference panel (the Han-SV panel) incorporating 32,603,300 single-nucleotide polymorphisms (SNPs), 3,180,227 small deletions and insertions, and 172,569 SVs derived from 943 Han Chinese individuals. Our hybrid phasing approach achieved a 12.7-fold reduction in phasing error for small variants and a 3.6-fold reduction for SVs compared to conventional statistical phasing. Compared to the expanded 1000 Genomes Project panel, the Han-SV panel imputed more than twice as many SVs with a four-fold improvement in accuracy. Two GWASs using our panel-imputed variants identified 69 associated SVs and 101 previously unreported regions associated with skin-related and fingerprint phenotypes, substantially outperforming both short-read and SNP-array-based GWAS. The Han-SV panel offers a valuable resource for variant imputation and SV-inclusive association studies, helping to uncover novel phenotype associations and address critical gaps in missing heritability. An imputation server is available for use of the Han-SV panel (https://www.biosino.org/svrp).