Utilizing low-pass sequence data to study the impact of structural variants on polygenic traits

Abstract

Background The impact of single nucleotide polymorphisms (SNPs) on polygenic traits has been well studied because technological advances have made SNP genotyping cost effective. Studying the impact of structural variants (SVs) on polygenic traits is far more complex and requires large datasets with accurately genotyped SVs, which are typically identified only from high coverage sequencing data. As such data are costly to generate, low-pass sequencing might be a less accurate but more practical and cost-effective alternative. In this study, we aim to call and impute SVs in a low-pass sequence dataset from two broiler lines of ~1000 individuals each, compare the resulting SV calls to those obtained from high coverage data, and assess their potential for use in breeding by including SVs in genomic prediction models.
Results Deletions, duplications, and inversions were called in a high coverage reference panel of 76 founder individuals and in the low coverage data of the 2,119 broilers. We discovered a total of 35,278 SVs in the high coverage dataset and 58,296 SVs in the low-pass dataset. A large proportion of the SVs called in the low-pass dataset are deletions (47,269), generally with a low minor allele frequency (MAF). Based on Beagle R2, imputation of low-pass SVs to a whole genome sequence reference set shows good accuracy overall, particularly for deletions. Duplications also maintain relatively good accuracy, whereas inversions exhibit a somewhat lower imputation accuracy. Overall, less than 3% of the variation in the final SV dataset was explained by SNP genotypes from a 60k array. Including SVs in genomic prediction models substantially improved prediction accuracy, with a relative improvement of more than 5% compared to a purely SNP-based prediction model.
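
To make the phrase "relative improvement in prediction accuracy" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say how prediction accuracy was measured, so the sketch assumes the common convention of a Pearson correlation between predicted and observed values. The function names and all numbers are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences.

    Assumption: prediction accuracy is measured as the correlation between
    predicted and observed values; the abstract does not specify this.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def relative_improvement(acc_baseline, acc_new):
    """Relative change of acc_new with respect to acc_baseline."""
    if acc_baseline == 0:
        raise ValueError("baseline accuracy must be non-zero")
    return (acc_new - acc_baseline) / acc_baseline


if __name__ == "__main__":
    # Hypothetical accuracies, chosen only to illustrate the arithmetic.
    acc_snp_only = 0.40      # SNP-only model (made-up value)
    acc_snp_plus_sv = 0.42   # SNP + SV model (made-up value)
    gain = relative_improvement(acc_snp_only, acc_snp_plus_sv)
    print(f"relative improvement: {gain:.1%}")
```

With these made-up inputs, an absolute gain of 0.02 on a baseline of 0.40 corresponds to a relative improvement of about 5%, which is the kind of quantity the Results report.
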
Conclusions Detecting SVs directly in low-pass sequence data is possible and not only yields a reasonable overlap with SVs called in the high coverage WGS reference population but also detects additional low-frequency SVs. Our results show that including SVs in addition to SNPs improves genomic prediction accuracy, highlighting the importance of SVs for understanding genomic processes and the underlying architecture of traits.
