Utilizing low-pass sequence data to study the impact of structural variants on polygenic traits

Martijn Derks
Torsten Pook
Jun Chen
Rachel Hawken
Aniek Bouwman

Abstract

Background The impact of single nucleotide polymorphisms (SNPs) on polygenic traits has been well studied because technological advances have made SNP genotyping cost-effective. Studying the impact of structural variants (SVs) on polygenic traits is far more complex: it requires large datasets with accurately genotyped SVs, and SVs are typically identified only from high-coverage sequencing data. As such data are costly to generate, low-pass sequencing might be a less accurate but more practical and cost-effective alternative. In this study, we aim to call and impute SVs in a low-pass sequence dataset from two broiler lines of ~1,000 individuals each, compare the SV calls to those obtained from high-coverage data, and assess their potential for use in breeding by including SVs in genomic prediction models.
Results Deletions, duplications, and inversions were called in a high-coverage reference panel of 76 founder individuals and in the low-coverage data of the 2,119 broilers. We discovered a total of 35,278 SVs in the high-coverage dataset and 58,296 SVs in the low-pass dataset. A large proportion of the SVs called in the low-pass dataset are deletions (47,269), generally with a low minor allele frequency (MAF). Imputation of low-pass SVs to a whole-genome sequence (WGS) reference set shows good accuracy overall based on Beagle R2, particularly for deletions. Duplications also maintain relatively good accuracy, whereas inversions show somewhat lower imputation accuracy. Overall, less than 3% of the variation in the final SV data was explained by SNP genotypes from a 60k array. Including SVs in genomic prediction models yields substantial improvements, with a relative improvement in prediction accuracy of more than 5% compared to a purely SNP-based prediction model.
Conclusions Detecting SVs directly in low-pass sequence data is possible; it not only yields a reasonable overlap with SVs called in the high-coverage WGS reference population but also detects additional low-frequency SVs. Our results show that including SVs in addition to SNPs improves genomic prediction accuracy, highlighting the importance of SVs for understanding the genomic processes and architecture underlying traits.
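
The abstract reports a relative, not absolute, improvement in prediction accuracy. As a minimal sketch of that distinction, the Python snippet below computes a relative gain from two accuracy values. The function name and the accuracy values are hypothetical illustrations, not figures from the study, and the abstract does not state how prediction accuracy itself was measured.

```python
def relative_improvement(baseline_accuracy: float, new_accuracy: float) -> float:
    """Return the relative change of new_accuracy over baseline_accuracy as a fraction."""
    if baseline_accuracy == 0:
        raise ValueError("baseline accuracy must be non-zero")
    return (new_accuracy - baseline_accuracy) / baseline_accuracy


# Hypothetical accuracies, chosen only for illustration (not results from the study).
snp_only_accuracy = 0.40      # assumed accuracy of a SNP-only model
snp_plus_sv_accuracy = 0.42   # assumed accuracy of a SNP + SV model

absolute_gain = snp_plus_sv_accuracy - snp_only_accuracy
relative_gain = relative_improvement(snp_only_accuracy, snp_plus_sv_accuracy)

print(f"absolute gain: {absolute_gain:.3f}")
print(f"relative gain: {relative_gain:.1%}")
```

With these assumed inputs, an absolute gain of 0.02 corresponds to a relative gain of 5%, which shows why the two figures should not be read interchangeably.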

Version published to 10.21203/rs.3.rs-6812361/v1 on Research Square
Jun 4, 2025

Related articles

A sensitive and accurate framework for population-scale structural variant discovery and genotyping across sequence types

This article has 4 authors:
1. Xin Wang
2. Guangbao Luo
3. Li Xiao
4. Zhangjun Fei
This article has no evaluations. Latest version Feb 18, 2026

HitSV: Maximizing discovery of structural variants across sequencing technologies

This article has 5 authors:
1. Yadong Wang
2. Gaoyang Li
3. Yadong Liu
4. Bo Liu
5. Long Qian
This article has no evaluations. Latest version Feb 20, 2026

Benchmark of open-access star-allele callers to accurately assess haplotypes and phenotypes in pharmacogenetic studies

This article has 3 authors:
1. Marc Gros La Faige
2. Emmanuelle Génin
3. Anthony Herzig
This article has no evaluations. Latest version Feb 18, 2026