Detection and evaluation of copy number variation using both linked-read and short-read sequencing in New Zealand dairy cattle


Abstract

In recent years, genetic studies have made significant progress in identifying single-nucleotide polymorphisms (SNPs) associated with cattle health and production traits. However, it remains challenging to identify and validate more complex forms of variation, such as copy number variation (CNV) and other types of structural variation (SV). In this study, SV regions were identified in 37 New Zealand dairy cattle using linked-read sequence data, and a transmission-based framework was used to validate these variants at the population scale. A total of 62,438 putative autosomal SV regions were identified with the LongRanger pipeline following the 10x Genomics recommendations. Copy number states for these regions were subsequently estimated by read-depth-based genotyping with CNVpytor in a population-representative cohort of 2,306 animals sequenced with Illumina short-read technology. Mendelian inheritance of copy number states was assessed using linear mixed models incorporating pedigree information, and transmission levels were used to quantify the biological validity of each CNV region. Transmission levels ranged widely, with a mean of 0.5162 across all regions, and higher transmission levels were proportionally enriched for larger SVs. A total of 7,218 CNV regions exhibited high transmission levels (>0.9), indicating strong evidence of inheritance. Among these, 7,136 overlapped CNV regions reported in one or more public datasets, while the remaining 82 high-confidence regions represent previously unreported variants. High-transmission CNV regions tended to show clear, discrete inheritance patterns in trio families, providing biological evidence that these CNVs are inherited within the population. Together, these results demonstrate that integrating linked-read sequencing with population-scale transmission-based validation provides a robust framework for identifying high-confidence CNV regions. This catalogue of validated CNV regions represents an important resource for downstream functional analyses and for the incorporation of structural variation into genomic selection and breeding programs.
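
The final filtering step described above (keeping regions with transmission levels above 0.9, then separating those that overlap publicly reported CNV regions from those that do not) can be illustrated with a minimal sketch. The abstract does not state the overlap criterion, so the "any shared base" rule used here is an assumption, as are the `Region` type, the `split_by_overlap` helper, the half-open coordinate convention, and the toy coordinates; none of this is the authors' code.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int           # 0-based, inclusive (assumed convention)
    end: int             # exclusive (assumed convention)
    transmission: float = 0.0  # transmission level; unused for public regions


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def split_by_overlap(candidates, known, min_transmission=0.9):
    """Keep candidates above the transmission threshold, then split them
    into previously reported (overlapping) and unreported regions."""
    reported, novel = [], []
    for region in candidates:
        if region.transmission <= min_transmission:
            continue  # not a high-transmission region
        if any(overlaps(region, k) for k in known):
            reported.append(region)
        else:
            novel.append(region)
    return reported, novel


# Toy data only: these coordinates and values are invented for illustration.
candidates = [
    Region("1", 100, 500, 0.95),
    Region("1", 1000, 1200, 0.97),
    Region("2", 50, 80, 0.40),
]
public = [Region("1", 400, 600), Region("2", 60, 120)]

reported, novel = split_by_overlap(candidates, public)
```

A linear scan over the public regions is enough for a toy example; at the scale reported here (tens of thousands of regions), an interval tree or a sorted sweep per chromosome would be the more usual choice. A stricter criterion, such as a minimum reciprocal overlap fraction, would change how many regions count as previously reported.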
