Quantification of organelle genome contamination in public Silene latifolia RAD-seq datasets
Abstract
Objective: Restriction site-associated DNA sequencing (RAD-seq) is widely used for linkage mapping and population/phylogenetic inference, yet off-target reads originating from chloroplast and mitochondrial genomes may bias downstream analyses. We quantified organelle-derived reads in publicly available Silene latifolia RAD-seq libraries and compared them with whole-genome sequencing (WGS) and cDNA (RNA-seq) datasets.

Results: We analyzed 42 libraries from public repositories (26 RAD-seq, 9 WGS, 7 cDNA). Organelle-derived reads were detected in every library. Most RAD-seq libraries contained few organelle reads (23 of 26 libraries below 5%), but three RAD-seq runs showed extreme organelle carry-over (26.4-32.9%). Across RAD-seq libraries, organelle-mapped reads were predominantly mitochondrial rather than chloroplast. These results show that organelle carry-over can vary markedly among RAD-seq libraries, and that RAD-seq workflows may benefit from routine quantification and filtering of organelle reads prior to SNP calling.

Conclusions: Organelle-derived reads are detectable in all examined S. latifolia libraries and may be extreme in a subset of RAD-seq runs; routine quantification and removal of organelle-mapped reads may be a useful component of RAD-seq quality control to minimize potential downstream bias.
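As an illustration of what per-library quantification could look like, the following minimal sketch computes chloroplast and mitochondrial read fractions from the tab-separated output of `samtools idxstats` (reference name, sequence length, mapped reads, unmapped reads). It is not the pipeline used in this study: the contig names `cp_genome` and `mt_genome`, the function name, and the choice of all mapped reads as the denominator are assumptions made for the example.

```python
from typing import Dict

# Hypothetical contig names (assumption); a real reference will use its own
# sequence identifiers for the chloroplast and mitochondrial genomes.
ORGANELLE_CONTIGS = {"chloroplast": "cp_genome", "mitochondrion": "mt_genome"}


def organelle_fractions(
    idxstats_text: str,
    organelles: Dict[str, str] = ORGANELLE_CONTIGS,
) -> Dict[str, float]:
    """Return the fraction of mapped reads assigned to each organelle.

    idxstats_text is the text printed by `samtools idxstats`: one line per
    reference with name, length, mapped reads, and unmapped reads.
    The denominator (all mapped reads) is an assumption of this sketch.
    """
    mapped: Dict[str, int] = {}
    for line in idxstats_text.strip().splitlines():
        name, _length, n_mapped, _n_unmapped = line.split("\t")
        mapped[name] = int(n_mapped)

    total = sum(mapped.values())
    if total == 0:
        raise ValueError("no mapped reads in idxstats input")

    fractions = {
        label: mapped.get(contig, 0) / total
        for label, contig in organelles.items()
    }
    fractions["organelle_total"] = sum(fractions.values())
    return fractions


if __name__ == "__main__":
    # Made-up counts for demonstration only; not data from the study.
    example = "\n".join(
        [
            "nuclear_scaffold_1\t1000000\t900\t0",
            "cp_genome\t150000\t30\t0",
            "mt_genome\t250000\t70\t0",
            "*\t0\t0\t50",
        ]
    )
    for label, value in organelle_fractions(example).items():
        print(f"{label}: {value:.1%}")
```

Reads mapped to the organelle contigs could then be excluded before SNP calling, for example by restricting the alignment to nuclear contigs; the exact filtering step depends on the reference and the aligner used.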