Beyond SNPs: Scalable Detection of Structural Variants Unlocks Hidden Genetic Diversity in Tomato

Abstract

Structural variants (SVs), large genomic alterations such as insertions, deletions, duplications, and translocations, are widespread in tomato genomes and play a critical role in phenotypic diversity. However, their detection has traditionally depended on expensive long-read sequencing technologies. Consistent with previous studies on structural variation, this study demonstrates a cost-effective approach for SV discovery using repurposed short-read sequencing data (150 bp), enabling integration of SVs into breeding workflows without additional sequencing investment. Using Illumina whole-genome data from 60 diverse tomato lines, including wild accessions, landraces, transgenic lines, and modern breeding lines, we identified over 71,000 high-confidence SVs with the Manta caller, including a significant number of private doubletons, as well as 10.9 million short genetic variants. SVs were unevenly distributed across chromosomes, clustering in subtelomeric regions and near disease-resistance loci, with chromosomes 6, 7, and 9 showing the highest densities. Wild accessions harbored nearly twice as many SVs as cultivated lines, with deletions dominating wild genomes and insertions more prevalent in cultivated tomatoes than in the reference genome. Comparative phylogenetic analysis revealed strong concordance between SV-based and SNP/InDel-based trees (Baker’s γ = 0.95), while SV data improved pedigree-consistent clustering and resolved ambiguous lineage relationships. These findings highlight SVs as hidden drivers of tomato diversity and valuable resources for marker-assisted selection, trait mapping, and genomic studies. Mining archived short-read datasets could offer breeding programs a scalable, low-cost strategy to unlock latent SV information, accelerate genetic improvement, and enhance genome-to-phenome insights. Limitations include under-detection of complex rearrangements, warranting targeted validation for critical loci.
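The per-chromosome and per-type tallies summarized in the abstract can be illustrated with a minimal sketch. It assumes the SV calls are stored in a standard VCF in which each record carries an `SVTYPE` entry in the INFO column, as Manta output does. The function name `count_svs`, the PASS-only filter, and the example records (including the `LowQual` filter label) are hypothetical and are not taken from the study; the authors' actual filtering criteria are not described here.

```python
from collections import Counter


def count_svs(vcf_lines, pass_only=True):
    """Count SV records per (chromosome, SVTYPE) from VCF text lines.

    Hypothetical helper for illustration; not the study's pipeline.
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, filt, info = fields[0], fields[6], fields[7]
        # Assumption: keep only records that passed the caller's filters.
        if pass_only and filt not in ("PASS", "."):
            continue
        svtype = None
        for entry in info.split(";"):
            if entry.startswith("SVTYPE="):
                svtype = entry.split("=", 1)[1]
                break
        if svtype is None:
            continue  # not an SV record
        counts[(chrom, svtype)] += 1
    return counts


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr6\t1000\tsv1\tN\t<DEL>\t.\tPASS\tEND=2500;SVTYPE=DEL;SVLEN=-1500",
    "chr6\t9000\tsv2\tN\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=320",
    "chr9\t400\tsv3\tN\t<DEL>\t.\tLowQual\tSVTYPE=DEL;END=900",
    "chr9\t7000\tsv4\tN\t<DUP:TANDEM>\t.\tPASS\tSVTYPE=DUP;END=9000",
]
counts = count_svs(example)
for (chrom, svtype), n in sorted(counts.items()):
    print(chrom, svtype, n)
```

Run per sample, tallies of this kind would support comparisons such as deletions versus insertions in wild and cultivated lines, or SV density by chromosome after normalizing by chromosome length.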
