A k-mer-based GWAS approach empowering gene mining in polyploids
Abstract
Genome-wide association studies (GWAS) are a cornerstone for deciphering the genetic architecture of complex traits. However, conventional GWAS tools, which are predominantly optimized for diploid species, face substantial limitations when applied to complex polyploids because of genotyping complexity, multi-allelic variant interpretation, and allele dosage ambiguity. Here, we present KMERIA, a k-mer-based framework specifically engineered to address these challenges, enabling efficient genotyping and robust association mapping in complex polyploid genomes. Rigorous benchmarking with simulated and empirical datasets demonstrates that KMERIA surpasses existing methods in both genotyping accuracy and statistical power. To demonstrate its utility in high-ploidy systems, we applied KMERIA to a natural autopolyploid population of 290 wild sugarcane (Saccharum spontaneum) accessions spanning diverse ploidy levels. To assess the biases of linear reference genomes in capturing allelic diversity and structural variation, we constructed a graph-based pangenome integrating structural variation and haplotype diversity across 15 S. spontaneum accessions. Integrating KMERIA with the graph pangenome revealed novel regulators of sucrose biosynthesis (SsMGT) and tillering (SsERF14, SsNGA5, SsNAC, SsARF8, SsLOG, SsSCR) in S. spontaneum, including the functionally validated SsNGA5. These discoveries not only elucidate the genetic basis of yield potential in S. spontaneum but also provide actionable targets for sugarcane breeding. Collectively, KMERIA bridges a critical methodological gap in polyploid genomics, while the integration of graph pangenomes provides a robust framework for deciphering genotype-phenotype relationships in crops with complex genome architectures.
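For readers unfamiliar with the term, a k-mer is a substring of length k taken from a sequencing read. The abstract does not describe KMERIA's internals, so the following is only a minimal, generic sketch of canonical k-mer counting, the kind of reference-free signal that k-mer-based methods typically start from. The function names (`canonical`, `count_kmers`) and the toy reads are hypothetical and are not taken from KMERIA.

```python
from collections import Counter

# Complement table used to build reverse complements (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def count_kmers(reads, k):
    """Count canonical k-mers across reads, skipping windows with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


# Toy reads (hypothetical data, not from the study).
reads = ["ACGTAC", "GTACGT"]
counts = count_kmers(reads, k=3)
print(dict(counts))
```

Canonicalization collapses a k-mer and its reverse complement into one key, so reads from either strand contribute to the same count. In a polyploid setting, per-sample counts of this kind could in principle carry dosage information once normalized by sequencing depth, but how KMERIA actually derives genotypes from k-mers is not stated in the abstract.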