Genome Assembly of the Iconic Samba Mahsuri Delineates Locus-specific Population Structure within Indica Rice
Abstract
High-quality reference genomes enable detailed analysis of structural variation and its consequences for genome organization in crops. Here, we present a chromosome-scale genome assembly of Oryza sativa cv. Samba Mahsuri (SM), an elite Indian mega rice variety cultivated for its grain and cooking quality. Using PacBio HiFi sequencing in combination with Illumina reads and Bionano optical mapping, we generated a ∼395 Mb assembly (SMv1.0) with 97.7% BUSCO completeness. A robust annotation framework identified 31,138 evidence-guided protein-coding gene models alongside 59,152 ab initio predictions. Comparative genomic analyses revealed extensive macrosynteny with established rice reference genomes, while uncovering pronounced locus-specific sequence and structural polymorphisms. Notably, a complex inversion-match-inversion (IMI) configuration on chromosome 6 differentiates SM from the japonica reference Nipponbare, but not from the indica reference R498. Population-scale analyses of 533 cultivated and 4 wild rice accessions demonstrate that genetic variation within the IMI region produces a markedly sharper and more coherent population structure than is observed in flanking regions or genome-wide, including tight subpopulation-based clustering and segregation of alternative IMI configurations within indica rice. Together, these results establish SMv1.0 as a robust chromosome-scale reference genome sequence for rice and demonstrate how large structural polymorphisms can shape locus-specific patterns of relatedness that diverge from genome-wide ancestry.
Significance Statement
We present a high-quality chromosome-scale genome sequence of the elite Indian rice variety Samba Mahsuri (SM). To the best of our knowledge, it is the first chromosome-scale reference genome from Indian rice germplasm assembled using a map-based method. We used this reference to analyze population-scale genotyping data from 533 cultivated and four wild rice accessions. Within a megabase-scale inversion-match-inversion (IMI) region, population clustering is markedly tighter than at the whole-genome scale. The region also shows a pronounced split within indica rice that is independent of genome-wide ancestry.