Chromosome-level phased genome assembly of the argan tree Sideroxylon spinosum
Abstract
Argan (Sideroxylon spinosum L., formerly Argania spinosa) is a tree endemic to Morocco, primarily valued for its seed oil. Growing interest in its biology and in genes linked to oil quality and stress resistance highlights the need for high-quality genome and transcriptome models. We integrated PacBio HiFi long-read and Illumina Hi-C sequencing data to generate independently assembled, phased genome models for both parental haplotypes, measuring 636 Mb and 655 Mb, respectively, with BUSCO completeness scores exceeding 97.8%. Each haplotype consists of 11 fully resolved telomere-to-telomere chromosomes, consistent with chromosome numbers in other Sapotaceae species (n = 10–13), and contains approximately 60% repetitive sequences. Annotation predicted ~28,720 protein-coding genes per haplotype. Comparative analyses with other Sapotaceae genomes indicate overall chromosome conservation within the family, alongside repeat expansion and fusion events in the two largest chromosomes (chr1 and chr2). We also independently assembled the complete chloroplast genome. This high-quality assembly provides a valuable resource for future research on argan biology, genetic diversity, and traits relevant to adaptation and oil biosynthesis.