Comparative Genomics of Two Chaetoceros muelleri Strains Reveals Structural and Functional Variation


Abstract

Diatoms are major contributors to global primary production and biogeochemical cycling, yet high-quality nuclear genome resources remain limited for ecologically dominant lineages such as Chaetoceros. Here, we present a highly contiguous nuclear genome assembly of Chaetoceros muelleri generated from living cells resurrected from resting spores preserved in Baltic Sea sediments and sequenced using PacBio HiFi long-read technology. The resurrected BS20 assembly is highly contiguous and gapless, enabling robust resolution of genes, protein domain architecture, and transposable element (TE) landscapes that are poorly captured in fragmented assemblies.

By directly comparing this resurrected genome with a contemporary laboratory strain (C. muelleri NMCA1316), we disentangle biological divergence from assembly-driven artefacts. Despite a strongly conserved core genome, the two strains differ markedly in repeat content, gene family copy number, and functional enrichment patterns. The resurrected genome harbors ∼38% repetitive DNA (approximately 1.8-fold more than the laboratory strain), dominated by LTR retrotransposons and a large fraction of unclassified repeats, indicating extensive historical TE activity. Expanded orthogroups in the resurrected strain are enriched for retroelement-associated domains, nucleic-acid processing functions, and stress-responsive gene families, including small heat shock proteins, whereas apparent expansions in the laboratory strain largely reflect annotation inflation arising from assembly fragmentation.

Spatial analyses further reveal widespread proximity between TEs and genes involved in membrane transport and environmental responsiveness, suggesting a role for TE dynamics in regulatory and functional diversification. Together, our results demonstrate that assembly strategy and strain history critically shape genomic inference and highlight the value of resurrected genomes for accessing historical diversity. This study provides a foundational genomic resource for C. muelleri and establishes a genomic framework that explicitly accounts for strain-level variation when investigating diatom genome evolution, TE-mediated innovation, and long-term ecological adaptation.
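
The abstract does not describe how TE-gene proximity was measured. As an illustration only, the sketch below shows one common way such a spatial analysis could be set up: for each gene, find the distance to the nearest TE on the same contig and flag genes that fall within a chosen window. The function names, the tuple layout, the half-open coordinate convention, and the 1 kb window are all assumptions made for this example, not details taken from the study.

```python
from collections import defaultdict


def interval_gap(a_start, a_end, b_start, b_end):
    """Distance in bp between two half-open intervals; 0 if they overlap or touch."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def genes_near_tes(genes, tes, window=1000):
    """Map each gene to (distance to nearest TE, whether it lies within `window`).

    genes: iterable of (name, contig, start, end)   -- hypothetical layout
    tes:   iterable of (contig, start, end)         -- hypothetical layout
    window: proximity threshold in bp (1 kb is an assumed value)
    Genes on contigs without any TE get (None, False).
    """
    tes_by_contig = defaultdict(list)
    for contig, start, end in tes:
        tes_by_contig[contig].append((start, end))

    result = {}
    for name, contig, start, end in genes:
        gaps = [
            interval_gap(start, end, te_start, te_end)
            for te_start, te_end in tes_by_contig.get(contig, [])
        ]
        nearest = min(gaps) if gaps else None
        result[name] = (nearest, nearest is not None and nearest <= window)
    return result


# Toy coordinates, invented purely to exercise the function.
example_genes = [
    ("geneA", "ctg1", 100, 200),
    ("geneB", "ctg1", 5000, 6000),
    ("geneC", "ctg2", 10, 50),
]
example_tes = [("ctg1", 250, 300), ("ctg1", 9000, 9500)]
proximity = genes_near_tes(example_genes, example_tes)
```

This brute-force scan compares every gene with every TE on its contig, which is adequate for a sketch. A genome-scale analysis would more plausibly use sorted intervals or a dedicated interval tool, and would compare the observed proximity against a null expectation before drawing functional conclusions.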
