Integrative phylogenomic and pangenome landscape of Bacillus: insights from 10,000 genomes into taxonomy, functional potential, and biotechnological applications
Abstract
The genus Bacillus comprises Gram-positive, endospore-forming rods that are distributed across terrestrial, aquatic, and aerial environments and are associated with plants, animals, and food. To explore its diversity, we conducted an integrated phylogenomic and pangenome analysis of 10,839 publicly available RefSeq genomes (including 67 type-strain genomes). Taxonomic delimitation combining genomic-distance metrics (Mash/ANI), network analyses, and a label-propagation algorithm assigned 10,276 genomes to operational communities, revealing novel complexes within B. cereus sensu lato and other clades. A robust phylogeny reconstructed from 103 representative genomes corroborated the ANI-based groupings. Forty-eight communities (≥10 genomes each) were further analyzed for pangenome openness. The saturation coefficient (α) and genomic fluidity (φ) were strongly negatively correlated (ρ = −0.636), indicating that open pangenomes exhibit high gene-content variability. Functional profiling revealed 135 antifungal genes and 15 secondary metabolite clusters, highlighting B. velezensis, B. amyloliquefaciens, and B. subtilis as rich reservoirs of hydrolytic enzymes, NRPS/PKS systems, and nutrient-competition traits. Additionally, 135 resistance determinants and 115 virulence factors were identified, mainly within B. cereus sensu lato. Biofertilization genes related to phosphorus, nitrogen, siderophore, potassium, and sulfur metabolism were broadly conserved, underscoring the genus's biotechnological potential.
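The abstract does not define genomic fluidity (φ). As a minimal sketch, assuming the commonly used definition (the mean, over all genome pairs, of the gene families unique to either genome divided by the total gene families in both), it could be computed as follows. The function name `genomic_fluidity` and the toy gene-family sets are hypothetical illustrations, not the authors' pipeline or data.

```python
from itertools import combinations


def genomic_fluidity(genomes):
    """Estimate genomic fluidity from gene-family presence sets.

    `genomes` maps a genome name to the set of gene families it carries.
    For each pair (k, l) the ratio (U_k + U_l) / (M_k + M_l) is computed,
    where U is the number of families found in only one genome of the pair
    and M is the number of families in each genome. The ratios are averaged.
    """
    names = list(genomes)
    if len(names) < 2:
        raise ValueError("need at least two genomes")
    ratios = []
    for a, b in combinations(names, 2):
        ga, gb = genomes[a], genomes[b]
        total = len(ga) + len(gb)      # M_k + M_l
        if total == 0:
            continue                   # skip pairs with no annotated families
        unique = len(ga ^ gb)          # U_k + U_l (symmetric difference)
        ratios.append(unique / total)
    if not ratios:
        raise ValueError("no genome pair has any gene families")
    return sum(ratios) / len(ratios)


# Toy input (hypothetical): three genomes sharing a small core.
toy = {
    "g1": {"A", "B", "C", "D"},
    "g2": {"A", "B", "C", "E"},
    "g3": {"A", "B", "F", "G"},
}
phi = genomic_fluidity(toy)
```

Under this definition φ is 0 for genomes with identical gene content and approaches 1 as pairs share fewer families, which matches the abstract's reading of high φ as high gene-content variability.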