Comprehensive OrgDb Packages for Fungal Comparative Genomics: MycoCosm-Derived Standardized GO and InterPro Annotations Across Five Major Phyla
Abstract
Fungal comparative genomics and functional analysis require standardized annotation frameworks, yet functional annotations remain fragmented across inconsistent database formats. We present five comprehensive Bioconductor-compatible OrgDb packages consolidating Gene Ontology and InterPro annotations for 2,748 fungal strains spanning five major phyla: Ascomycota (1,610 strains), Basidiomycota (654 strains), Mucoromycota (172 strains), Chytridiomycota (37 strains), and Zoopagomycota (25 strains). These databases achieve 97.5-98.2% GO coverage across more than 14 million genes with consistent annotation depth (mean 3.5-3.7 GO terms per gene) and high strain-level coverage (99.3-99.5%). The standardized architecture addresses critical methodological limitations in fungal functional genomics by providing properly structured background gene universes for statistical enrichment analysis, enabling rigorous cross-strain and cross-phylum comparative studies. Integration with established Bioconductor workflows (clusterProfiler, enrichplot, topGO) eliminates technical barriers that have historically impeded comparative analyses. The databases support diverse applications, including (1) RNA-seq differential expression interpretation with fungal-specific functional resolution, (2) systematic profiling of virulence mechanisms in plant pathogenic fungi, (3) metabolic capability assessment for biotechnological strain selection, and (4) cross-phylum functional evolution analysis. Validation against model organism databases confirms 94.7% annotation concordance, while within-phylum analyses reveal 78-92% shared GO term coverage across strains, demonstrating biological coherence of functional signatures. The databases are distributed through GitHub repositories with version control and will be manually updated to incorporate new genome annotations as they become available from public repositories.
By providing the first standardized, statistically rigorous functional annotation framework spanning major fungal lineages, this resource democratizes sophisticated comparative genomics approaches and establishes a foundation for reproducible fungal functional genomics research.
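To illustrate why a properly defined background gene universe matters for enrichment statistics, the following minimal Python sketch computes a one-sided hypergeometric (over-representation) p-value for a single GO term. It is not the authors' implementation, and the packages themselves are meant to be used from R through tools such as clusterProfiler. The function name `enrichment_p_value`, the toy gene identifiers, and the universe sizes are all hypothetical and chosen only for illustration.

```python
from math import comb


def enrichment_p_value(universe, term_genes, study_genes):
    """One-sided hypergeometric p-value, P(X >= k), for one GO term.

    universe    -- all genes that could have been selected (the background)
    term_genes  -- genes annotated with the GO term of interest
    study_genes -- genes of interest (e.g. differentially expressed genes)
    """
    universe = set(universe)
    # Genes outside the background cannot contribute to the test.
    annotated = set(term_genes) & universe
    study = set(study_genes) & universe

    N = len(universe)           # background size
    K = len(annotated)          # annotated genes in the background
    n = len(study)              # study set size
    k = len(annotated & study)  # annotated genes in the study set

    # Sum the hypergeometric probabilities for overlaps of k or more.
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    )
    return favourable / comb(N, n)


# Toy data (hypothetical): 5 genes carry the term, 3 of the 4 study genes do.
term_genes = [f"g{i}" for i in range(1, 6)]
study_genes = ["g1", "g2", "g3", "g10"]

# Same overlap, tested against two different background universes.
small_universe = [f"g{i}" for i in range(1, 21)]   # 20 genes
large_universe = [f"g{i}" for i in range(1, 201)]  # 200 genes

p_small = enrichment_p_value(small_universe, term_genes, study_genes)
p_large = enrichment_p_value(large_universe, term_genes, study_genes)

print(f"p-value with 20-gene background:  {p_small:.6f}")
print(f"p-value with 200-gene background: {p_large:.2e}")
```

The same overlap yields very different p-values depending on the background, which is why a consistent, strain-specific gene universe is needed before results can be compared across strains or phyla.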