AlphaGEM Enables Precise Genome-Scale Metabolic Modelling by Integrating Protein Structure Alignment with Deep-learning-based Dark Metabolism Mining

Abstract

Constructing high-quality genome-scale metabolic models (GEMs) for less-studied species remains challenging. To address this, we developed AlphaGEM, a versatile toolbox that leverages proteome-scale structural alignment and deep-learning-based predictions for efficient genomic mining to generate application-ready GEMs. Our findings show that AlphaGEM's pipelines, which build on structural alignment or protein-language-model-based inference (i.e., PLMSearch), identify more homologous relationships between proteins from different species than BLAST-based sequence alignment, contributing to reliable metabolic modelling of target organisms. Additionally, AlphaGEM includes an ensemble procedure powered by multiple deep-learning toolboxes to mine the dark metabolic functions encoded by nonhomologous proteins, significantly expanding species-specific metabolic networks. We validate AlphaGEM's accuracy by building GEMs for eukaryotes (e.g., Schizosaccharomyces pombe, Candida albicans) and prokaryotes (e.g., Klebsiella pneumoniae, Bacillus subtilis), achieving predictions comparable to those of manually curated models while outperforming existing tools. AlphaGEM also successfully reconstructs GEMs for Mus musculus and Cricetulus griseus, showcasing its potential for uncovering dark metabolism in complex mammals. Lastly, we demonstrate that AlphaGEM facilitates automatic GEM reconstruction for 332 distinct yeast species with high prediction fidelity. In conclusion, AlphaGEM provides unprecedented opportunities for the precise, rapid construction of GEMs across diverse domains, setting a solid foundation for universal functional analysis of non-model organisms with available genome sequences.
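
The abstract does not say how the ensemble procedure combines the outputs of its deep-learning predictors. As a minimal sketch only, assuming a simple majority vote over predicted EC numbers, the idea of accepting a function for a nonhomologous protein only when several tools agree could look like the following. The function name `ensemble_annotate`, the tool labels, the protein IDs, and the `min_votes` threshold are all hypothetical and are not taken from AlphaGEM.

```python
from collections import Counter


def ensemble_annotate(predictions, min_votes=2):
    """Combine per-tool EC predictions by majority vote (illustrative assumption).

    predictions: dict mapping tool name -> dict of protein ID -> predicted EC number.
    Returns a dict of protein ID -> EC number for proteins whose top-voted
    EC number was predicted by at least `min_votes` tools.
    """
    votes = {}
    for calls in predictions.values():
        for protein, ec in calls.items():
            votes.setdefault(protein, Counter())[ec] += 1

    consensus = {}
    for protein, counter in votes.items():
        ec, count = counter.most_common(1)[0]
        if count >= min_votes:
            consensus[protein] = ec
    return consensus


# Hypothetical predictions from three hypothetical tools.
predictions = {
    "tool_a": {"P1": "1.1.1.1", "P2": "2.7.1.1"},
    "tool_b": {"P1": "1.1.1.1", "P2": "3.1.3.2"},
    "tool_c": {"P1": "1.1.1.1", "P2": "2.7.1.1", "P3": "4.2.1.11"},
}
consensus = ensemble_annotate(predictions)
print(consensus)
```

In this sketch a protein annotated by only one tool (here `P3`) is left unassigned, which trades coverage for precision; a real pipeline might instead weight tools by confidence scores.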
