EnzymeCAGE: A Geometric Foundation Model for Enzyme Retrieval with Evolutionary Insights
Abstract
Enzyme catalysis is fundamental to life, driving the chemical transformations that sustain biological processes and support industrial applications. However, unraveling the intertwined relationships between enzymes and their catalytic reactions remains a significant challenge. Here, we present EnzymeCAGE, a catalysis-specific geometric foundation model trained on approximately 1 million structure-informed enzyme-reaction pairs, spanning over 2,000 species and encompassing an extensive diversity of genomic and metabolic information. EnzymeCAGE features a geometry-aware multi-modal architecture coupled with an evolutionary information integration module, enabling it to effectively model the nuanced relationships between enzyme structure, catalytic function, and reaction specificity. EnzymeCAGE supports both experimental and predicted enzyme structures and is applicable across diverse enzyme families, accommodating a broad range of metabolites and reaction types. Extensive evaluations demonstrate EnzymeCAGE’s state-of-the-art performance in enzyme function prediction, reaction de-orphaning, catalytic site identification, and biosynthetic pathway reconstruction. These results highlight its potential as a transformative foundation model for understanding enzyme catalysis and accelerating the discovery of novel biocatalysts.