Multimodal AI Decodes Extreme Environment Functional Dark Matter Beyond Homology
Abstract
Functional annotation of proteins from extreme environments is a major bottleneck for bioresource discovery, because a vast reservoir of functional dark matter defies existing homology-based methods. We demonstrate that environmental pressures impart conserved physicochemical energy signatures that co-determine protein function together with sequence and structure. We developed ACCESS, a multimodal graph neural network that uses hierarchical contrastive learning with a tailored label-sample co-embedding to fuse energy, sequence, and structural information and overcome homology scarcity. ACCESS surpasses state-of-the-art methods, including BLASTp and CLEAN, in annotating low-identity enzymes. Applying ACCESS to metagenomic data from extreme environments, we constructed a function map of extremophile enzymes to expand the biocatalyst library, pinpointed functionally critical residues to guide rational design, and enabled large-scale, function-based macro-evolutionary analyses. This paradigm transcends the limitations of homology, illuminating protein dark matter and accelerating the exploration of the biosphere’s functional diversity for applications in biotechnology and therapeutic development.