Predicting New Delhi Metallo-beta-Lactamase-1 (NDM-1) Mutation-Tolerant Residues via ESM-2 Embeddings and Graph-Based Score Propagation
Abstract
Background: New Delhi metallo-beta-lactamase-1 (NDM-1) confers resistance to most beta-lactam antibiotics, including carbapenems. With over 29 documented NDM variants worldwide, identifying residue positions that can tolerate mutation while preserving function is important for resistance surveillance and inhibitor design. We developed an unsupervised computational pipeline that combines ESM-2 protein language model embeddings with graph-based score propagation to rank all 270 NDM-1 residues by predicted mutation tolerance.

Results: The pipeline used no mutation labels during scoring; known mutation sites were used only for evaluation. On a benchmark of 10 known mutation positions and 260 non-mutation positions, the method achieved ROC-AUC 0.792 (95% CI: 0.558–0.992), outperforming random expectation (AUC = 0.5) and a per-residue ESM-2 log-likelihood baseline (AUC approximately 0.65). Seven of the 10 known mutation sites were recovered in the top 30 predictions (70% recall; 6.0-fold enrichment; Fisher’s exact test p < 0.0001). Bootstrap, leave-one-out, and permutation analyses supported robustness despite the limited number of validated positives.

Conclusions: ESM-2 embedding variance combined with structural graph propagation provides a reproducible and biologically plausible strategy for prioritizing candidate mutation-tolerant hotspots in NDM-1 under low-label conditions. Predictions represent mutation tolerance, not direct resistance enhancement, and should be treated as candidates for experimental validation.
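To make "embedding variance combined with graph-based score propagation" concrete, here is a minimal sketch under stated assumptions. It is not the preprint's implementation. The per-residue seed score is assumed to be the variance of each residue's ESM-2 embedding across dimensions, min-max scaled. The structural graph is assumed to be a C-alpha contact map with a hypothetical 8 Å cutoff. Propagation uses a personalized-PageRank-style update, s ← αWs + (1 − α)·seed, with a symmetrically normalized adjacency W. The function names `embedding_variance_scores`, `contact_graph`, and `propagate`, and the parameters `alpha` and `cutoff`, are all hypothetical. The demo runs on random arrays in place of real ESM-2 output and NDM-1 coordinates.

```python
import numpy as np


def embedding_variance_scores(embeddings: np.ndarray) -> np.ndarray:
    """Per-residue variance across embedding dimensions, min-max scaled to [0, 1].

    Assumption: `embeddings` is an (L, d) array of per-residue ESM-2
    representations; this variance definition is illustrative only.
    """
    v = embeddings.var(axis=1)
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)


def contact_graph(coords: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Binary residue contact graph from (L, 3) C-alpha coordinates.

    The 8 Angstrom cutoff is a hypothetical default, not taken from the preprint.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist < cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def propagate(seed: np.ndarray, adj: np.ndarray, alpha: float = 0.5,
              n_iter: int = 200, tol: float = 1e-10) -> np.ndarray:
    """Iterate s <- alpha * W @ s + (1 - alpha) * seed, with W = D^-1/2 A D^-1/2.

    W has spectral norm <= 1, so for 0 <= alpha < 1 the update is a contraction
    converging to the fixed point (1 - alpha) * (I - alpha * W)^-1 @ seed.
    """
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])  # isolated residues get no neighbour term
    w = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    s = seed.copy()
    for _ in range(n_iter):
        s_new = alpha * (w @ s) + (1 - alpha) * seed
        if np.abs(s_new - s).max() < tol:
            return s_new
        s = s_new
    return s


# Toy demonstration on random data (NOT real NDM-1 embeddings or structure).
rng = np.random.default_rng(0)
n_res, dim = 270, 320
emb = rng.normal(size=(n_res, dim))
coords = np.cumsum(rng.normal(scale=2.0, size=(n_res, 3)), axis=0)  # random-walk "backbone"
seed = embedding_variance_scores(emb)
adj = contact_graph(coords)
scores = propagate(seed, adj)
top30 = np.argsort(-scores)[:30] + 1  # 1-based residue numbers, highest score first
```

The parameter `alpha` sets how much a residue's score is pulled toward its structural neighbours. With `alpha = 0` the ranking is just the embedding-variance seed. Values near 1 smooth scores heavily over the contact graph.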
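The evaluation metrics named in the abstract can be computed with the standard library alone. This is a sketch of common definitions, not the authors' evaluation code. ROC-AUC is computed as the Mann-Whitney probability that a known site outscores a non-site. Top-k hits count how many known sites land in the k best-ranked residues. Fold enrichment is taken as precision in the top k divided by the base rate of positives. This convention is an assumption, since the abstract does not define it. The one-sided Fisher exact test is the upper tail of the hypergeometric distribution. The bootstrap CI resamples positives and negatives separately (stratified percentile bootstrap). That scheme is also an assumption. All function names are hypothetical.

```python
import bisect
import math
import random
from typing import List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    wins = 0.0
    for p in pos:
        below = bisect.bisect_left(neg, p)          # negatives strictly below p
        ties = bisect.bisect_right(neg, p) - below  # negatives equal to p
        wins += below + 0.5 * ties
    return wins / (len(pos) * len(neg))


def top_k_hits(scores: Sequence[float], labels: Sequence[int], k: int) -> int:
    """Number of positives among the k highest-scoring positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in order[:k])


def fold_enrichment(hits: int, k: int, n_pos: int, n_total: int) -> float:
    """Precision in the top k divided by the overall base rate of positives."""
    return (hits / k) / (n_pos / n_total)


def fisher_one_sided(hits: int, k: int, n_pos: int, n_total: int) -> float:
    """One-sided Fisher exact test: P(X >= hits), X ~ Hypergeometric(n_total, n_pos, k)."""
    tail = sum(math.comb(n_pos, x) * math.comb(n_total - n_pos, k - x)
               for x in range(hits, min(k, n_pos) + 1))
    return tail / math.comb(n_total, k)


def bootstrap_auc_ci(scores: Sequence[float], labels: Sequence[int], n_boot: int = 2000,
                     level: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Percentile CI for ROC-AUC, resampling positives and negatives separately."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    aucs: List[float] = []
    for _ in range(n_boot):
        bp = rng.choices(pos, k=len(pos))
        bn = rng.choices(neg, k=len(neg))
        aucs.append(roc_auc(bp + bn, [1] * len(bp) + [0] * len(bn)))
    aucs.sort()
    lo = aucs[int((1 - level) / 2 * n_boot)]
    hi = aucs[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Counts reported in the abstract: 7 of 10 known sites in the top 30 of 270 residues.
p_value = fisher_one_sided(hits=7, k=30, n_pos=10, n_total=270)
enrichment = fold_enrichment(hits=7, k=30, n_pos=10, n_total=270)
print(f"Fisher one-sided p = {p_value:.2e}, fold enrichment = {enrichment:.2f}")
```

With the reported counts, the hypergeometric tail is on the order of 1e-5, consistent with the stated p < 0.0001. Under the precision-over-base-rate convention used here, the same counts give (7/30)/(10/270) = 6.3-fold. The reported 6.0-fold figure may therefore rest on a slightly different definition, which the abstract does not state. With only 10 positives, each resample can include or exclude individual known sites, which is why the bootstrap interval on AUC is wide.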