Large-language-model-based antibiotic resistance gene prediction and resistomes mining in cyanobacterial blooms
Abstract
The rapid spread of antibiotic resistance genes (ARGs) in aquatic ecosystems poses a serious public health threat. Conventional ARG detection methods, including sequence alignment and machine learning, are limited by high false-negative or false-positive rates, particularly given the sequence diversity and class imbalance of metagenomic data. Moreover, few tools offer comprehensive solutions for ARG identification, classification, and functional annotation. To address these limitations, we developed ESMARG, a novel protein language model framework based on ESM1v, for accurate ARG identification, classification, and annotation. ESMARG was trained on both ARG sequences and abundant non-ARG sequences from real-world metagenomes to enhance robustness and reduce false positives. It significantly outperformed traditional alignment-based tools (BLAST, DIAMOND) and recent deep learning models (ARGNet, ARG-SHINE), achieving 0.998 precision, 0.939 recall, and a 0.968 F1-score for ARG identification. The functional and mechanism classification modules also demonstrated high accuracy (0.986 and 0.987) and computational efficiency. We applied ESMARG to 26 cyanobacterial aggregate (CA) metagenomes collected from Lake Taihu across a full annual cyanobacterial harmful algal bloom (CyanoHAB) cycle. A total of 110 ARGs, spanning 24 drug classes and multiple resistance mechanisms, were detected, with seasonal shifts in ARG abundance and composition. Strong correlations were identified between CA resistomes and microbial community structure, especially with cyanobacteria. Further, ARG abundance was shaped by both biotic factors, such as cyanobacterial dominance and bacterial functional profiles, and abiotic variables, including biochemical oxygen demand and water temperature. Our findings demonstrate the power of ESMARG for high-resolution ARG profiling and reveal complex ecological interactions between resistomes, microbiomes, and environmental factors in CyanoHABs.
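As a quick consistency check on the identification metrics quoted in the abstract, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_score` and the variable names are assumptions for this example and are not part of ESMARG. Only the two input values come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1-score)."""
    if precision + recall == 0:
        # Convention used here: F1 is 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for ARG identification.
reported_precision = 0.998
reported_recall = 0.939

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # prints 0.968
```

The recomputed value agrees with the reported F1-score of 0.968 to three decimal places.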
Highlights
ESMARG outperforms BLAST and recent deep learning models in ARG detection
ARGs showed seasonal dynamics in cyanobacterial metagenomes across the algal bloom cycle
ARGs were strongly correlated with microbiome structure and cyanobacterial dominance
ARG abundance was shaped by cyanobacteria, bacterial functions, BOD, and temperature