Integrating Phenotypic and Genomic Data with Machine Learning to Predict Antimicrobial Resistance and Identify Genetic Biomarkers in E. coli
Abstract
Antimicrobial resistance in Escherichia coli is a significant global public health concern, driven by increased resistance to commonly used antimicrobial agents such as β-lactams and fluoroquinolones. This study aimed to develop a machine-learning framework that predicts antimicrobial resistance in Escherichia coli by integrating antimicrobial susceptibility testing data with genomic biomarker analysis. A dataset of 17,122 Escherichia coli clinical isolates was obtained from the Bacterial and Viral Bioinformatics Resource Center (BV-BRC). After preprocessing, fivefold cross-validation was used to train and test five machine learning models: Random Forest, XGBoost, Support Vector Machine, Logistic Regression, and k-Nearest Neighbors. XGBoost performed best, with an accuracy of 0.86 and a ROC-AUC of 0.932, followed by Random Forest, with an accuracy of 0.82 and a ROC-AUC of 0.89. Phylogenetic analysis revealed that resistant isolates clustered together relative to the reference genome of Escherichia coli K-12 MG1655. Analysis with the Comprehensive Antibiotic Resistance Database (CARD) and ResFinder identified genomic biomarkers such as gyrA, parC, CTX-M-15, OXA-1, and various multidrug efflux pumps as significant resistance determinants. In conclusion, this study demonstrates that combining antimicrobial susceptibility testing with machine learning and genomic biomarkers provides a powerful framework for predicting antimicrobial resistance in Escherichia coli.
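The evaluation protocol named in the abstract (fivefold cross-validation scored by accuracy and ROC-AUC) can be illustrated with a minimal, dependency-free Python sketch. This is not the study's pipeline: the function names `kfold_indices`, `accuracy`, and `roc_auc` are hypothetical, the shuffling and seed are assumptions, and the abstract does not state whether the folds were stratified or which software was used.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation.

    Assumption: plain shuffled folds; the source does not say whether
    stratification was applied.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k nearly equal-sized folds.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def accuracy(y_true, y_pred):
    """Fraction of isolates whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a randomly chosen resistant isolate (1)
    receives a higher score than a randomly chosen susceptible one (0);
    tied scores count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs both classes present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In use, each of the five models would be fitted on the `train` indices of every fold and scored on the matching `test` indices, and the per-fold metrics would then be averaged. The pairwise ROC-AUC above is quadratic in the number of isolates, so a rank-based implementation would be preferable at the scale of 17,122 isolates.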