Machine learning based lineage prediction from AMR phenotypes for Escherichia coli ST131 clade C surveillance across infection types
Abstract
Rising antimicrobial resistance (AMR) in Escherichia coli bloodstream infections (BSIs) in high-income settings has typically been dominated by a single clone, sequence type (ST) 131. More specifically, ST131 clade C (ST131-C) is associated with fluoroquinolone resistance and extended-spectrum β-lactamases (ESBLs). Although urinary tract infections (UTIs) are a known common precursor to BSIs, little is currently known about the longitudinal prevalence of ST131-C in UTIs and, therefore, about the temporal link between the two infection types. Leveraging available genomic and antimicrobial susceptibility test (AST) data for ciprofloxacin, gentamicin, and ceftazidime in 2790 E. coli BSI isolates, we trained random forest and Extreme Gradient Boosting (XGBoost) classifiers to predict whether an E. coli isolate belongs to ST131-C using only AST data. These models were then used to predict the yearly prevalence of ST131-C in 22942 UTI and 24866 BSI isolates from Norway. The XGBoost classifier achieved a prediction F1-score of over 70% on a highly unbalanced dataset in which only 4.3% of the genomic BSI isolates belonged to ST131-C. The predicted prevalence of ST131-C in UTIs exhibited a similar annual trend to that in BSIs, with a stable infection burden for eight years after its rapid expansion, confirming that the persistence of ST131-C in BSIs is largely driven by ST131-C UTIs. However, a higher prevalence of ST131-C in BSIs (∼7%) compared to UTIs (∼4%) suggests a subsequent enrichment of ST131-C. Our study highlights how existing epidemiological knowledge can be supplemented by utilising extensive data from AMR surveillance efforts without genomic markers.
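The choice of the F1-score for a dataset with only 4.3% positives can be illustrated with a small worked example. The sketch below is not the study's code: the function names and the toy labels are invented for illustration, and it only shows why plain accuracy is uninformative at this class balance while F1 is not.

```python
def accuracy(y_true, y_pred):
    """Fraction of isolates whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = ST131-C, 0 = any other lineage)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (invented): 43 positives among 1000 isolates, i.e. 4.3%.
y_true = [1] * 43 + [0] * 957

# A classifier that never predicts ST131-C looks accurate but has F1 = 0.
always_negative = [0] * 1000
print(accuracy(y_true, always_negative))  # 0.957
print(f1_score(y_true, always_negative))  # 0.0

# A hypothetical classifier with 30 true positives, 13 misses, 10 false alarms.
some_model = [1] * 30 + [0] * 13 + [1] * 10 + [0] * 947
print(round(f1_score(y_true, some_model), 3))  # 0.723
```

Because F1 ignores true negatives, a classifier cannot score well simply by labelling every isolate as non-ST131-C.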
Impact statement
This study proposes an analysis method that leverages AST data, which are already routinely collected for AMR surveillance. Using such data to approximate the population-wide prevalence of multidrug-resistant (MDR) clones, such as ST131-C, could allow larger-scale, retrospective prevalence studies at a significantly lower cost than genomics-based methods. Such a method could supplement existing knowledge and epidemiological study practices. We use the proposed method to find relationships between the prevalence of the important MDR E. coli clone ST131-C in UTIs and BSIs in Norway. These results suggest that monitoring and reducing MDR in UTIs could reduce the burden of this invasive clone in hard-to-treat BSIs.
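The idea of approximating prevalence from per-isolate predictions can be sketched as a simple yearly aggregation. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `yearly_prevalence`, the record layout, and the toy records are all hypothetical.

```python
from collections import defaultdict


def yearly_prevalence(records):
    """Return {year: fraction of isolates predicted as ST131-C}.

    `records` is an iterable of (year, predicted_label) pairs, where
    predicted_label is 1 for predicted ST131-C and 0 otherwise.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for year, label in records:
        totals[year] += 1
        positives[year] += label
    return {year: positives[year] / totals[year] for year in sorted(totals)}


# Toy records (invented): four isolates per year.
records = [
    (2010, 1), (2010, 0), (2010, 0), (2010, 0),
    (2011, 1), (2011, 1), (2011, 0), (2011, 0),
]
prevalence = yearly_prevalence(records)
print(prevalence)  # {2010: 0.25, 2011: 0.5}
```

The raw fraction of predicted positives reflects classifier errors as well as true prevalence, so a sketch like this gives only an uncorrected estimate.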
Data summary
All AST data, clonal information for isolates with genomic data, and code used in this study can be found in the following repository: https://github.com/theodorross/EColi-UTI-Predictions. Only the clone and published metadata information is shown for the UTI data shared by Handal, Kaspersen et al.