A Machine Learning Approach Reveals CRISPR-Cas I-F as a Genomic Marker of Antibiotic Susceptibility in Uropathogenic E. coli
Abstract
Background
Antimicrobial resistance (AMR) in Escherichia coli is a critical global health challenge, particularly in urinary tract infections, where first-line treatments are increasingly compromised. While horizontal gene transfer (HGT) via mobile genetic elements is a major driver of AMR, the genomic factors that may constrain resistance gene acquisition remain underexplored. CRISPR-Cas systems, which provide adaptive immunity against foreign DNA, could influence AMR dynamics, but their role in E. coli remains incompletely understood.
Methods
We conducted a comprehensive whole-genome analysis of uropathogenic E. coli isolates, including a newly sequenced collection from Australian clinical samples and an independent, globally sourced validation cohort. Antimicrobial susceptibility profiles were integrated with CRISPR-Cas subtype classification, resistance gene burden, and mobile element content. Elastic net regression, adaptive lasso, and tree-based machine learning models were used to identify genomic predictors of resistance, with performance validated across both datasets.
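To illustrate the kind of penalized model named above, the following is a minimal sketch of elastic net logistic regression fitted by proximal gradient descent on simulated data. It is not the authors' pipeline: the feature names (`has_IF`, `has_IE`, `noise`), the simulated effect sizes, and the penalty settings (`alpha`, `l1_ratio`) are all assumptions chosen only to show how a binary CRISPR-Cas feature could enter such a model.

```python
import math
import random

FEATURES = ["has_IF", "has_IE", "noise"]  # hypothetical feature names


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty (shrinks toward zero)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def simulate_isolates(n, seed=0):
    """Simulate binary genomic features and a resistant (1) / susceptible (0) label.

    The effect sizes are invented for illustration: I-F lowers and I-E raises
    the log-odds of resistance, and the third feature has no effect.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        has_if = 1.0 if rng.random() < 0.3 else 0.0
        has_ie = 1.0 if rng.random() < 0.5 else 0.0
        noise = 1.0 if rng.random() < 0.5 else 0.0
        logit = 0.5 - 2.0 * has_if + 1.0 * has_ie
        X.append([has_if, has_ie, noise])
        y.append(1.0 if rng.random() < sigmoid(logit) else 0.0)
    return X, y


def fit_elastic_net_logistic(X, y, alpha=0.05, l1_ratio=0.5, step=0.5, n_iter=500):
    """Fit logistic regression with an elastic net penalty.

    Each iteration takes a gradient step on the smooth part (mean log-loss
    plus the L2 term) and then applies soft-thresholding for the L1 term.
    The intercept is left unpenalized.
    """
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    intercept = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for row, label in zip(X, y):
            pred = sigmoid(intercept + sum(w * x for w, x in zip(weights, row)))
            residual = pred - label
            grad_b += residual / n
            for j in range(p):
                grad_w[j] += residual * row[j] / n
        intercept -= step * grad_b
        for j in range(p):
            smooth = grad_w[j] + alpha * (1.0 - l1_ratio) * weights[j]
            weights[j] = soft_threshold(weights[j] - step * smooth,
                                        step * alpha * l1_ratio)
    return intercept, weights


X, y = simulate_isolates(400, seed=0)
intercept, weights = fit_elastic_net_logistic(X, y)
for name, weight in zip(FEATURES, weights):
    print(f"{name}: {weight:+.3f}")
```

In this sketch a negative coefficient on `has_IF` corresponds to lower odds of resistance. A real analysis would also need the covariates described in the abstract (for example phylogroup and plasmid content) and penalty strengths chosen by cross-validation.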
Results
CRISPR-Cas subtype I-F was consistently associated with susceptibility to antibiotics for which resistance is commonly acquired through HGT, including trimethoprim and ampicillin, and was linked to a lower burden of antibiotic resistance genes (ARGs) and mobile genetic elements (MGEs). In contrast, Type I-E arrays, especially when co-occurring with orphan I-F arrays, were associated with increased resistance. These associations remained robust after adjusting for phylogroup, plasmid content, and genomic background, and were validated across datasets.
Conclusions
Subtype-specific CRISPR-Cas systems shape antibiotic resistance profiles in E. coli, with Type I-F functioning as a potential genomic barrier to ARG acquisition. These findings highlight CRISPR array typing as a novel biomarker for AMR risk prediction and surveillance, and suggest new opportunities for leveraging CRISPR-based mechanisms to limit the spread of resistance in clinical contexts.