Beyond the Black Box: A Calibrated, Gene-Centric Pipeline for High-Precision Ciprofloxacin Resistance Prediction in Salmonella enterica


Abstract

The transition from traditional methods to genome-based antimicrobial susceptibility testing (AST) for Salmonella requires models that are biologically interpretable, clinically calibrated, and discriminative. This study analyzed 8,759 Salmonella isolates from the NCBI Pathogen Detection database to predict ciprofloxacin resistance using two distinct computational architectures: an alignment-free, feature-agnostic k-mer hashing model and a feature-engineered gene-based model (Logistic Regression, Random Forest, and XGBoost). Methodological auditing revealed a theoretical collision probability of 100% when 10 million unique k-mers are mapped via MurmurHash3 into a feature space of 2^24 (about 16.7 million) buckets, leading to a complete loss of biological signal. In comparison, the gene-based models performed far better, with XGBoost achieving an area under the receiver operating characteristic curve (ROC AUC) of 0.989. To ensure clinical reliability, post-hoc isotonic regression was then applied, refining the probability estimates to an average Brier score of 0.0063, significantly outperforming current clinical benchmarks. Explainability analysis using SHAP identified the gyrA_D87Y mutation as the primary indicator of resistance, with an importance of 0.88, while an inverse relationship with aph(3'')-Ib suggested potential collateral-sensitivity trade-offs. However, LOSO validation revealed a significant performance decay in clonal, human-restricted lineages such as S. Typhi, identifying phylogenetic leakage as the primary barrier to universal generalization. These findings demonstrate that while gene-based models provide high-fidelity AMR prediction, future frameworks must integrate efflux regulation and lineage-robust audits to move beyond static genomic anchors toward real-time, personalized antimicrobial stewardship.
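The collision figure quoted in the abstract can be checked with standard birthday-problem arithmetic. The sketch below is illustrative and is not the authors' audit code: it assumes MurmurHash3 behaves like an ideal uniform hash, and the function name `collision_stats` and its output keys are hypothetical. It reports both the probability that at least one pair of k-mers collides and, as a separate quantity, the expected fraction of k-mers that share a bucket with another k-mer.

```python
import math


def collision_stats(n_keys: int, n_buckets: int) -> dict:
    """Birthday-problem estimates for hashing n_keys into n_buckets.

    Assumption: the hash is ideal, i.e. every key lands in a bucket
    chosen uniformly and independently of all other keys.
    """
    # Standard birthday approximation:
    # log P(no collision at all) ~= -n(n-1) / (2m)
    log_p_none = -n_keys * (n_keys - 1) / (2 * n_buckets)
    p_any_collision = -math.expm1(log_p_none)

    # log P(one specific other key misses a given bucket) = log(1 - 1/m)
    log_miss = math.log1p(-1.0 / n_buckets)

    # P(a given key shares its bucket with at least one other key)
    p_key_shared = -math.expm1((n_keys - 1) * log_miss)

    # Expected number of buckets that receive at least one key
    expected_occupied = n_buckets * -math.expm1(n_keys * log_miss)

    return {
        "p_any_collision": p_any_collision,
        "p_key_shared": p_key_shared,
        "expected_occupied": expected_occupied,
    }


if __name__ == "__main__":
    # Sizes taken from the abstract: 10 million k-mers, 2**24 buckets.
    stats = collision_stats(10_000_000, 2 ** 24)
    for name, value in stats.items():
        print(f"{name}: {value:.6g}")
```

Under the uniform-hashing assumption, the probability of at least one collision is effectively 1 at these sizes, which is consistent with the 100% figure above. The per-k-mer sharing fraction is a different quantity that the abstract does not report, so it is included here only as context.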
