Beyond the Black Box: A Calibrated, Gene-Centric Pipeline for High-Precision Ciprofloxacin Resistance Prediction in Salmonella enterica
Abstract
The transition from traditional methods to genome-based antimicrobial susceptibility testing (AST) for Salmonella requires models that are biologically interpretable, clinically calibrated, and discriminative. This study analyzed 8,759 Salmonella isolates from the NCBI Pathogen Detection database to predict ciprofloxacin resistance using two distinct computational architectures: an alignment-free, agnostic k-mer hashing model and a set of feature-engineered gene-based models (Logistic Regression, Random Forest, and XGBoost). Methodological auditing revealed a theoretical collision probability of effectively 100% when 10 million unique k-mers are mapped into a feature space of 2^24 (about 16.7 million) buckets via MurmurHash3, leading to a complete loss of biological signal. In comparison, the gene-based models demonstrated superior performance, with XGBoost achieving an area under the receiver operating characteristic curve (ROC AUC) of 0.989. Furthermore, to ensure clinical reliability, post-hoc isotonic regression was applied, refining probability estimates to an average Brier score of 0.0063, significantly outperforming current clinical benchmarks. Explainability analysis using SHAP (SHapley Additive exPlanations) identified the gyrA_D87Y mutation as the primary indicator of resistance, with an importance of 0.88, while an inverse relationship with aph(3'')-Ib suggested potential collateral sensitivity trade-offs. However, LOSO validation revealed a significant performance decay in clonal, human-restricted lineages such as S. Typhi, highlighting phylogenetic leakage as the primary barrier to universal generalization. These findings demonstrate that while gene-based models provide high-fidelity antimicrobial resistance (AMR) prediction, future frameworks must integrate efflux regulation and lineage-robust audits to move beyond static genomic anchors toward real-time, personalized antimicrobial stewardship.
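The collision figure quoted in the abstract can be checked with a short birthday-problem calculation. The sketch below is illustrative and is not the authors' code: it assumes MurmurHash3 behaves like an ideal uniform hash, takes the k-mer count and feature-space size from the abstract, and uses hypothetical function names.

```python
import math


def prob_any_collision(n_items: int, n_buckets: int) -> float:
    """Birthday approximation: P(at least one collision) ~= 1 - exp(-n(n-1) / (2m))."""
    exponent = -n_items * (n_items - 1) / (2 * n_buckets)
    # -expm1(x) computes 1 - exp(x) without losing precision.
    return -math.expm1(exponent)


def frac_items_colliding(n_items: int, n_buckets: int) -> float:
    """Probability that a given item shares its bucket with at least one other item."""
    # (1 - 1/m)^(n-1) is the chance that none of the other n-1 items lands in the same bucket.
    log_no_neighbor = (n_items - 1) * math.log1p(-1.0 / n_buckets)
    return -math.expm1(log_no_neighbor)


# Figures taken from the abstract; uniform hashing is an assumption.
N_KMERS = 10_000_000
N_BUCKETS = 2 ** 24  # 16,777,216 hashed features

p_any = prob_any_collision(N_KMERS, N_BUCKETS)
frac = frac_items_colliding(N_KMERS, N_BUCKETS)

print(f"P(at least one collision): {p_any:.6f}")
print(f"Expected fraction of k-mers sharing a bucket: {frac:.3f}")
```

Under these assumptions, at least one collision is essentially certain, and about 45% of k-mers are expected to share a bucket with another k-mer. This estimates how much feature aliasing to expect. It does not by itself measure how much predictive signal is lost.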