From Preliminary Urinalysis to Decision Support: Machine Learning for UTI Prediction in Real-World Laboratory Data

Abstract

Background/Objectives: Urinary tract infections (UTIs) are frequently diagnosed empirically, often leading to overtreatment and rising antimicrobial resistance. This study aimed to develop and evaluate machine learning (ML) models that predict urine culture outcomes using routine urinalysis and demographic data, supporting more targeted empirical antibiotic use. Methods: A real-world dataset comprising 8,065 urinalysis records from a hospital laboratory was used to train five ensemble ML models: Random Forest, XGBoost (eXtreme Gradient Boosting), Extra Trees, Voting Classifier, and Stacking Classifier. Models were developed using 10-fold stratified cross-validation and assessed via clinically relevant metrics including specificity, sensitivity, likelihood ratios, and diagnostic odds ratio (DOR). To enhance screening utility, threshold optimization was applied to the best-performing model (XGBoost) using the Youden index. Results: XGBoost and Random Forest demonstrated the most balanced diagnostic profiles, with DORs exceeding 21. The Voting and Stacking Classifiers achieved the highest specificity (>95%) and positive likelihood ratios (>10), but exhibited lower sensitivity. Feature importance analysis identified positive nitrites, white blood cell count, and specific gravity as key predictors. Threshold tuning of XGBoost improved sensitivity from 70.2% to 87.9% and reduced false negatives by 82%, with an associated negative predictive value (NPV) of 96.4%. The adjusted model reduced overtreatment by 56% compared to empirical prescribing. Conclusions: ML models based on structured urinalysis and demographic data can support clinical decision-making for UTIs. While high-specificity models may reduce unnecessary antibiotic use, sensitivity trade-offs must be considered. Threshold-optimized XGBoost offers a clinically adaptable tool for empirical treatment decisions, particularly in settings lacking rapid diagnostics.
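The metrics named in the abstract (sensitivity, specificity, likelihood ratios, DOR, NPV) and the Youden-index threshold choice can be illustrated with a minimal sketch. This is not the study's code: the function names are hypothetical, the labels and scores are invented toy values standing in for culture results and model probabilities, and scanning every observed score as a candidate cutoff is only one plausible way to maximise Youden's J = sensitivity + specificity - 1.

```python
from math import inf, nan


def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when a score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    """Diagnostic metrics from a 2x2 table (both classes must be present)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if fp * fn:
        dor = (tp * tn) / (fp * fn)
    else:
        # No continuity correction: a zero cell gives an unbounded or undefined DOR.
        dor = inf if tp * tn else nan
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "npv": tn / (tn + fn) if (tn + fn) else nan,
        "lr_pos": sensitivity / (fp / (tn + fp)) if fp else inf,
        "lr_neg": (fn / (tp + fn)) / specificity if tn else inf,
        "dor": dor,
        "youden_j": sensitivity + specificity - 1,
    }


def youden_threshold(y_true, scores):
    """Return the observed score that maximises Youden's J.

    Ties resolve to the lowest threshold, which favours sensitivity.
    """
    def j_at(threshold):
        counts = confusion_counts(y_true, scores, threshold)
        return diagnostic_metrics(*counts)["youden_j"]

    return max(sorted(set(scores)), key=j_at)


# Toy data (invented): 1 = culture-positive, scores = predicted probabilities.
y_true = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.65, 0.70, 0.80, 0.90]

default = diagnostic_metrics(*confusion_counts(y_true, scores, 0.5))
best_t = youden_threshold(y_true, scores)
tuned = diagnostic_metrics(*confusion_counts(y_true, scores, best_t))

print("default 0.5:", default)
print("Youden threshold:", best_t, tuned)
```

On this toy set, lowering the cutoff from 0.5 to the Youden-optimal value trades nothing in specificity for a gain in sensitivity; on real data the trade-off is usually less favourable, which is the sensitivity/specificity tension the abstract describes.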
