An open-source, configurable machine learning pipeline for predicting blood culture outcomes from routine haematology parameters

Benjamin Ryan McFadden
Timothy John Jay Inglis
Mark Reynolds

Abstract

Background: Bloodstream infections remain a major cause of morbidity and mortality worldwide, yet blood culture positivity rates are typically low, highlighting a need to optimise test use. Machine learning models trained on routinely available haematology parameters have shown promise for predicting blood culture outcomes. However, there is a lack of open-source software to implement the methods described in the peer-reviewed literature.

Results: We present an open-source pipeline for training, evaluating, and reporting binary classification models that predict blood culture outcomes from complete blood count (CBC), white blood cell differential (DIFF), and cell population data (CPD) generated by Sysmex XN-series haematology analysers. The pipeline implements four classifier types (logistic regression, decision tree, random forest, and XGBoost), two default feature spaces (19-feature CBC/DIFF and 50-feature CBC/DIFF/CPD), and two feature selection methods (Boruta all-relevant selection and recursive feature elimination), along with nested cross-validation (CV) to prevent data leakage during feature selection. Trained logistic regression coefficients and decision tree rules were exported in portable formats suited to deployment in spreadsheet-based or laboratory information management system (LIMS) environments without requiring a Python runtime. Random forest and XGBoost models were also exported. The pipeline was fully configurable via a single JSON file and could be adapted to any binary classification problem without source code modification. Automated HTML reports with embedded area under the receiver operating characteristic curve (AUROC) plots and confusion matrices were generated for both training and inference runs.

Conclusions: This open-source repository addresses key limitations in existing blood culture outcome prediction workflows by providing a reproducible, transparent, and clinically deployable pipeline. Its configurable architecture, nested CV strategy, multiple feature selection methods, and export of interpretable model artefacts make it suitable for both research and clinical decision support applications.
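
To illustrate why exported logistic regression coefficients can be deployed without a Python runtime, the following minimal sketch shows how a set of coefficients reduces to a single spreadsheet formula. It is not the pipeline's own export code: the feature names (`WBC`, `NEUT_PCT`, `PLT`), the coefficient and intercept values, and both function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical intercept and coefficients for illustration only;
# these are NOT values reported in the article.
INTERCEPT = -2.0
COEFFICIENTS = {"WBC": 0.05, "NEUT_PCT": 0.02, "PLT": -0.003}


def predict_probability(sample, coefficients=COEFFICIENTS, intercept=INTERCEPT):
    """Apply the logistic function to the weighted sum of the features."""
    z = intercept + sum(coef * sample[name] for name, coef in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))


def spreadsheet_formula(cell_for_feature, coefficients=COEFFICIENTS, intercept=INTERCEPT):
    """Build a spreadsheet formula string that computes the same probability.

    cell_for_feature maps each feature name to the cell holding its value.
    """
    terms = [f"({coef!r}*{cell_for_feature[name]})" for name, coef in coefficients.items()]
    linear = "+".join([repr(intercept)] + terms)
    return f"=1/(1+EXP(-({linear})))"


if __name__ == "__main__":
    sample = {"WBC": 10.0, "NEUT_PCT": 80.0, "PLT": 200.0}
    print(predict_probability(sample))
    print(spreadsheet_formula({"WBC": "A2", "NEUT_PCT": "B2", "PLT": "C2"}))
```

Because the model is only an intercept plus a weighted sum passed through the logistic function, the same arithmetic can be written as a spreadsheet cell formula or a LIMS calculated field, which is what makes this model type straightforward to deploy outside Python.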

Version published to 10.1099/acmi.0.001212.v1 on Access Microbiology
Apr 9, 2026

