A Deep Learning–Based Imaging Informatics Framework for Automated Detection of Plasmodium falciparum in Blood Smear Microscopy
Abstract
Background: Malaria remains a major global health burden, with over 249 million cases reported worldwide in 2022. Light microscopy of peripheral blood smears is the diagnostic gold standard but is labor-intensive, operator-dependent, and prone to inter-reader variability, particularly in resource-limited settings. Imaging informatics and deep learning offer the potential to automate and standardize malaria screening workflows.

Objective: To develop and validate a high-sensitivity convolutional neural network (CNN)–based imaging informatics model for automated classification of segmented Plasmodium falciparum–infected erythrocytes, and to benchmark its diagnostic performance against a traditional Random Forest classifier trained on the full high-dimensional pixel feature space.

Methods: A total of 27,558 segmented erythrocyte images from the NIH Malaria Dataset were used. Images underwent preprocessing and augmentation before training a sequential CNN comprising three convolutional layers, optimized with the Adam optimizer. For comparison, a Random Forest classifier was trained on the full pixel-level feature space without spatial feature extraction. Model performance was evaluated on an independent test set (n = 5,511) using accuracy, sensitivity, specificity, negative predictive value (NPV), and area under the receiver operating characteristic curve (AUC).

Results: The Random Forest classifier performed near chance on the full pixel feature space, with an accuracy of 49.55% and an AUC of 0.493. In contrast, the CNN achieved an accuracy of 95.50% (95% CI: 94.9–96.1), a 45.95-percentage-point absolute improvement. The CNN also demonstrated high sensitivity (96.12%), high NPV (96.07%), and excellent discriminative ability (AUC = 0.986).

Conclusion: This study demonstrates that deep learning–based imaging informatics substantially outperforms traditional pixel-based machine learning approaches for malaria microscopy classification.
The failure of the Random Forest model highlights the necessity of spatial feature extraction in high-dimensional image data. The high sensitivity and NPV of the proposed CNN support its potential role as an automated first-pass screening tool to augment microscopy-based malaria diagnosis, particularly in high-burden and resource-constrained settings.
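The failure of the pixel-space Random Forest can be illustrated with a toy example (not drawn from the paper; image size, blob size, and positions are hypothetical): a classifier that treats each pixel as an independent feature sees two entirely different feature vectors when the same parasite appears at two locations, whereas a convolutional filter, being translation-equivariant, responds identically at either location.

```python
# Illustrative sketch: per-pixel features break under translation,
# while a convolutional filter does not. All values here are hypothetical.

def make_image(r0, c0, size=8, blob=2):
    """Binary size x size image with a blob x blob 'parasite' at top-left (r0, c0)."""
    return [[1 if r0 <= r < r0 + blob and c0 <= c < c0 + blob else 0
             for c in range(size)] for r in range(size)]

def flatten(img):
    """Flatten to the pixel-level feature vector a Random Forest would consume."""
    return [px for row in img for px in row]

def max_conv_response(img, k=2):
    """Max response of a k x k all-ones filter slid over all valid positions."""
    size = len(img)
    return max(sum(img[r + i][c + j] for i in range(k) for j in range(k))
               for r in range(size - k + 1) for c in range(size - k + 1))

a = make_image(1, 1)  # parasite near the top-left corner
b = make_image(1, 4)  # identical parasite, shifted right by three pixels

# Pixel-level feature vectors share no active features after the shift,
# so a tree split on any single pixel cannot fire for both images:
shared_active = sum(1 for x, y in zip(flatten(a), flatten(b)) if x == 1 and y == 1)
print(shared_active)                                # 0

# The convolutional filter's peak response is unchanged by the shift:
print(max_conv_response(a), max_conv_response(b))   # 4 4
```

This is the sense in which spatial feature extraction is necessary: the CNN's learned filters detect the parasite wherever it sits in the cell crop, while the pixel-space baseline must relearn it at every position.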
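For readers outside imaging informatics, the reported screening metrics derive directly from test-set confusion counts. The sketch below uses hypothetical counts (the paper does not report its confusion matrix) chosen only so the resulting values fall in the reported range.

```python
# Illustrative sketch: diagnostic metrics from confusion-matrix counts.
# The counts below are hypothetical, not the study's actual confusion matrix.

def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and NPV from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate: infected cells caught
        "specificity": tn / (tn + fp),  # true-negative rate
        "npv": tn / (tn + fn),          # confidence that a "negative" call is truly negative
    }

# Hypothetical counts summing to the 5,511-cell test set:
m = diagnostic_metrics(tp=2650, fp=140, tn=2615, fn=106)
print(f"accuracy={m['accuracy']:.4f} sensitivity={m['sensitivity']:.4f} npv={m['npv']:.4f}")
```

High sensitivity and high NPV are the two quantities that matter most for a first-pass screening tool: few infected cells are missed, and a negative result can be trusted enough to reduce the manual review burden.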