Comparative Analysis of Anomaly Detection Methods in Medical Data Using a Multi-metric Evaluation Framework

Abstract

Context. Medical datasets used for tuberculosis treatment monitoring often contain heterogeneous, noisy, or incomplete records that result from data entry errors, measurement inconsistencies, or clinically rare cases. Such anomalies distort statistical patterns and can substantially reduce the reliability of predictive analytics and clinical decision-support systems. Effective anomaly detection is therefore necessary to ensure that medical conclusions and treatment recommendations are correct.

Objective. The objective of this study is to develop and evaluate a methodological framework for detecting anomalies in tuberculosis treatment data that improves the accuracy, robustness, and interpretability of machine learning models used in clinical analytics.

Method. Four anomaly detection algorithms based on different theoretical principles were analyzed: Isolation Forest (isolation-based), Local Outlier Factor (density-based), One-Class Support Vector Machine (kernel-based), and Elliptic Envelope (covariance-based). The hyperparameters of each method were tuned, and performance was evaluated within a multi-metric framework comprising Precision, Recall, F1-score, ROC-AUC, PR-AUC, and the Matthews Correlation Coefficient (MCC). In addition, a stacking ensemble was proposed that uses LightGBM (LGBM) and Logistic Regression as meta-models. Statistical significance was assessed with bootstrap resampling and the Mann–Whitney U test.

Results. After tuning, Elliptic Envelope showed the most balanced performance among the individual models (F1 = 0.8182, ROC-AUC = 0.9609, MCC = 0.8048). The proposed stacking ensemble outperformed all standalone methods, achieving F1 = 0.96, Recall = 1.0, PR-AUC = 0.98, and MCC = 0.95, and statistical testing confirmed that this improvement is significant at α = 0.05 (p < 0.001). Robustness analysis further showed that ensemble performance remains stable under up to 15% artificial noise distortion.

Conclusions. The study shows that the effectiveness of anomaly detection in medical data depends on the structural characteristics of the feature space, and that no single method is universally optimal. The proposed stacking ensemble provides a statistically validated and robust improvement over the individual models and is recommended for integration into clinical decision-support systems for tuberculosis monitoring. Future research will focus on extending the approach to multimodal medical datasets and on cost-sensitive anomaly evaluation strategies.
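The comparison of the four detectors under the six metrics can be sketched with scikit-learn, which provides all four methods. The example below is a minimal illustration under stated assumptions, not the authors' code: the synthetic data (Gaussian inliers plus uniform outliers), the contamination rate of 0.05, and the helper names `make_data` and `evaluate_detector` are hypothetical, and hyperparameter tuning is omitted. Scores are negated where needed so that higher always means more anomalous before ROC-AUC and PR-AUC (computed here as average precision) are calculated.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.metrics import (average_precision_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score, roc_auc_score)
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM


def make_data(n_inliers=475, n_outliers=25, seed=0):
    """Hypothetical stand-in for clinical data: Gaussian inliers, uniform outliers."""
    rng = np.random.default_rng(seed)
    inliers = rng.normal(0.0, 1.0, size=(n_inliers, 4))
    outliers = rng.uniform(-6.0, 6.0, size=(n_outliers, 4))
    X = np.vstack([inliers, outliers])
    y = np.r_[np.zeros(n_inliers, dtype=int), np.ones(n_outliers, dtype=int)]  # 1 = anomaly
    return X, y


def evaluate_detector(model, X, y):
    """Fit an unsupervised detector and return the six metrics (label 1 = anomaly)."""
    pred = model.fit_predict(X)              # scikit-learn convention: -1 = outlier, 1 = inlier
    y_pred = (pred == -1).astype(int)
    if isinstance(model, LocalOutlierFactor):
        score = -model.negative_outlier_factor_  # LOF exposes training scores as an attribute
    else:
        score = -model.score_samples(X)          # flip sign: higher = more anomalous
    return {
        "Precision": precision_score(y, y_pred, zero_division=0),
        "Recall": recall_score(y, y_pred),
        "F1": f1_score(y, y_pred),
        "ROC-AUC": roc_auc_score(y, score),
        "PR-AUC": average_precision_score(y, score),
        "MCC": matthews_corrcoef(y, y_pred),
    }


# Assumed settings; the study tuned these rather than fixing them.
detectors = {
    "Isolation Forest": IsolationForest(contamination=0.05, random_state=0),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, contamination=0.05),
    "One-Class SVM": OneClassSVM(nu=0.05, gamma="scale"),
    "Elliptic Envelope": EllipticEnvelope(contamination=0.05, random_state=0),
}

X, y = make_data()
results = {name: evaluate_detector(m, X, y) for name, m in detectors.items()}
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

The labels `y` are used only for evaluation, since all four detectors are unsupervised. When anomalies are rare, PR-AUC and MCC are usually more informative than ROC-AUC, because a large pool of easy inliers can keep ROC-AUC high even when many flagged points are false alarms.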

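The abstract does not specify how the base detectors feed the stacking ensemble or how bootstrapping and the Mann–Whitney U test are combined. One plausible arrangement, which is an assumption throughout, is sketched below: the four detectors' anomaly scores become features for a logistic-regression meta-model (the LightGBM meta-model is omitted to keep the example to scikit-learn and SciPy), meta-model predictions are made out-of-fold with 5-fold stratified cross-validation, and bootstrapped F1 distributions for the ensemble and for Elliptic Envelope alone are compared with `scipy.stats.mannwhitneyu`. The data, seeds, and helper names (`detector_scores`, `bootstrap_f1`) are hypothetical.

```python
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import LocalOutlierFactor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Hypothetical synthetic data: 475 Gaussian inliers, 25 uniform outliers (label 1).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(475, 4)),
               rng.uniform(-6.0, 6.0, size=(25, 4))])
y = np.r_[np.zeros(475, dtype=int), np.ones(25, dtype=int)]


def detector_scores(X):
    """Base layer (assumed design): one unsupervised anomaly score per detector,
    oriented so that higher means more anomalous."""
    lof = LocalOutlierFactor(n_neighbors=20).fit(X)
    return np.column_stack([
        -IsolationForest(random_state=0).fit(X).score_samples(X),
        -lof.negative_outlier_factor_,
        -OneClassSVM(nu=0.05, gamma="scale").fit(X).score_samples(X),
        -EllipticEnvelope(contamination=0.05, random_state=0).fit(X).score_samples(X),
    ])


def bootstrap_f1(y_true, y_pred, n_boot=500, seed=0):
    """F1 on n_boot resamples (drawn with replacement) of the evaluation set."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    out = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        out[b] = f1_score(y_true[idx], y_pred[idx], zero_division=0)
    return out


# Meta-model: scaled detector scores -> logistic regression, predicted out-of-fold
# so that no sample is scored by a meta-model that saw its label.
S = detector_scores(X)
meta = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_ens = cross_val_predict(meta, S, y, cv=cv)

# Single-model baseline: Elliptic Envelope's own binary decision.
y_ee = (EllipticEnvelope(contamination=0.05, random_state=0).fit_predict(X) == -1).astype(int)

# Same seed -> both models are scored on identical resamples.
f1_ens = bootstrap_f1(y, y_ens)
f1_ee = bootstrap_f1(y, y_ee)
stat, p = mannwhitneyu(f1_ens, f1_ee, alternative="greater")
print(f"mean F1: ensemble={f1_ens.mean():.3f}, Elliptic Envelope={f1_ee.mean():.3f}, "
      f"U={stat:.0f}, p={p:.2e}")
```

Scoring both models on identical resamples keeps the comparison like-for-like. Because the p-value from this construction depends on the number of bootstrap resamples, reporting `n_boot` alongside the test result helps readers interpret it.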