Machine Learning-Based Symptom-Disease Prediction: A Comprehensive Analysis of Multi-Class Classification Models in Healthcare Decision Support Systems
Abstract
Healthcare decision support systems require accurate and efficient methods for predicting disease from patient symptoms. This study analyzes machine learning approaches to multi-class disease classification using both synthetic and real healthcare datasets. We evaluate three algorithms, Logistic Regression, Random Forest, and Gradient Boosting, which achieve classification accuracies of 96.5%, 96.2%, and 96.0%, respectively, on real clinical data. Our analysis identifies symptom-disease relationship patterns, with loss of taste/smell, cough, and fatigue emerging as the most predictive features in the real data. The Logistic Regression model performed best, with an AUC of 0.999, indicating near-perfect discriminative ability across the disease classes. We provide feature importance analysis, symptom correlation matrices, and demographic insights that can inform clinical decision-making. The real dataset comprises 5,000 patients across 10 disease categories and 32 symptom features and exhibits realistic disease prevalence patterns. Our findings demonstrate the feasibility of automated symptom-based disease prediction and provide a foundation for developing clinical decision support tools. This work contributes to the literature on AI-assisted healthcare diagnostics and establishes benchmarks for future research on symptom-disease prediction using real clinical data. In addition, we introduce a novel Adaptive Hierarchical Ensemble (AHE) model that achieves substantial computational efficiency (76.5% feature reduction) while maintaining high accuracy (93.5