Evaluation of Cascade Forests towards Interpretable ECG Arrhythmia Classification: Performance, Interpretability, and Deployment Trade-offs
Abstract
Background
Deep learning models achieve high accuracy in electrocardiogram (ECG) arrhythmia classification but pose critical barriers to clinical deployment: opaque decision-making that resists clinical validation, substantial computational requirements that necessitate specialised hardware, and training instability that demands extensive expertise. This study provides a rigorous evaluation of cascade forest architectures, hierarchical tree-based ensembles that preserve interpretability while enabling representation learning, as potential alternatives for clinical ECG analysis.

Results
Our cascade forest implementation, combining Discrete Wavelet Transform features with a three-layer ensemble architecture, achieved 98.79% accuracy and a 92.93% macro-F1 score on the MIT-BIH Arrhythmia Database. The model significantly outperformed Random Forest baselines (86.83% macro-F1, Wilcoxon p = 0.032, Cohen's d = 1.24), with particularly pronounced improvements for clinically critical minority classes: Ventricular ectopic beats (F1: 0.96 vs 0.93) and Fusion beats (F1: 0.80 vs 0.63, a 27% relative improvement). Performance approached contemporary deep learning benchmarks (99.2–99.5% accuracy range), with deficits of 0.41–0.71 percentage points. Comprehensive SHAP analysis indicated that the model relies on physiologically plausible features: wavelet coefficients capturing QRS morphology, RR interval dynamics, and signal variance. Computational analysis revealed 2.79 hours of execution time on standard CPU infrastructure, without specialised GPU acceleration.

Conclusions
These findings on MIT-BIH data suggest that cascade forests warrant further investigation as interpretable alternatives for clinical arrhythmia detection, particularly in deployment contexts where transparency, computational accessibility, and reliable minority class detection outweigh marginal accuracy improvements. Multi-database validation and prospective clinical evaluation remain essential to establish generalisability beyond the evaluated benchmark.
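
The derived figures quoted in the abstract (the 27% relative improvement for Fusion beats and the 0.41–0.71 percentage point gap to deep learning benchmarks) follow from simple arithmetic on the reported scores. The sketch below reproduces that arithmetic in Python. The function names are hypothetical and chosen for illustration; the macro-F1 helper assumes the usual definition (an unweighted mean of per-class F1 scores) and is not applied to the paper's data, because the abstract does not report F1 for every class.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


def macro_f1(per_class_f1: list[float]) -> float:
    """Unweighted mean of per-class F1 scores (assumed definition)."""
    return sum(per_class_f1) / len(per_class_f1)


# Per-class F1 values reported in the abstract:
# cascade forest vs Random Forest baseline.
fusion_gain = relative_improvement(0.80, 0.63)
ventricular_gain = relative_improvement(0.96, 0.93)

# Accuracy gap to the quoted deep learning range (99.2-99.5%),
# in percentage points.
cascade_accuracy = 98.79
deficit_low = 99.2 - cascade_accuracy
deficit_high = 99.5 - cascade_accuracy

print(f"Fusion beats relative gain: {fusion_gain:.1f}%")
print(f"Ventricular ectopic relative gain: {ventricular_gain:.1f}%")
print(f"Accuracy deficit: {deficit_low:.2f} to {deficit_high:.2f} points")
```

Because macro-F1 weights every class equally, a gain on a rare class such as Fusion beats moves it far more than it moves overall accuracy, which is consistent with the macro-F1 gap to the baseline being much wider than the per-class gain on Ventricular ectopic beats alone.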