Unraveling the Complexity Gap: A Mechanistic Investigation of Machine Learning Classification in Panic Disorder

Abstract

Background: Machine learning (ML) models trained on socioeconomic, physiological, and behavioral markers can classify panic disorder (PD) with high accuracy. Yet the mechanisms underlying these predictions remain poorly understood, limiting clinical translation and theoretical integration. Objective: To investigate why ML models achieve strong PD classification performance by examining feature interactions, individual contributions, model complexity requirements, and socioeconomic risk gradients. Methods: Using complete-case NHANES 1999–2004 data (N = 3,144; 115 PD cases), we applied a multi-method framework including distributional analysis, dimensionality reduction (UMAP, t-SNE), decision trees, SHAP interaction analysis, and socioeconomic stratification. The primary classifier was Gradient Boosting using 11 biopsychosocial predictors. Results: Individual features showed modest discriminative power (Cohen’s d = .13–.70). SHAP identified 10 meaningful interactions, particularly body fat × age and poverty-income ratio × BMI. A shallow decision tree reached only 40.97% accuracy, indicating reliance on multidimensional interactions. Socioeconomic analysis showed a strong gradient (poorest quartile: 6.42% PD prevalence; wealthiest: 1.16%), with highest risk among low-income women. Conclusion: High PD classification accuracy emerges from synergistic biopsychosocial patterns. These results clarify why ML-based classification outperforms traditional screening, identify mechanistic pathways consistent with PD models, and highlight high-risk groups for targeted intervention.
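The per-feature effect sizes reported above (Cohen's d between PD cases and non-cases) can be illustrated with a minimal sketch. The abstract does not state which variant of Cohen's d was used, so the pooled-standard-deviation form below is an assumption, and the function name and sample values are hypothetical rather than drawn from the NHANES data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using a pooled sample standard deviation.

    Assumption: the classic pooled-SD form of Cohen's d; the source does not
    specify the exact variant.
    """
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance returns the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical values for one predictor; not taken from the study.
cases = [2.0, 4.0, 6.0]
controls = [1.0, 3.0, 5.0]
d = cohens_d(cases, controls)
print(f"Cohen's d = {d:.2f}")
```

By the usual convention, values near 0.2 are read as small and values near 0.5 as medium, which is consistent with the abstract's description of the individual features as only modestly discriminative.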
