Attention-Driven Hybrid Vision Transformer Framework for Explainable Heart Disease Classification with Multi-Modal Unstructured Medical Data

Abstract

Heart disease remains a leading global health concern; it is often detected late and is complicated by numerous interacting risk factors. In today's data-rich clinical environment, patient medical records comprise both structured and unstructured data, yet traditional predictive modeling techniques often neglect the unstructured portion and thereby miss vital diagnostic clues. To address this, an Attention-Driven Hybrid Vision Transformer (ADHVT) framework is proposed that holistically combines multi-modal structured and unstructured data for heart disease classification. The framework applies the Synthetic Minority Over-sampling TEchnique (SMOTE) to balance minority-class distributions in both data forms and uses a hybrid molecule jump attention-based vision transformer with the Discrete Wavelet Transform (DWT) to extract spatial- and frequency-domain features. Classification is carried out by a convolutional GhostNet-based FlashAttention architecture integrated with an inception squeeze-and-excitation neural network, and the framework is tuned with the groupers and moray eels optimization algorithm. Explainability is provided through SHapley Additive exPlanations (SHAP), allowing clinicians to justify each prediction. Experimental results show an average accuracy of 99.43%, specificity of 99.48%, recall of 99.38%, and an error rate of 0.57% across multiple datasets, demonstrating the framework's ability to exploit both structured and unstructured clinical data for dependable, interpretable heart disease prediction.
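The abstract names several off-the-shelf components (SMOTE balancing, DWT feature extraction, SHAP attribution) around the proposed ADHVT model. Below is a minimal, illustrative Python sketch of only those standard stages, assuming tabular structured features and the imbalanced-learn, PyWavelets, shap, and scikit-learn libraries; the random-forest classifier is a placeholder stand-in, not the authors' ADHVT architecture, and all variable names are hypothetical.

```python
# Sketch of SMOTE balancing, DWT feature augmentation, and SHAP explanation.
# The classifier is a placeholder; the actual ADHVT model is not reproduced here.
import numpy as np
import pywt
import shap
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))               # placeholder structured features
y = (rng.random(500) > 0.85).astype(int)     # imbalanced binary labels

# 1) Balance the minority class with SMOTE.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)

# 2) Append single-level DWT approximation/detail coefficients as
#    frequency-domain features for each sample.
def dwt_features(row: np.ndarray) -> np.ndarray:
    cA, cD = pywt.dwt(row, "db4")
    return np.concatenate([cA, cD])

X_feat = np.vstack([np.concatenate([r, dwt_features(r)]) for r in X_bal])

# 3) Train a stand-in classifier and attribute its predictions with SHAP.
X_tr, X_te, y_tr, y_te = train_test_split(X_feat, y_bal, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
shap_values = shap.TreeExplainer(clf).shap_values(X_te)  # per-feature attributions
```

In the proposed framework these attributions would be computed for the ADHVT classifier rather than a tree ensemble, giving clinicians per-feature evidence behind each prediction.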
