An Advanced Infrared-Based Thoracic Disease Detection Framework Using Vision Transformer-Enabled Extended Convolutional Neural Networks
Abstract
Chest X-ray (CXR) imaging is a widely used, cost-effective method for detecting thoracic disorders such as pneumonia, tuberculosis (TB), and cardiomegaly. Despite their widespread clinical use, CXRs remain difficult to interpret accurately: reading them demands considerable expertise, and interpretations often vary between readers. To address these limitations, deep learning-based computer-aided detection (CAD) systems have attracted interest for their ability to assist radiologists by automating the detection and classification of chest diseases. Although deep learning has substantially advanced medical image analysis, many current chest disease detection models rely solely on Convolutional Neural Networks (CNNs), which may not fully capture global contextual relationships within an image, limiting their usefulness. The NIH Chest X-ray dataset, widely used for model training, contains rich spatial information that conventional CNNs may not fully exploit. Robust diagnostic models that effectively combine local and global feature extraction are therefore still needed to improve classification accuracy and generalizability. This paper proposes a hybrid deep learning architecture that integrates CNNs and Vision Transformers (ViTs) to diagnose chest disorders using the NIH Chest X-ray dataset, which comprises 112,120 labeled images from 30,805 patients. The CNN component captures local spatial features, while the ViT component models global contextual relationships. Generative Adversarial Networks (GANs) are used for data augmentation, increasing the diversity of the training data and the robustness of the model. Performance is further improved by weakly supervised learning with NLP-derived labels and by advanced augmentation combined with transfer learning. The proposed CNN-ViT model outperforms conventional architectures in accuracy, precision, recall, and F1-score. 
It demonstrates improved diagnostic capability and generalization, suggesting that it could be highly useful for automated chest disease identification in clinical settings.
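The abstract describes the hybrid design only at a high level. The sketch below illustrates one common way such a CNN-ViT hybrid can be wired, and it is not the authors' implementation. It assumes a downsampled 64×64 grayscale input, a single strided convolution as the CNN stage, one single-head self-attention layer standing in for a full ViT encoder (no layer norm, MLP block, or class token), and a sigmoid head over the 14 disease labels of the NIH ChestX-ray14 release. All function names (`conv2d_relu`, `self_attention`, `init_params`, `hybrid_cnn_vit_forward`), sizes, and the random, untrained weights are hypothetical. GAN augmentation, NLP-derived weak labels, and transfer learning are not shown.

```python
import numpy as np

# Assumption: no trained checkpoint is available, so every weight is random.
rng = np.random.default_rng(0)


def conv2d_relu(image, kernels, stride):
    """CNN stage: strided 'valid' cross-correlation followed by ReLU.

    image: (H, W) grayscale array; kernels: (C, k, k).
    Returns a (C, H', W') map of local features.
    """
    c, k, _ = kernels.shape
    h_out = (image.shape[0] - k) // stride + 1
    w_out = (image.shape[1] - k) // stride + 1
    out = np.empty((c, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            # Dot each of the C kernels with the same k x k patch.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)


def self_attention(tokens, wq, wk, wv):
    """ViT-style stage (single head): every token attends to every other token."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v


def init_params(image_size=64, channels=16, kernel=8, stride=4, num_labels=14):
    """Hypothetical layer sizes with random, untrained weights."""
    side = (image_size - kernel) // stride + 1
    n_tokens = side * side

    def w(*shape):
        return rng.normal(0.0, 0.1, shape)

    return {
        "stride": stride,
        "kernels": w(channels, kernel, kernel),
        "pos": w(n_tokens, channels),  # would be learned in a real model
        "wq": w(channels, channels),
        "wk": w(channels, channels),
        "wv": w(channels, channels),
        "head": w(channels, num_labels),
    }


def hybrid_cnn_vit_forward(image, params):
    """CNN feature map -> token sequence -> self-attention -> multi-label head."""
    fmap = conv2d_relu(image, params["kernels"], params["stride"])  # (C, H', W')
    tokens = fmap.reshape(fmap.shape[0], -1).T  # (N, C): one token per location
    tokens = tokens + params["pos"]  # add positional embedding
    tokens = tokens + self_attention(tokens, params["wq"], params["wk"], params["wv"])
    pooled = tokens.mean(axis=0)  # average over all tokens
    logits = pooled @ params["head"]
    # Independent sigmoid per finding: one image may carry several labels.
    return 1.0 / (1.0 + np.exp(-logits))


params = init_params()
image = rng.random((64, 64))  # stand-in for a downsampled grayscale CXR
probs = hybrid_cnn_vit_forward(image, params)
print(probs.shape)  # (14,)
```

The design choice this sketch highlights is the division of labor stated in the abstract: the convolution sees only small neighborhoods, while the attention step lets every spatial position draw on every other one. A per-label sigmoid, rather than a softmax, reflects the fact that a single radiograph can show several findings at once.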