Large-scale Chest Disease Diagnosis Enabled by Multimodal Large Language Models with Self-Supervised Fine-Tuning
Abstract
Chest X-rays (CXR), the most commonly used imaging modality in clinical practice, are widely applied in tasks including disease screening, diagnosis, and postoperative monitoring. However, existing task-specific classification models suffer from limited applicability, high demands on the quality and scale of labeled data, and poor generalization to out-of-distribution samples. To address these challenges, we propose ChestX-GPT, a multimodal large language model (MLLM) designed for large-scale chest disease diagnosis. ChestX-GPT is built upon the pre-trained DeepSeek architecture and introduces an innovative self-supervised fine-tuning strategy aimed at enhancing its self-learning ability and zero-shot generalization capability in sparse-text-label environments. Specifically, we construct the ChestX-800K dataset, which comprises 800,000 chest X-ray images with sparse textual labels covering nearly 20 types of chest diseases. Based on this dataset, we design a self-supervised fine-tuning method that enables the model to automatically learn multi-granular image-text feature representations from weakly supervised annotations and to automatically generate "Image–ROI–Description" triplet labels. Experimental results show that ChestX-GPT excels at diagnosing and localizing nearly 20 types of chest diseases, demonstrating strong interpretability and zero-shot generalization. Furthermore, the model can generate high-quality radiology summaries, showcasing its outstanding abilities in medical understanding and language generation.