Large-scale Chest Disease Diagnosis Enabled by Multimodal Large Language Models with Self-Supervised Fine-Tuning

Abstract

Chest X-rays (CXR), the most commonly used imaging modality in clinical practice, are widely applied to tasks including disease screening, diagnosis, and postoperative monitoring. However, existing task-specific classification models suffer from limited applicability, high demands on the quality and scale of labeled data, and poor generalization to out-of-distribution samples. To address these challenges, we propose ChestX-GPT, a multimodal large language model (MLLM) designed for large-scale chest disease diagnosis. ChestX-GPT is built upon the pre-trained DeepSeek architecture and introduces an innovative self-supervised fine-tuning strategy aimed at enhancing its self-learning ability and zero-shot generalization capability in sparse-text-label environments. Specifically, we construct the ChestX-800K dataset, which comprises 800,000 chest X-ray images with sparse textual labels covering nearly 20 types of chest diseases. Based on this dataset, we design a self-supervised fine-tuning method that enables the model to automatically learn multi-granular image–text feature representations from weakly supervised annotations and to automatically generate "Image–ROI–Description" triplet labels. Experimental results show that ChestX-GPT excels at diagnosing and localizing nearly 20 types of chest diseases, demonstrating strong interpretability and zero-shot generalization capabilities. Furthermore, the model can generate high-quality radiology summaries, showcasing its outstanding abilities in medical understanding and language generation.
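The abstract does not specify how the "Image–ROI–Description" triplet labels are assembled from the sparse textual labels. A minimal sketch of one plausible pairing step is shown below; the data shapes, the `scores` field, the scoring rule, and the whole-image fallback are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Triplet:
    """One 'Image–ROI–Description' training label (illustrative schema)."""
    image_id: str
    roi: tuple          # (x, y, w, h) bounding box; format is an assumption
    description: str


def make_triplets(weak_labels, roi_proposals):
    """Pair each sparse text label with its best-matching ROI proposal.

    weak_labels:    {image_id: [disease_label, ...]}  (weak supervision)
    roi_proposals:  {image_id: [{"box": (x, y, w, h),
                                 "scores": {label: float}}, ...]}
    The per-label scores would come from some grounding model; here they
    are simply assumed to exist.
    """
    triplets = []
    for image_id, labels in weak_labels.items():
        proposals = roi_proposals.get(image_id, [])
        for label in labels:
            # Pick the proposal scored highest for this label;
            # fall back to a whole-image box if no proposals exist.
            best = max(
                proposals,
                key=lambda p: p["scores"].get(label, 0.0),
                default={"box": (0, 0, 1, 1), "scores": {}},
            )
            triplets.append(Triplet(image_id, best["box"], f"Finding: {label}"))
    return triplets
```

For example, an image weakly labeled "cardiomegaly" with two candidate boxes would yield a single triplet anchored at the higher-scoring box.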
