ChestX-VQA: AI Tool for Multimodal Chest X-ray Analysis and Clinical QA
Abstract
Chest radiographs are central to medical diagnosis, but accurate interpretation often requires combining the image with relevant clinical context. This work presents a chatbot based on a multimodal large language model (M-LLM) that performs visual question answering (VQA) over chest X-ray images and associated clinical text. The publicly available VQA-RAD dataset, which contains chest radiographs with corresponding question–answer pairs, is used for evaluation. The study compares GIT, CLIP, BLIP, FLAVA, and VLIT on overall BERTScore, readability, and response time. In addition to these automatic metrics, medical practitioners assess the clinical relevance and accuracy of the responses. The combination of GIT and T5 performs best, with an overall BERTScore of 0.92. The resulting chatbot lets users upload chest radiographs together with clinical notes and receive clear, context-sensitive responses.