Advancing Conversational Diagnostic AI with Multimodal Reasoning

Abstract

AI systems based on Large Language Models (LLMs) have demonstrated great potential for conducting diagnostic conversations, but evaluation has been largely limited to language-only interactions, deviating from the real-world requirements of remote care delivery. Instant messaging platforms allow clinicians and patients to upload and discuss multimodal medical artifacts seamlessly in conversation, but the ability of LLMs to reason over such data while preserving other attributes of competent diagnostic conversation remains unknown. Here we advance the conversational diagnosis and management performance of the Articulate Medical Intelligence Explorer (AMIE) through a new capability to gather and interpret multimodal medical data, and to reason about it precisely during consultations. Leveraging Gemini 2.0 Flash, our system implements a state-aware dialogue framework in which conversation flow is dynamically controlled by intermediate model outputs reflecting patient states and evolving diagnoses. Follow-up questions are strategically directed by uncertainty in these patient states, producing a more structured multimodal history-taking process that emulates that of experienced clinicians. We compared AMIE to primary care physicians (PCPs) in a randomized, double-blind study of chat-based consultations with 25 patient actors in the style of an Objective Structured Clinical Examination (OSCE). We constructed 105 evaluation scenarios using common artifacts such as smartphone photos of skin conditions, ECG tracings, and PDFs of clinical documents, spanning a diversity of conditions and demographics. We designed a rubric to assess multiple aspects of the multimodal capability of AMIE and PCPs, in addition to other clinically meaningful axes such as history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. Evaluation by 18 specialists revealed superior performance of AMIE relative to PCPs in handling and reasoning about multimodal data on 7 of 9 axes, alongside superior performance on 29 of 32 non-multimodal axes, including diagnostic accuracy. These results demonstrate clear progress in multimodal conversational diagnostic AI, although real-world translation will require further research.
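
To make the state-aware dialogue framework more concrete, below is a minimal sketch of how such a loop could be structured: an explicit patient state is updated after each message, and the next question targets the finding with the greatest residual uncertainty. This is an illustrative assumption, not the authors' implementation; the names (PatientState, update_state, next_question) and the keyword heuristic standing in for an LLM call are all hypothetical.

```python
# Hypothetical sketch of a state-aware, uncertainty-directed dialogue loop.
# The real system would use LLM calls (over text and uploaded artifacts)
# where this sketch uses a crude keyword heuristic.
from dataclasses import dataclass, field


@dataclass
class PatientState:
    # Confidence (0..1) that each candidate finding is present; 0.5 = unknown.
    findings: dict = field(default_factory=lambda: {
        "chest pain on exertion": 0.5,
        "skin rash": 0.5,
        "palpitations": 0.5,
    })
    differential: list = field(default_factory=list)  # evolving diagnoses


def update_state(state: PatientState, patient_message: str) -> None:
    """Stand-in for a model call that maps the latest message (and any
    uploaded artifact, e.g. a photo or ECG) to updated finding confidences."""
    msg = patient_message.lower()
    for finding in state.findings:
        if finding in msg:
            state.findings[finding] = 0.9  # reported by the patient


def next_question(state: PatientState) -> str:
    """Direct the follow-up at the finding whose confidence is closest
    to 0.5, i.e. the point of maximum uncertainty in the patient state."""
    target = min(state.findings, key=lambda f: abs(state.findings[f] - 0.5))
    return f"Can you tell me more about any {target}?"


if __name__ == "__main__":
    state = PatientState()
    for message in ["I get chest pain on exertion when climbing stairs",
                    "No rash that I have noticed"]:
        update_state(state, message)
        print(next_question(state))
```

In this toy version the dialogue controller is deterministic once the state is fixed, which illustrates the claimed benefit: the conversation flow is governed by an explicit, inspectable state rather than by free-form generation alone.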
