Advancing Conversational Diagnostic AI with Multimodal Reasoning

Abstract

AI systems based on Large Language Models (LLMs) have demonstrated great potential for conducting diagnostic conversations, but evaluation has been largely limited to language-only interactions, deviating from the real-world requirements of remote care delivery. Instant messaging platforms allow clinicians and patients to upload and discuss multimodal medical artifacts seamlessly in conversation, but the ability of LLMs to reason over such data while preserving other attributes of competent diagnostic conversation remains unknown. Here we advance the conversational diagnosis and management performance of the Articulate Medical Intelligence Explorer (AMIE) through a new capability to gather and interpret multimodal medical data, and to reason about it precisely during consultations. Leveraging Gemini 2.0 Flash, our system implements a state-aware dialogue framework in which conversation flow is dynamically controlled by intermediate model outputs reflecting patient states and evolving diagnoses. Follow-up questions are strategically directed by uncertainty in these patient states, producing a more structured multimodal history-taking process that emulates that of experienced clinicians. We compared AMIE to primary care physicians (PCPs) in a randomized, double-blind study of chat-based consultations with 25 patient actors in the style of an Objective Structured Clinical Examination (OSCE). We constructed 105 evaluation scenarios using common artifacts such as smartphone photos of skin conditions, ECG tracings, and PDFs of clinical documents, spanning a diversity of conditions and demographics. We designed a rubric to assess multiple aspects of the multimodal capability of AMIE and PCPs, in addition to other clinically meaningful axes such as history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. Evaluation by 18 specialists revealed superior performance of AMIE relative to PCPs in handling and reasoning about multimodal data on 7 of 9 axes, alongside superior performance on 29 of 32 non-multimodal axes, including diagnostic accuracy. These results demonstrate clear progress in multimodal conversational diagnostic AI, although real-world translation will require further research.
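
To make the state-aware dialogue framework more concrete, below is a minimal sketch of how such a loop could be structured: an explicit patient state is updated after each message, and the next question targets the finding with the greatest residual uncertainty. This is an illustrative assumption, not the authors' implementation; the names (PatientState, update_state, next_question) and the keyword heuristic standing in for an LLM call are all hypothetical.

```python
# Hypothetical sketch of a state-aware, uncertainty-directed dialogue loop.
# The real system would use LLM calls (over text and uploaded artifacts)
# where this sketch uses a crude keyword heuristic.
from dataclasses import dataclass, field


@dataclass
class PatientState:
    # Confidence (0..1) that each candidate finding is present; 0.5 = unknown.
    findings: dict = field(default_factory=lambda: {
        "chest pain on exertion": 0.5,
        "skin rash": 0.5,
        "palpitations": 0.5,
    })
    differential: list = field(default_factory=list)  # evolving diagnoses


def update_state(state: PatientState, patient_message: str) -> None:
    """Stand-in for a model call that maps the latest message (and any
    uploaded artifact, e.g. a photo or ECG) to updated finding confidences."""
    msg = patient_message.lower()
    for finding in state.findings:
        if finding in msg:
            state.findings[finding] = 0.9  # reported by the patient


def next_question(state: PatientState) -> str:
    """Direct the follow-up at the finding whose confidence is closest
    to 0.5, i.e. the point of maximum uncertainty in the patient state."""
    target = min(state.findings, key=lambda f: abs(state.findings[f] - 0.5))
    return f"Can you tell me more about any {target}?"


if __name__ == "__main__":
    state = PatientState()
    for message in ["I get chest pain on exertion when climbing stairs",
                    "No rash that I have noticed"]:
        update_state(state, message)
        print(next_question(state))
```

In this toy version the dialogue controller is deterministic once the state is fixed, which illustrates the claimed benefit: the conversation flow is governed by an explicit, inspectable state rather than by free-form generation alone.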
