QoQ-Med3: Robust Multimodal Clinical Analysis Foundation Model with Reasoning

Abstract

Multimodal reasoning-based foundation models (MRFMs) hold considerable promise for addressing key challenges in medical practice, yet their readiness for real-world deployment remains insufficiently explored. To bridge this gap, we developed two MRFMs (QoQ-Med3 and QoQ-Med3-MIMIC) and systematically evaluated their (i) generalizability to previously unseen clinical modalities and tasks, (ii) transferability to held-out datasets collected across different clinical sites, and (iii) robustness to real-world challenges such as cross-site heterogeneity. Our results demonstrate that these models learn transferable representations across modalities, tasks, and heterogeneous clinical datasets. QoQ-Med3 achieves an overall balanced accuracy of 71.3% and an F1 score of 0.349, outperforming all open-source and closed-source models including GPT-4o, with particularly pronounced gains in understudied modalities such as ultrasound and mammography. The model, trained only on public clinical data, generalized to both the held-out MIMIC-IV dataset and the private JHU PMAP dataset collected at Johns Hopkins University Hospital. In addition, extrinsic hallucination rates are reduced by 78.2% after training. Collectively, our findings highlight both the potential of multimodal reasoning-based clinical foundation models and the critical next steps required to make them robust and reliable for real-world deployment.
