Evaluation of a Multimodal Custom Finetuned LLM for Virtual Healthcare Consultations
Abstract
We present a modular, privacy-conscious prototype of a multimodal agent with retrieval-augmented generation (RAG) serving as a virtual medical assistant for healthcare consultations. The system features a locally deployed LLaMA 3.2 11B model with 4-bit quantization, keeping the deployment compact and efficient. The model accepts both images and text directly and has been fine-tuned on 50,000 image-label pairs drawn from the MedTrinity dataset, which contains a wide variety of medical image-text pairs; the fine-tuning targets improved multimodal question answering in medical contexts. Text, image, and speech inputs are all supported, with speech transcribed via the AssemblyAI transcription API. For retrieval-augmented generation, ChromaDB semantically indexes and stores medical documents sourced from the MedQuAD dataset, comprising 41,000 medical question-answer pairs.

We evaluate the fine-tuned model against the base model, each with and without RAG support. Responses are scored with an LLM-as-a-judge protocol using OpenAI's GPT-4.1, applying both strict and non-strict evaluation criteria on the MMMU benchmark. From MMMU we select the Basic Medical Science, Clinical Medicine, and Diagnostic & Laboratory Medicine subjects, evaluating 30 questions per subject for each LLM variant, with and without RAG support.
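To make the retrieval step concrete, the sketch below shows how MedQuAD-style question-answer pairs could be indexed and queried with ChromaDB to build an augmented prompt. This is a minimal illustration, not the paper's released code: the collection name, persistence path, and sample record are hypothetical, and ChromaDB's default embedding function is assumed.

```python
# Minimal RAG retrieval sketch with ChromaDB (illustrative only).
import chromadb

# Hypothetical persistence path and collection name.
client = chromadb.PersistentClient(path="./medquad_index")
collection = client.get_or_create_collection(name="medquad")

# Index a MedQuAD-style question-answer pair (illustrative record, not real data).
collection.add(
    ids=["medquad-0001"],
    documents=["Q: What are the symptoms of anemia? A: Fatigue, pallor, shortness of breath."],
    metadatas=[{"source": "MedQuAD"}],
)

# Retrieve the top-k passages for a user query and assemble an augmented prompt.
query = "What causes iron-deficiency anemia?"
results = collection.query(query_texts=[query], n_results=3)
context = "\n\n".join(results["documents"][0])

prompt = (
    "You are a virtual medical assistant. Use the context below to answer.\n\n"
    f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
)
print(prompt)
```

In the full system described above, the resulting prompt (together with any attached image) would be passed to the locally deployed, 4-bit-quantized LLaMA 3.2 11B model rather than printed.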