Doc Bot: The Medical LLM Fine-tuned on LLaMA 3 8B Using LoRA and Insights from the Medical Field


Abstract

Purpose
General-purpose large language models (LLMs) often lack the specialized accuracy required for the medical domain. This research addresses that gap by developing and evaluating DocBot, a medical LLM, to demonstrate that domain-specific fine-tuning can significantly enhance performance even with constrained computational resources.

Methods
We fine-tuned the Meta LLaMA 3.1 8B model using Low-Rank Adaptation (LoRA), a parameter-efficient technique. The model was trained on a curated dataset of 2,000 patient-doctor dialogues sourced from ClinicalTrials, EMEA, and PubMed, using a single Tesla T4 GPU. Performance was evaluated against the base LLaMA model using BERTScore, BLEU, and ROUGE metrics, with responses from verified medical professionals serving as the reference.

Results
DocBot demonstrated significant improvements over the base LLaMA 3.1 8B model across all evaluation metrics. Specifically, DocBot achieved a higher BERTScore F1 (83.56% vs. 81.47%), indicating enhanced semantic accuracy, fluency, and alignment with expert-generated text. Gains in precision and recall further confirm the model's ability to generate relevant and comprehensive medical information.

Conclusion
The successful development of DocBot demonstrates that domain-optimized LLMs can be built efficiently. The results highlight the potential for specialized models to serve as reliable tools for augmenting clinical decision-making and delivering accessible medical support, particularly in resource-limited environments, paving the way for further innovation in specialized AI applications.
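To illustrate why LoRA makes fine-tuning feasible on a single Tesla T4, the sketch below shows the core idea in plain NumPy: a frozen weight matrix W receives a trainable low-rank correction (alpha / r) * B @ A, so only the small A and B matrices are updated. This is a minimal illustration of the technique, not the authors' training code; the dimensions, rank, and scaling value are hypothetical.

```python
import numpy as np

# Minimal LoRA sketch (illustrative only; shapes and rank are hypothetical).
# LoRA freezes the pretrained weight W and learns a low-rank update
#   W_eff = W + (alpha / r) * B @ A,
# so only A (r x d_in) and B (d_out x r) are trained.

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 1024, 1024, 8, 16

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, rank r
B = np.zeros((d_out, r))                    # trainable, initialized to zero

# At initialization B = 0, so the adapted model behaves exactly like the base.
W_eff = W + (alpha / r) * (B @ A)
assert np.allclose(W_eff, W)

full_params = W.size                # params a full fine-tune would update
lora_params = A.size + B.size       # params LoRA actually trains
print(f"full fine-tune params: {full_params:,}")
print(f"LoRA trainable params: {lora_params:,} "
      f"({100 * lora_params / full_params:.4f}% of full)")
```

With these hypothetical shapes, LoRA trains roughly 1.6% of the parameters a full fine-tune would touch, which is what keeps the memory footprint within a single T4's budget.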
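The evaluation compares model responses to doctor-written references with overlap-based metrics. As a rough intuition for how ROUGE-style scoring works, the sketch below computes unigram precision, recall, and F1 between a candidate answer and a reference. This is a simplified stand-in, not the paper's evaluation pipeline (which uses BERTScore, BLEU, and ROUGE), and the example sentences are invented.

```python
from collections import Counter

def unigram_overlap(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1-style score: unigram precision, recall, and F1."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())          # clipped shared unigrams
    p = overlap / max(sum(cand.values()), 1)      # precision over candidate
    r = overlap / max(sum(ref.values()), 1)       # recall over reference
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return {"precision": p, "recall": r, "f1": f1}

# Hypothetical model answer vs. a doctor's reference answer
scores = unigram_overlap(
    "take ibuprofen with food to reduce stomach irritation",
    "ibuprofen should be taken with food to avoid stomach irritation",
)
print(scores)  # precision 0.75, recall 0.60, F1 ~0.667
```

BERTScore works differently, matching tokens by contextual-embedding similarity rather than exact overlap, which is why the paper leans on it to capture semantic accuracy beyond surface wording.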
