Adapting LLMs for Biomedical Natural Language Processing: A Comprehensive Benchmark Study on Fine-Tuning Methods

Abstract

The application of large language models (LLMs) in biomedical natural language processing (NLP) shows promise, yet the complexity and specificity of biomedical texts often challenge general-purpose models. Fine-tuning, which customizes LLMs for domain-specific tasks, is essential to address these challenges and optimize performance. This study systematically evaluates various fine-tuning methods for adapting LLMs to biomedical NLP tasks, focusing on full fine-tuning (FFT) and parameter-efficient fine-tuning (PEFT) techniques such as LoRA, QLoRA, and P-tuning. We assess these methods across 12 benchmark datasets from the Biomedical Language Understanding and Reasoning Benchmark (BLURB) corpus, using state-of-the-art LLMs of varying sizes, including LLaMA-3, FLAN-T5, ChatGLM4, and UL2. Their performance is compared to that of domain-specific models such as PubMedBERT and BioClinicalBERT. The results indicate that fine-tuning significantly enhances performance compared to zero-shot settings, with LoRA and QLoRA emerging as the most effective and computationally efficient approaches. FFT generally does not exhibit a distinct advantage over PEFT methods. Notably, fine-tuned LLMs outperformed domain-specific BERT-based models in most cases, underscoring the potential of LLMs for complex biomedical tasks when tailored appropriately. Additionally, few-shot experiments revealed that fine-tuning provides a more stable and effective optimization strategy than in-context learning for biomedical applications. This study offers a comprehensive analysis of fine-tuning techniques, shedding light on how LLMs can be effectively adapted for advanced biomedical NLP tasks, thereby contributing to more efficient and versatile models for biomedical research and clinical practice. The code for this study is publicly available on GitHub.
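
To illustrate the kind of parameter-efficient setup the abstract refers to, the sketch below shows how LoRA or QLoRA fine-tuning is commonly configured with the Hugging Face PEFT and Transformers libraries. It is a minimal, illustrative example only: the model checkpoint, adapter rank, target modules, and other hyperparameters are assumptions, not the configuration used in the paper, which is available in the authors' GitHub repository.

```python
# Minimal sketch of LoRA / QLoRA adapter setup (illustrative; not the paper's exact configuration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Meta-Llama-3-8B"  # assumed checkpoint for illustration

# 4-bit quantization config: this is the "Q" in QLoRA. Omit it for plain LoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)  # freeze base weights, enable gradient checkpointing

# Attach low-rank adapters to the attention projections; hyperparameters are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model's parameters
```

The resulting model can then be trained with a standard Transformers `Trainer` on a tokenized biomedical dataset; only the small adapter matrices are updated, which is what makes LoRA and QLoRA far cheaper than full fine-tuning while remaining competitive in accuracy.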
