Adapting LLMs for Biomedical Natural Language Processing: A Comprehensive Benchmark Study on Fine-Tuning Methods

Abstract

The application of large language models (LLMs) in biomedical natural language processing (NLP) shows promise, yet the complexity and specificity of biomedical texts often challenge general-purpose models. Fine-tuning, which customizes LLMs for domain-specific tasks, is essential to address these challenges and optimize performance. This study systematically evaluates various fine-tuning methods for adapting LLMs to biomedical NLP tasks, focusing on full fine-tuning (FFT) and parameter-efficient fine-tuning (PEFT) techniques such as LoRA, QLoRA, and P-tuning. We assess these methods across 12 benchmark datasets from the Biomedical Language Understanding and Reasoning Benchmark (BLURB) corpus, using state-of-the-art LLMs of varying sizes, including LLaMA-3, FLAN-T5, ChatGLM4, and UL2. Their performance is compared to that of domain-specific models such as PubMedBERT and BioClinicalBERT. The results indicate that fine-tuning significantly enhances performance compared to zero-shot settings, with LoRA and QLoRA emerging as the most effective and computationally efficient approaches. FFT generally does not exhibit a distinct advantage over PEFT methods. Notably, fine-tuned LLMs outperformed domain-specific BERT-based models in most cases, underscoring the potential of LLMs for complex biomedical tasks when tailored appropriately. Additionally, few-shot experiments revealed that fine-tuning provides a more stable and effective optimization strategy than in-context learning for biomedical applications. This study offers a comprehensive analysis of fine-tuning techniques, shedding light on how LLMs can be effectively adapted for advanced biomedical NLP tasks, thereby contributing to more efficient and versatile models for biomedical research and clinical practice. The code for this study is publicly available on GitHub.
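
To illustrate the kind of parameter-efficient setup the abstract refers to, the sketch below shows how LoRA or QLoRA fine-tuning is commonly configured with the Hugging Face PEFT and Transformers libraries. It is a minimal, illustrative example only: the model checkpoint, adapter rank, target modules, and other hyperparameters are assumptions, not the configuration used in the paper, which is available in the authors' GitHub repository.

```python
# Minimal sketch of LoRA / QLoRA adapter setup (illustrative; not the paper's exact configuration).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Meta-Llama-3-8B"  # assumed checkpoint for illustration

# 4-bit quantization config: this is the "Q" in QLoRA. Omit it for plain LoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)  # freeze base weights, enable gradient checkpointing

# Attach low-rank adapters to the attention projections; hyperparameters are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model's parameters
```

The resulting model can then be trained with a standard Transformers `Trainer` on a tokenized biomedical dataset; only the small adapter matrices are updated, which is what makes LoRA and QLoRA far cheaper than full fine-tuning while remaining competitive in accuracy.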
