Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs


Abstract

Large language models (LLMs) have shown great potential in medical question answering (MedQA), yet adapting them to biomedical reasoning remains challenging due to domain-specific complexity and limited supervision. In this work, we study how prompt design and lightweight fine-tuning affect the performance of open-source LLMs on PubMedQA, a benchmark for multiple-choice biomedical questions. We focus on two widely used prompting strategies, standard instruction prompts and Chain-of-Thought (CoT) prompts, and apply QLoRA for parameter-efficient instruction tuning. Across multiple model families and sizes, our experiments show that CoT prompting alone can improve reasoning in zero-shot settings, while instruction tuning significantly boosts accuracy. However, fine-tuning on CoT prompts does not universally enhance performance and may even degrade it for certain larger models. These findings suggest that reasoning-aware prompts are useful, but their benefits are model- and scale-dependent. Our study offers practical insights into combining prompt engineering with efficient fine-tuning for medical QA applications.
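
To make the setup described in the abstract concrete, the sketch below shows one way to combine the two ingredients: QLoRA-style parameter-efficient instruction tuning of an open-source LLM (4-bit quantization plus low-rank adapters via the Hugging Face transformers/peft/bitsandbytes stack) and a Chain-of-Thought prompt template alongside a standard instruction prompt for PubMedQA-style questions. The model name, adapter hyperparameters, and prompt wording are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: QLoRA instruction tuning setup + prompt templates for PubMedQA.
# Assumed details: base model, LoRA hyperparameters, and prompt phrasing.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # assumed; the study spans several model families and sizes

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections; only these weights are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)


def standard_prompt(context: str, question: str) -> str:
    """Standard instruction prompt: ask for the answer directly."""
    return (
        "Answer the biomedical question with yes, no, or maybe.\n\n"
        f"Abstract: {context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def cot_prompt(context: str, question: str) -> str:
    """Chain-of-Thought prompt: elicit step-by-step reasoning before the answer."""
    return (
        "You are a biomedical expert. Read the abstract and answer the question "
        "with yes, no, or maybe.\n\n"
        f"Abstract: {context}\n"
        f"Question: {question}\n"
        "Let's think step by step, then state the final answer."
    )
```

Either template can be used both for zero-shot evaluation and as the input side of the instruction-tuning data; the abstract's finding is that fine-tuning on the CoT-formatted prompts helps some models but can hurt certain larger ones.
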
