LLM Reasoning Does Not Protect Against Clinical Cognitive Biases - An Evaluation Using BiasMedQA


Abstract

Background

Cognitive biases are an important source of clinical errors. Large language models (LLMs) have emerged as promising tools to support clinical decision-making, but have been shown to be prone to the same cognitive biases as humans. Recent LLM capabilities that emulate human reasoning could potentially mitigate these vulnerabilities.

Methods

To assess the impact of reasoning on the susceptibility of LLMs to cognitive bias, Llama-3.3-70B and Qwen3-32B, along with their reasoning-enhanced variants, were evaluated on the public BiasMedQA dataset, which probes seven distinct cognitive biases across 1,273 clinical case vignettes. Each model was tested under three conditions: a base prompt, a debiasing prompt instructing the model to actively mitigate cognitive bias, and a few-shot prompt with additional sample cases of biased responses. For each model pair, two mixed-effects logistic regression models were fitted to determine the impact of biases and mitigation strategies on performance.
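As a minimal sketch of this setup, the following Python code illustrates the three prompting conditions and a mixed-effects logistic regression with a random intercept per vignette. The prompt wordings, file name, and column names are illustrative assumptions, not the study's actual materials; odds ratios correspond to exponentiated fixed-effect coefficients.

import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# Hypothetical versions of the three prompting conditions.
PROMPTS = {
    "base": "{vignette}\nAnswer with the single best option.",
    "debias": (
        "{vignette}\nThe case may contain misleading cues designed to "
        "trigger cognitive bias; actively question such cues before "
        "answering with the single best option."
    ),
    "few_shot": "{bias_examples}\n\n{vignette}\nAnswer with the single best option.",
}

# Assumed flat results table: one row per (vignette, prompt) trial with
# columns correct (0/1), bias_type, prompt, and case_id.
df = pd.read_csv("biasmedqa_results.csv")  # hypothetical file

# Mixed-effects logistic regression: bias type and prompting strategy as
# fixed effects, a random intercept for each of the clinical vignettes.
model = BinomialBayesMixedGLM.from_formula(
    "correct ~ C(bias_type) + C(prompt)",
    {"case": "0 + C(case_id)"},
    df,
)
fit = model.fit_vb()  # variational Bayes fit of the mixed logit
print(fit.summary())

# Odds ratios are exp() of the fixed-effect posterior means.
print(np.exp(fit.fe_mean))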

Results

Reasoning capabilities did not consistently prevent cognitive bias in either model, although both reasoning models achieved better overall performance than their respective base models (OR 4.0 for Llama-3.3-70B, OR 3.6 for Qwen3-32B). In Llama-3.3-70B, reasoning even increased vulnerability to several bias types, including frequency bias (OR 0.6, p = 0.006) and recency bias (OR 0.5, p < 0.001). In contrast, the debiasing and few-shot prompting approaches both produced statistically significant reductions in biased responses across the two model architectures, with the few-shot strategy being substantially more effective (OR 0.1 vs. 0.6 for Llama-3.3-70B; OR 0.25 vs. 0.6 for Qwen3-32B).

Conclusions

Our results indicate that contemporary reasoning capabilities in LLMs fail to protect against cognitive biases, extending the growing body of literature suggesting that purported reasoning abilities may represent sophisticated pattern recognition rather than genuine inferential cognition.
