The Inefficacy of Artificial Intelligence Large Language Models in Healthcare: A Clinical and Statistical Perspective

Michael Williams
Raeed Kabir
Tariq Nakhooda


Abstract

Objective: This perspective piece examines the role of Large Language Models (LLMs) in healthcare, arguing that despite significant investment, these models have had only a limited impact. We further argue that LLMs must replicate the key phases of primary healthcare delivery to act as a force multiplier, which is a necessary condition for addressing the global burden of disease.

Discussion: We argue that LLMs lack the metacognitive capacity for ranked, dynamic reasoning. This is evidenced by clinically dangerous hallucinations and by an inability to perform unless complete information is provided. We extend clinical critiques with a statistical argument and a simulation exercise demonstrating that LLM-based diagnosis is not merely impractical but structurally incapable of converging on correct diagnoses in realistic clinical settings.

Conclusion: Unless LLMs can independently collect patient history and triage, eliminate differential diagnoses, provide a treatment plan, and generate encounter notes, they will not improve the efficiency of primary care delivery by human doctors. A different approach, grounded in cognitive AI and structured reasoning, is necessary. AI models should instead be seeded with weights provided by a panel of expert physicians to approximate an independent robot doctor.

Version published to 10.20944/preprints202603.2228.v2 on Mar 31, 2026
Version published to 10.20944/preprints202603.2228.v1 on Mar 27, 2026
