Current applications and challenges in large language models for patient care: a systematic review

Felix Busch
Lena Hoffmann
Christopher Rueger
Elon HC van Dijk
Rawen Kader
Esteban Ortiz-Prado
Marcus R. Makowski
Luca Saba
Martin Hadamitzky
Jakob Nikolas Kather
Daniel Truhn
Renato Cuocolo
Lisa C. Adams
Keno K. Bressem

Abstract

Background

The introduction of large language models (LLMs) into clinical practice promises to improve patient education and empowerment, thereby personalizing medical care and broadening access to medical knowledge. Despite the popularity of LLMs, there is a significant gap in systematized information on their use in patient care. Therefore, this systematic review aims to synthesize current applications and limitations of LLMs in patient care.

Methods

We systematically searched 5 databases for qualitative, quantitative, and mixed methods articles on LLMs in patient care published between 2022 and 2023. From 4349 initial records, 89 studies across 29 medical specialties were included. Quality assessment was performed using the Mixed Methods Appraisal Tool 2018. A data-driven convergent synthesis approach was applied for thematic syntheses of LLM applications and limitations using free line-by-line coding in Dedoose.

Results

Most studies investigated Generative Pre-trained Transformer (GPT)-3.5 (53.2%, n = 66 of 124 LLMs examined) and GPT-4 (26.6%, n = 33/124), primarily for answering medical questions, followed by patient information generation (including medical text summarization or translation) and clinical documentation. Our analysis delineates two primary domains of LLM limitations: design and output. Design limitations comprise 6 second-order and 12 third-order codes, such as lack of medical domain optimization, data transparency, and accessibility issues; output limitations comprise 9 second-order and 32 third-order codes, for example, non-reproducibility, non-comprehensiveness, incorrectness, unsafety, and bias.

Conclusions

This review systematically maps LLM applications and limitations in patient care, providing a foundational framework and taxonomy for their implementation and evaluation in healthcare settings.

Version published to 10.1038/s43856-024-00717-2: Jan 21, 2025
Version published to 10.1101/2024.03.04.24303733 on medRxiv: Mar 5, 2024
