Large Language Models in Healthcare Simulation Education: A Bibliometric Analysis with AI-Assisted Screening

Matthew Pears
Karan Wadhwa
Stephen R Payne
Stathis TH Konstantinidis
Chandra Shekhar Biyani

Abstract

Large language models (LLMs) such as ChatGPT are rapidly reshaping healthcare education and simulation-based training in non-technical skills (NTS), yet no bibliometric analysis has mapped this landscape. We searched seven open-access databases (OpenAlex, PubMed, Europe PMC, Crossref, Semantic Scholar, CORE, DOAJ) for English-language publications from January 2020 to March 2026. From 100,277 initial records, a sequential keyword funnel yielded 830 candidate papers, which were screened by 83 independent Claude Sonnet 4.6 AI agents applying pre-specified inclusion criteria (PRISMA-trAIce compliant; Cohen’s kappa = 0.86 pre-reconciliation, 1.0 post-reconciliation). The final AI-verified corpus comprised 551 papers with a compound annual growth rate of 109%, contributions from 2,398 authors across 279 journals in 58 countries, and an h-index of 41. ChatGPT dominated the model landscape (46% of papers), with open-source models virtually absent. Virtual patient chatbots were the leading simulation modality (106 papers). Among NTS domains, communication (145 papers) and decision-making (135 papers) were most studied, whereas teamwork, leadership, situational awareness, and crisis resource management were markedly underrepresented. Only 6 urology-relevant papers were identified, none examining LLM integration within boot camp training formats. The field is growing at an extraordinary pace but remains concentrated in a narrow range of NTS domains and a single proprietary model. Critical gaps persist in team-based skills training, open-source model evaluation, and specialty-specific simulation. AI-assisted bibliometric screening using multiple independent agents is feasible, reliable, and scalable, offering a replicable methodology for mapping fast-evolving research fields.

We mapped the research landscape of large language models in healthcare simulation and non-technical skills training by analysing 551 rigorously screened papers published between 2020 and 2026. Our analysis reveals a field that has exploded since the release of ChatGPT, growing at over 100% per year, but one with significant blind spots. Most research focuses on communication and clinical decision-making, while the team-based skills that prevent patient safety failures, namely teamwork, leadership, situational awareness, and crisis resource management, are barely studied. Almost half of all papers use a single proprietary model (ChatGPT), with open-source alternatives virtually absent. We also found that urology, despite having one of the most established simulation training programmes in surgery, has almost no research connecting large language models with simulation-based training. To conduct this analysis, we developed a novel screening approach using 83 independent AI agents, which achieved agreement with human review exceeding published benchmarks. Our open-access pipeline and dataset are freely available for other researchers to replicate or extend this work.
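
The abstract reports three summary statistics (Cohen's kappa, compound annual growth rate, and h-index) without giving their formulas. As a rough illustration only, the sketch below implements the standard textbook definitions in Python. It is not the authors' pipeline and may differ from how they computed their figures. The function names and all input values are hypothetical toy data, not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def cagr(first_count, last_count, n_years):
    """Compound annual growth rate between two yearly publication counts."""
    return (last_count / first_count) ** (1 / n_years) - 1


def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


# Hypothetical toy inputs (NOT data from the preprint).
screener_1 = ["include", "include", "exclude", "exclude"]
screener_2 = ["include", "exclude", "exclude", "exclude"]
print(cohen_kappa(screener_1, screener_2))  # 0.5
print(cagr(10, 40, 2))                      # 1.0, i.e. 100% per year
print(h_index([10, 8, 5, 4, 3]))            # 4
```

In the kappa example the two screeners agree on 3 of 4 items (0.75 observed) against 0.5 expected by chance, giving (0.75 - 0.5) / (1 - 0.5) = 0.5.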

Version published to 10.64898/2026.06.02.26354722 on medRxiv
Jun 4, 2026

Related articles

Role-Prompting in Frontier Large Language Models Influences Clinical Reasoning in Complex Medical Cases

This article has 8 authors:
1. Chintan Dave
2. Adrianna Diviero
3. Tashni Dassanayake
4. Salman J. Alshahrani
5. Anas Al Mardini
6. Widad Khadir
7. Ashaki D. Patel
8. Adithya Srivastava
This article has no evaluations. Latest version Jul 1, 2026

Uncertainty-aware extraction of clinical findings from Finnish EHRs using open large language models

This article has 5 authors:
1. Jussi Leinonen
2. Juha Knuuttila
3. Siina Pamilo
4. Samu Kurki
5. Miika Koskinen
This article has no evaluations. Latest version Jul 9, 2026

General-purpose large language models can achieve physician-level accuracy in complex medical data extraction

This article has 2 authors:
1. Manu Rajeev
2. Ananthu Narayan
This article has no evaluations. Latest version Jun 10, 2026
