Large Language Models in Healthcare Simulation Education: A Bibliometric Analysis with AI-Assisted Screening

Abstract

Large language models (LLMs) such as ChatGPT are rapidly reshaping healthcare education and simulation-based training in non-technical skills (NTS), yet no bibliometric analysis has mapped this landscape. We searched seven open-access databases (OpenAlex, PubMed, Europe PMC, Crossref, Semantic Scholar, CORE, DOAJ) for English-language publications from January 2020 to March 2026. From 100,277 initial records, a sequential keyword funnel yielded 830 candidate papers, which were screened by 83 independent Claude Sonnet 4.6 AI agents applying pre-specified inclusion criteria (PRISMA-trAIce compliant; Cohen’s kappa = 0.86 before reconciliation and 1.0 after reconciliation). The final AI-verified corpus comprised 551 papers, with a compound annual growth rate of 109%, contributions from 2,398 authors across 279 journals in 58 countries, and an h-index of 41. ChatGPT dominated the model landscape (46% of papers), and open-source models were virtually absent. Virtual patient chatbots were the leading simulation modality (106 papers). Among NTS domains, communication (145 papers) and decision-making (135 papers) were the most studied, whereas teamwork, leadership, situational awareness, and crisis resource management were markedly underrepresented. Only 6 urology-relevant papers were identified, and none examined LLM integration within boot camp training formats. The field is growing at an extraordinary pace but remains concentrated in a narrow range of NTS domains and a single proprietary model. Critical gaps persist in team-based skills training, open-source model evaluation, and specialty-specific simulation. AI-assisted bibliometric screening using multiple independent agents is feasible, reliable, and scalable, and it offers a replicable methodology for mapping fast-evolving research fields.

We mapped the research landscape of large language models in healthcare simulation and non-technical skills training by analysing 551 rigorously screened papers published between 2020 and 2026. Our analysis reveals a field that has expanded rapidly since the release of ChatGPT, growing at over 100% per year, but one with significant blind spots. Most research focuses on communication and clinical decision-making, while the team-based skills that prevent patient safety failures (teamwork, leadership, situational awareness, and crisis resource management) are barely studied. Almost half of all papers use a single proprietary model (ChatGPT), with open-source alternatives virtually absent. We also found that urology, despite having one of the most established simulation training programmes in surgery, has almost no research connecting large language models with simulation-based training. To conduct this analysis, we developed a novel screening approach using 83 independent AI agents, which achieved agreement with human review exceeding published benchmarks. Our open-access pipeline and dataset are freely available for other researchers to replicate or extend this work.
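
The headline metrics in the abstract (compound annual growth rate, Cohen's kappa, and h-index) follow standard textbook definitions. The Python sketch below illustrates those definitions only. It is not the authors' pipeline: the function names and the example inputs are hypothetical, and the abstract does not state which base year or number of periods was used for the reported growth rate.

```python
from collections import Counter


def cagr(first_count, last_count, periods):
    """Compound annual growth rate between a first and a last yearly count."""
    if first_count <= 0 or periods <= 0:
        raise ValueError("first_count and periods must be positive")
    return (last_count / first_count) ** (1 / periods) - 1


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Made-up illustrative inputs, not data from the study.
example_growth = cagr(first_count=10, last_count=40, periods=2)
example_kappa = cohens_kappa(
    ["include", "include", "exclude", "exclude"],
    ["include", "exclude", "exclude", "exclude"],
)
example_h = h_index([10, 8, 5, 4, 3])
```

With these toy inputs, a rise from 10 to 40 papers over two periods corresponds to a doubling per period (100% growth), the two raters agree on three of four items against a chance agreement of 0.5 (kappa of 0.5), and four papers have at least four citations each (h-index of 4).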
