What level of expertise is necessary to generate ACLS training test questions: pre-med students vs. artificial intelligence?

Sunny LoGalbo
Mark Richman
Jeffrey Wang
Illan Saji
Aliyah Traore
Hannah Oliva
Evan Wu
Alessandro Drud
Diamia Foster
Sambhat Bhandari
Raquel Lopez Delfillo
Amanda McCann
Jennifer Coard
Camille Matthew
Barry Smith

Abstract

Introduction

In-hospital cardiac arrest carries high mortality despite standardized ACLS training. Educators face increasing time constraints in developing assessment tools for ACLS training. Two possible solutions to this problem are using pre-medical students or using artificial intelligence to generate test questions. This study compared the quality of pre-medical student-generated ACLS test questions vs. AI-generated ACLS test questions, testing the hypothesis that AI-generated questions are non-inferior to student-generated questions.

Methods

Ten pre-medical students created ACLS questions following predefined criteria, while an AI model (Northwell’s Artificial Intelligence Hub) generated comparable questions. A blinded ACLS-certified physician evaluated questions on the qualities of Alignment, Clarity, Cognitive Level, and Question Design using a standardized rubric (Likert scale: 1 = poor quality, 5 = excellent). Student’s t-test and Chi-square analysis were used to compare the quality of questions on different rubric domains within each arm (student vs. AI) and within one domain (e.g., question Clarity) between arms. Student’s t-test was used when 2 comparator groups were compared (e.g., Clarity of student-generated vs. AI-generated questions) within one domain. The ANOVA test was used when comparing more than 2 comparator groups (e.g., Alignment vs. Clarity vs. Cognitive Level) within one arm. Statistical significance was set a priori at p < 0.05.
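As an illustration of the two-group comparison described above, the following is a minimal sketch of a pooled-variance Student’s t-test on rubric scores, written with the Python standard library only. It is not the authors’ analysis code: the function names (`student_t_test`, `t_pdf`, `two_sided_p`) and the two score lists are hypothetical and chosen purely for illustration, and the p-value is approximated by numerically integrating the t density rather than by a statistics package.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=20000):
    """Two-sided p-value: 1 minus the area under the density on [-|t|, |t|].

    The area on [0, |t|] is approximated with Simpson's rule
    (steps must be even) and doubled by symmetry.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1 - 2 * area)


def student_t_test(a, b):
    """Pooled-variance two-sample t-test. Returns (t, df, two-sided p)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    if se == 0:
        raise ValueError("both groups have zero variance; t is undefined")
    t = (mean(a) - mean(b)) / se
    return t, df, two_sided_p(t, df)


# Hypothetical Likert scores (1-5) for one rubric domain; NOT study data.
ai_scores = [5, 5, 5, 5, 5]
student_scores = [4, 5, 5, 4, 5]

t_stat, dof, p_value = student_t_test(ai_scores, student_scores)
print(f"t = {t_stat:.3f}, df = {dof}, p = {p_value:.3f}")
```

The same comparison is available in common statistics packages, for example `scipy.stats.ttest_ind` with `equal_var=True`. The hand-rolled version here only makes the pooled-variance arithmetic explicit.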

Results

Both student-generated and AI-generated questions were of high quality. AI-generated questions achieved the maximum score in the domains of Alignment, Clarity, and Question Design, but fell short of perfect scores in the domain of Cognitive Level (8 of 50 questions scored below 5). Student-generated questions achieved less-than-perfect scores in each domain. No significant difference was found in overall mean question scores between groups (students = 4.79, AI = 4.81; p = 0.9). However, AI-generated questions had significantly greater Clarity (students = 4.8, AI = 5; p = 0.0461), while Alignment, Cognitive Level, and Question Design showed no significant differences.

Conclusion

AI-generated questions demonstrated overall quality comparable to those generated by pre-medical students, supporting the potential role of AI as a scalable tool in ACLS educational assessment development. Further studies are warranted to evaluate additional AI platforms and determine optimal integration of AI in medical education assessment design.

Version published to 10.64898/2026.06.11.26354470 on medRxiv
Jun 11, 2026