Development and Prospective Validation of CPX-MATE: An End-to-End Medical Education Platform Integrating Voice-Based Virtual Patient Simulation and Automated Real-time Evaluation
Abstract
Background: The Objective Structured Clinical Examination (OSCE; Clinical Performance Examination [CPX] in South Korea) is a high-stakes assessment of clinical performance, communication, and reasoning during time-limited patient encounters. As AI-enabled virtual standardized patient (VSP) simulation and automated scoring are introduced for OSCE-like training, prospective evidence is needed on how such systems perform, and how they are perceived, when embedded in real educational workflows.

Methods: We developed CPX with Medical students' Assistant for Training and Evaluation (CPX-MATE), a web-based platform integrating (1) CPX with Virtual Standardized Patient (CPX-VSP), which provides real-time voice dialogue with a VSP via speech-to-speech (STS) models, and (2) CPX with Real-Time Evaluator (CPX-RTE), which generates automated transcription, checklist-based scoring, and feedback from encounter audio using a speech-to-text (STT) model and a large language model (LLM). During an emergency medicine clerkship (Nov 2025–Jan 2026), 60 senior medical students completed two 12-minute CPX encounters (a VSP with acute pancreatitis and a human standardized patient [HSP] with a ureteral stone) with immediate CPX-RTE feedback. For CPX-VSP, students were assigned to either a full-capacity or a resource-limited STS configuration (n = 30 each). Dialogue fidelity was evaluated by turn-by-turn analysis of student–VSP exchanges, classifying responses into clinically meaningful error types (tangential, oversharing, role-breaking, and off-script). CPX-RTE performance was assessed by agreement (Gwet's AC1) with professors' real-time ratings and residents' video-based ratings on a 45-item checklist. Usability of CPX-VSP and CPX-RTE was surveyed, overall usability was measured with the System Usability Scale (SUS), and mean per-session costs for CPX-VSP and CPX-RTE were calculated.

Results: Across 3,282 dialogue turns, the overall error rate was 1.77% with the full-capacity STS configuration versus 9.43% with the resource-limited configuration (p < 0.001), a difference driven by fewer tangential and oversharing responses; no off-script errors were observed. The mean per-session cost was $0.12 for the resource-limited configuration and $0.78 for the full-capacity configuration. CPX-RTE showed high agreement with human ratings (AC1 = 0.916 vs professor; 0.916 vs resident), with modest variation across the four checklist sections, and high usability across all domains (mean scores, 4.65–4.92), at a per-session cost of $0.17. CPX-MATE demonstrated good overall usability (median SUS [IQR], 77.5 [70.0–85.0]).

Conclusions: Embedded within a prospective clinical clerkship, CPX-MATE demonstrated operational fidelity and human-level checklist agreement as an end-to-end, voice-based, AI-assisted OSCE platform. This real-world deployment supports scalable integration of such platforms as complementary assessment tools, while highlighting the importance of systematic validation and context-aware implementation in medical education.
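To make the CPX-RTE flow concrete, below is a minimal, hypothetical sketch of the transcription-then-LLM-judgment loop the abstract describes (audio transcript in, per-item checklist judgments out). Every identifier here (score_encounter, keyword_judge, the sample checklist items) is an illustrative stand-in, not CPX-MATE's actual code or API.

```python
# Hypothetical sketch of the CPX-RTE flow described in the abstract:
# encounter audio -> STT transcript -> per-item checklist judgments -> feedback.
# All names below are illustrative stand-ins, not CPX-MATE's real implementation.
from typing import Callable

CHECKLIST = [  # two of the 45 items, paraphrased purely for illustration
    "Asked about onset and duration of pain",
    "Explained the suspected diagnosis to the patient",
]

def score_encounter(
    transcript: str,
    judge: Callable[[str, str], bool],  # (transcript, item) -> performed?
) -> dict[str, bool]:
    """Return a performed / not-performed judgment for each checklist item."""
    return {item: judge(transcript, item) for item in CHECKLIST}

def keyword_judge(transcript: str, item: str) -> bool:
    # Crude stand-in for an LLM judge so the sketch runs offline:
    # count an item as performed if any longer content word appears.
    content = [w.lower() for w in item.split() if len(w) > 4]
    return any(w in transcript.lower() for w in content)

demo = "Doctor: When did the pain start, and how long has it lasted? Patient: About three hours ago."
print(score_encounter(demo, keyword_judge))
```

In a deployment like the one described, the keyword stub would be replaced by an LLM call that receives the transcript plus one checklist item and returns a judgment with a rationale for feedback.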
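For reference, the agreement statistic reported above, Gwet's AC1 for two raters scoring binary (performed / not performed) checklist items, is defined as:

```latex
\mathrm{AC1} = \frac{p_a - p_e}{1 - p_e},
\qquad
p_e = 2\,\hat{\pi}\,(1 - \hat{\pi}),
\qquad
\hat{\pi} = \frac{\pi_A + \pi_B}{2},
```

where p_a is the observed proportion of items on which the two raters agree, and pi_A and pi_B are the proportions of items each rater marks as performed. Unlike Cohen's kappa, the chance-agreement term p_e stays small when the marginal proportions are extreme, which makes AC1 better behaved on checklists where most items are performed.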
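The Results report p < 0.001 for the 1.77% versus 9.43% error-rate comparison without naming the test here. Below is a minimal sketch of a two-proportion z-test, assuming (purely for illustration) that the 3,282 turns split evenly between configurations; the per-arm counts are back-calculated from the reported rates, not taken from the study.

```python
# Minimal sketch of a two-proportion z-test for the STS error-rate comparison.
# The per-arm turn and error counts are ILLUSTRATIVE assumptions chosen to
# approximately reproduce the reported rates (1.77% vs 9.43%); the abstract
# does not state them.
from statsmodels.stats.proportion import proportions_ztest

errors = [29, 155]    # full-capacity vs resource-limited error counts (assumed)
turns = [1641, 1641]  # assumed even split of the 3,282 total dialogue turns

z, p = proportions_ztest(count=errors, nobs=turns)
print(f"rates: {errors[0]/turns[0]:.2%} vs {errors[1]/turns[1]:.2%}, "
      f"z = {z:.2f}, p = {p:.3g}")
```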
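The overall usability figure (median 77.5) uses the standard 10-item System Usability Scale. Its scoring rule, a property of SUS itself rather than of CPX-MATE, maps 1–5 Likert responses onto a 0–100 score:

```python
def sus_score(responses: list[int]) -> float:
    """Standard SUS scoring: 10 Likert items (1-5), alternating polarity.

    Odd-numbered items contribute (response - 1); even-numbered items
    contribute (5 - response); the sum is scaled by 2.5 to give 0-100.
    """
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    odd = sum(r - 1 for r in responses[0::2])   # items 1, 3, 5, 7, 9
    even = sum(5 - r for r in responses[1::2])  # items 2, 4, 6, 8, 10
    return 2.5 * (odd + even)

# A hypothetical response set that happens to score 77.5, the reported median.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 1, 4, 2]))  # -> 77.5
```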