Balancing Safety and Educational Availability in a Large Language Model-Based Virtual Patient for Medical Interview Training: Robustness Evaluation Under Direct and Indirect Instructional Contamination


Abstract

BACKGROUND: Large language model–based virtual patients are increasingly proposed for medical interview training. However, safety-oriented guardrails may unintentionally suppress the very dialogue needed for practice. Existing studies have focused more on realism or task accuracy than on whether an educational conversation remains usable under adversarial or noisy conditions.

OBJECTIVE: This study aimed to evaluate the robustness of a large language model–based virtual patient for Japanese medical interview training, with a primary focus on educational availability: the ability to continue a clinically meaningful training dialogue while maintaining case-consistent responses.

METHODS: We constructed a synthetic benchmark using 5 Japanese interview cases with slot-based ground truth and a 10-turn question protocol. The virtual patient was implemented through the OpenAI API using gpt-4o-mini (temperature 0.2) in batched mode. Six conditions were evaluated: clean, noise, direct contamination, indirect contamination, direct contamination plus defense, and indirect contamination plus defense. Each condition included 100 episodes. The primary outcome was slot F1 excluding the initial greeting (slot F1 excl. init), which estimates information recovery attributable to learner questioning rather than scripted opening information. Secondary outcomes were refusal rate, clarification request rate, and event counts for forbidden leakage, contradiction, and harm. A supplementary exploratory appendix examined threshold-based guard design using post-hoc replay on logged dialogues.

RESULTS: In the primary API experiment, clean performance reached a slot F1 excl. init of 0.901 (95% CI 0.879-0.923) with a refusal rate of 0.000. Noise had little effect (0.908, 95% CI 0.887-0.929). Direct contamination did not substantially reduce performance in this configuration (0.927, 95% CI 0.915-0.939; refusal rate 0.001). In contrast, indirect contamination reduced slot F1 excl. init to 0.489 (95% CI 0.398-0.581) and increased the refusal rate to 0.047. Both defended conditions returned to near-clean levels, including indirect contamination plus defense (slot F1 excl. init 0.903, 95% CI 0.880-0.926; refusal rate 0.000). No forbidden leakage, contradiction, or harm events were detected in the primary experiment.

CONCLUSIONS: For virtual patients used in medical interview training, safety should not be judged solely by the absence of harmful or forbidden outputs. Educational availability—whether the system still supports meaningful questioning and information recovery—should be treated as a first-class outcome. In this benchmark, indirect contamination of external context was the dominant failure mode, whereas a simple sanitizing defense restored performance. These findings support evaluating guard strategies in terms of both safety and educational usability before deployment in medical education.
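The primary outcome, slot F1 excluding the initial greeting, can be sketched as a set-based F1 over recovered versus ground-truth case slots, with slots already disclosed in the scripted opening removed from both sides. The slot names and data below are illustrative assumptions, not the study's actual slot schema.

```python
def slot_f1_excl_init(recovered: set[str], gold: set[str], init_slots: set[str]) -> float:
    """F1 between recovered and gold slots, ignoring slots already
    disclosed in the scripted opening greeting (init_slots)."""
    rec = recovered - init_slots
    gld = gold - init_slots
    if not rec and not gld:
        return 1.0  # nothing left to recover once the greeting is excluded
    tp = len(rec & gld)
    precision = tp / len(rec) if rec else 0.0
    recall = tp / len(gld) if gld else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative episode: gold case slots vs. slots the learner's
# questioning actually recovered from the virtual patient.
gold = {"chief_complaint", "onset", "severity", "medication", "allergy"}
recovered = {"chief_complaint", "onset", "severity", "medication"}
init = {"chief_complaint"}  # already stated in the scripted greeting

print(round(slot_f1_excl_init(recovered, gold, init), 3))  # → 0.857
```

Excluding the greeting slots prevents the scripted opening from inflating the score, so the metric reflects only information elicited by the learner's questions.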
