Balancing Safety and Educational Availability in a Large Language Model-Based Virtual Patient for Medical Interview Training: Robustness Evaluation Under Direct and Indirect Instructional Contamination


Abstract

BACKGROUND: Large language model–based virtual patients are increasingly proposed for medical interview training. However, safety-oriented guardrails may unintentionally suppress the very dialogue needed for practice. Existing studies have focused more on realism or task accuracy than on whether an educational conversation remains usable under adversarial or noisy conditions.

OBJECTIVE: This study aimed to evaluate the robustness of a large language model–based virtual patient for Japanese medical interview training, with a primary focus on educational availability: the ability to continue a clinically meaningful training dialogue while maintaining case-consistent responses.

METHODS: We constructed a synthetic benchmark using 5 Japanese interview cases with slot-based ground truth and a 10-turn question protocol. The virtual patient was implemented through the OpenAI API using gpt-4o-mini (temperature 0.2) in batched mode. Six conditions were evaluated: clean, noise, direct contamination, indirect contamination, direct contamination plus defense, and indirect contamination plus defense. Each condition included 100 episodes. The primary outcome was slot F1 excluding the initial greeting (slot F1 excl. init), which estimates information recovery attributable to learner questioning rather than scripted opening information. Secondary outcomes were refusal rate, clarification request rate, and event counts for forbidden leakage, contradiction, and harm. A supplementary exploratory appendix examined threshold-based guard design using post-hoc replay on logged dialogues.

RESULTS: In the primary API experiment, clean performance reached a slot F1 excl. init of 0.901 (95% CI 0.879-0.923) with a refusal rate of 0.000. Noise had little effect (0.908, 95% CI 0.887-0.929). Direct contamination did not substantially reduce performance in this configuration (0.927, 95% CI 0.915-0.939; refusal rate 0.001). In contrast, indirect contamination reduced slot F1 excl. init to 0.489 (95% CI 0.398-0.581) and increased the refusal rate to 0.047. Both defended conditions returned to near-clean levels, including indirect contamination plus defense (slot F1 excl. init 0.903, 95% CI 0.880-0.926; refusal rate 0.000). No forbidden leakage, contradiction, or harm events were detected in the primary experiment.

CONCLUSIONS: For virtual patients used in medical interview training, safety should not be judged solely by the absence of harmful or forbidden outputs. Educational availability—whether the system still supports meaningful questioning and information recovery—should be treated as a first-class outcome. In this benchmark, indirect contamination of external context was the dominant failure mode, whereas a simple sanitizing defense restored performance. These findings support evaluating guard strategies in terms of both safety and educational usability before deployment in medical education.
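The primary outcome, slot F1 excluding the initial greeting, can be sketched as a set-based F1 over recovered versus ground-truth case slots, with slots already disclosed in the scripted opening removed from both sides. The slot names and data below are illustrative assumptions, not the study's actual slot schema.

```python
def slot_f1_excl_init(recovered: set[str], gold: set[str], init_slots: set[str]) -> float:
    """F1 between recovered and gold slots, ignoring slots already
    disclosed in the scripted opening greeting (init_slots)."""
    rec = recovered - init_slots
    gld = gold - init_slots
    if not rec and not gld:
        return 1.0  # nothing left to recover once the greeting is excluded
    tp = len(rec & gld)
    precision = tp / len(rec) if rec else 0.0
    recall = tp / len(gld) if gld else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative episode: gold case slots vs. slots the learner's
# questioning actually recovered from the virtual patient.
gold = {"chief_complaint", "onset", "severity", "medication", "allergy"}
recovered = {"chief_complaint", "onset", "severity", "medication"}
init = {"chief_complaint"}  # already stated in the scripted greeting

print(round(slot_f1_excl_init(recovered, gold, init), 3))  # → 0.857
```

Excluding the greeting slots prevents the scripted opening from inflating the score, so the metric reflects only information elicited by the learner's questions.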
