Simulating Lay Health-Seeking Behavior with LLM Personas and Illness Vignettes: Reproducibility, Prompt Sensitivity, and Slice Dependence

Abstract

Large language models (LLMs) are increasingly used as “synthetic respondents” to simulate human judgments and decision-making. In healthcare-adjacent settings, a key methodological risk is that simulated behavior may be sensitive to prompt framing, run-to-run stochasticity, and the slice of scenarios being tested (e.g., red-flag vs. non–red-flag situations). We present a fully synthetic, non-human-subject study in which an LLM, prompted with a layperson persona, chooses a next action in response to an illness vignette, using a fixed action codebook (A0–A9). In a Pilot experiment (40 persona–scenario pairs; 2 prompt variants; 3 repeats), the model produced plausibly monotonic action urgency as vignette severity increased and showed moderate run-to-run agreement (mean agreement 0.617). However, prompt comparisons performed within the same batch yielded perfect agreement between prompts (0/40 mismatches), indicating that within-batch paired designs can underestimate prompt sensitivity. In an isolated-prompt audit (24 pairs), the action mismatch rate between prompts varied substantially across runs (0.0% to 45.8%). Prompt sensitivity was slice-dependent: the mismatch rate was low in mild non–red-flag scenarios (8.3%) but high in red-flag scenarios (41.7%). A stress test using a stronger rubric shifted the action distribution (JS divergence 0.130) and reduced mean urgency by 1.29 points. These findings motivate multi-run, slice-aware evaluations when using LLM personas to simulate health-seeking behavior.
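The abstract's quantitative comparisons rest on two statistics: a between-prompt action mismatch rate over paired persona–scenario items, and a Jensen–Shannon divergence between action distributions on the A0–A9 codebook. The following is a minimal sketch of how such statistics could be computed; the function names, the base-2 logarithm, and the smoothing constant are illustrative assumptions, not taken from the paper.

import numpy as np
from collections import Counter

ACTIONS = [f"A{i}" for i in range(10)]  # fixed action codebook A0–A9

def action_distribution(choices):
    # Empirical probability over the A0–A9 codebook for one condition (prompt/run).
    counts = Counter(choices)
    total = sum(counts.values())
    return np.array([counts.get(a, 0) / total for a in ACTIONS])

def js_divergence(p, q, eps=1e-12):
    # Jensen–Shannon divergence (base 2) between two action distributions.
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mismatch_rate(choices_a, choices_b):
    # Fraction of paired persona–scenario items whose chosen action differs between prompts.
    return float(np.mean([a != b for a, b in zip(choices_a, choices_b)]))

# Example (hypothetical data):
# js_divergence(action_distribution(run_a), action_distribution(run_b))
# mismatch_rate(run_a, run_b)

Under these assumptions, a JS divergence near 0.13, as reported for the stress test, would reflect a modest but non-trivial shift in the action distribution rather than near-identical behavior.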
