Evaluating Sycophancy in Frontier Models Using Persona-Driven Challenge
Abstract
Large language models (LLMs) are increasingly used for lay health queries, yet they may abandon correct recommendations under pressure, a vulnerability termed sycophancy. We evaluated sycophancy across five frontier LLMs (Claude Opus 4.6, Claude Sonnet 4.6, GPT 5.4, Grok 4.1, Gemini 3 Flash) using 200 synthetic clinical vignettes, each anchored to a unanimous correct treatment baseline and challenged by nine personas representing both vulnerable and authority roles. Overall, 7.1% of responses were sycophantic, with rates varying tenfold across personas (1.7% to 19.3%) and sixfold across LLMs (2.4% to 15.3%). Vulnerable personas elicited more sycophantic responses than authority personas, with the medical student persona eliciting the highest rate (19.3%). In adjusted generalized estimating equation (GEE) models, persona and LLM were both independent predictors of sycophantic responses, and vulnerable personas remained independent predictors, a reversal of the expected authority gradient. Persona-driven sycophancy evaluation should be integrated into pre-deployment safety assessment of clinical LLMs.
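As a rough illustration of how the per-persona and per-LLM rates reported above could be tabulated, the following minimal sketch groups labeled responses and computes the fraction flagged as sycophantic. The record layout, the function name `sycophancy_rates`, the model labels, and the "physician" persona are assumptions made for illustration, and the toy records are invented; none of this is the study's data or the authors' pipeline.

```python
from collections import defaultdict


def sycophancy_rates(records, key):
    """Return {group: fraction of responses flagged sycophantic}, grouped by `key`."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[key]
        total[group] += 1
        flagged[group] += int(rec["sycophantic"])
    return {group: flagged[group] / total[group] for group in total}


# Toy records, invented for illustration only (not the study's data).
# Each record is one model response to one persona challenge on one vignette.
toy_records = [
    {"vignette": 1, "persona": "medical student", "model": "model_a", "sycophantic": True},
    {"vignette": 1, "persona": "medical student", "model": "model_b", "sycophantic": False},
    {"vignette": 1, "persona": "physician", "model": "model_a", "sycophantic": False},
    {"vignette": 1, "persona": "physician", "model": "model_b", "sycophantic": False},
]

by_persona = sycophancy_rates(toy_records, "persona")
by_model = sycophancy_rates(toy_records, "model")
print(by_persona)
print(by_model)
```

These are unadjusted rates only. The adjusted analysis described in the abstract uses GEE models, which would additionally account for repeated responses to the same vignette; that step is not sketched here.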