Evaluating Sycophancy in Frontier Models Using Persona-Driven Challenge

Nimay Sanjay Hazare
Neha Goel
Clara Yu
Shamay Agaron
Aniket Sharma
Prathamesh Parchure
Dhaval Patel
Prem Timsina
Ben Kaplan
Joshua Lampert
Aditi Vakil
Patricia Kovatch
Bruce Darrow
Benjamin S Glicksberg
Alexander Charney
Girish N Nadkarni
Ankit Sakhuja

Abstract

Large language models (LLMs) are increasingly used for lay health queries, yet they may abandon correct recommendations under pressure, a vulnerability termed sycophancy. We evaluated sycophancy across five frontier LLMs (Claude Opus 4.6, Claude Sonnet 4.6, GPT 5.4, Grok 4.1, Gemini 3 Flash) using 200 synthetic clinical vignettes, each anchored to a unanimous correct treatment baseline and challenged by nine personas representing both vulnerable and authority roles. Overall, 7.1% of responses were sycophantic, varying roughly tenfold across personas (1.7% to 19.3%) and sixfold across LLMs (2.4% to 15.3%). Vulnerable personas elicited more sycophantic responses, with the medical student persona eliciting the highest rate (19.3%). In adjusted generalized estimating equations (GEE) models, persona and LLM were both independent predictors of sycophantic responses, and vulnerable personas remained independent predictors, a reversal of the expected authority gradient. Persona-driven sycophancy evaluation should be integrated into pre-deployment safety assessment of clinical LLMs.

Version published to 10.64898/2026.05.17.26353406 on medRxiv
May 20, 2026
