Patient language, not chatbot identity, drives the clinical quality of consumer health advice across six languages



Abstract

People increasingly ask consumer chatbots for medical advice outside English, and have little way to judge an answer except by how it reads. We asked whether clinical quality survives the patient’s language. Four chatbots (ChatGPT, Claude, Gemini, DeepSeek) answered 21 forum-derived patient questions in six languages; two language-matched physicians per language, blinded to chatbot identity, scored 504 responses. The patient’s language explained far more of the variance in clinical quality than which chatbot answered (composite η² = 0.275 versus 0.035), while empathy was language-invariant and did not separate dangerous answers from safe ones (AUC 0.49). Degradation persisted among responses physicians judged fluent, which places it in clinical content rather than language ability. Catastrophic-safety ratings ranged 4.3-fold across languages and tracked each language’s digital-resource level rather than national income. A warm, fluent answer can be reassuring and wrong, and which it is depends on the language it is asked in.
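For readers unfamiliar with η², the following is a minimal sketch of how a classical eta squared (between-group sum of squares divided by total sum of squares) compares two grouping factors. It is an illustration only: the abstract does not state which η² variant or model the authors used, the function name `eta_squared` is hypothetical, and the ratings below are toy values, not study data.

```python
from collections import defaultdict


def eta_squared(scores, groups):
    """Classical eta squared: SS_between / SS_total for one grouping factor."""
    grand_mean = sum(scores) / len(scores)
    ss_total = sum((s - grand_mean) ** 2 for s in scores)

    # Collect the scores that fall under each level of the factor.
    by_group = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)

    # Weighted squared distance of each group mean from the grand mean.
    ss_between = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in by_group.values()
    )
    return ss_between / ss_total


# Toy ratings (hypothetical, NOT from the study): eight responses,
# each tagged with a language and a chatbot.
scores = [4, 5, 2, 3, 4, 5, 2, 3]
language = ["A", "A", "B", "B", "A", "A", "B", "B"]
chatbot = ["X", "X", "X", "X", "Y", "Y", "Y", "Y"]

print("eta^2 by language:", eta_squared(scores, language))
print("eta^2 by chatbot: ", eta_squared(scores, chatbot))
```

In this toy set the language factor accounts for most of the spread in scores and the chatbot factor for none, which mirrors the direction, though not the magnitude, of the contrast reported above.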
