Accuracy Assessment of Chinese Large Language Models in Psoriasis Management: A Multicenter Expert Consensus Study
Abstract
Background: Psoriasis patients in China face significant challenges due to insufficient disease knowledge and limited access to medical resources, creating a need for reliable educational tools.

Objectives: This multicenter consensus study aimed to systematically evaluate the consultation quality of mainstream Chinese large language models (LLMs) for psoriasis patient education.

Methods: The book "365 Questions on Psoriasis" was jointly compiled by 109 Chinese psoriasis experts. Using an expert assessment methodology, nine dermatologists curated 40 high-frequency clinical questions from the book across five domains (etiology, triggers, treatment, management, and psychosocial impact). Four Chinese LLMs (DeepSeek-R1, DeepSeek-V3, GLM-4, and Qwen-3) were evaluated through double-blind scoring on a 10-point Likert scale assessing accuracy, completeness, clarity, and safety.

Results: Performance varied significantly, with mean scores ranging from 5.95 to 9.88 (SD: 0 to 3.05). Qwen-3 achieved the highest average score (9.12), while GLM-4 showed the greatest inconsistency. All responses avoided dangerous content, and 87.5% proactively emphasized the necessity of consulting a physician. However, 12.5% of responses deviated from evidence-based guidelines, particularly on complex topics such as biologics and management.

Conclusions: Chinese LLMs show substantial potential for psoriasis education by providing generally safe information and appropriately directing users to doctors. However, current limitations remain, including inconsistent performance and occasional deviations from guidelines on specialized topics, indicating that these models cannot yet replace medical professionals.