Assessing Empathy Capability in Large Language Models: Superior and Balanced Empathy Compared to Humans
Abstract
Background
As large language models (LLMs) become more common in daily interactions, their capacity for human-like social skills has become a critical area of research. Many LLMs are designed to be empathetic. However, there is no systematic, psychologically grounded assessment of how their empathetic capabilities compare with those of humans. To address this gap, this study provides a comprehensive evaluation of the empathy levels of current leading LLMs.

Methods
We developed a scenario-based conversational empathy scale that assesses two core dimensions: cognitive empathy and emotional empathy. The scale was administered to five prominent LLMs and to a human baseline group of young adults. A separate group of participants systematically scored the text responses that both groups produced for each scenario. The resulting empathy scores were then compared statistically to identify differences in performance.

Results
The LLMs generally demonstrated higher empathy than the human participants. Ranked from highest to lowest empathy, the LLMs were ChatGPT-4.0, ChatGPT-3.5, Tongyi Qianwen, iFlytek Spark, and ERNIE Bot. Only ChatGPT-4.0 showed higher emotional empathy than cognitive empathy. Notably, unlike the human participants, the LLMs exhibited consistent empathy across scenarios.

Conclusions
Current LLMs possess a high level of empathy that often exceeds that of average young adults. The balanced nature of their empathy and the particularly high emotional empathy of models such as ChatGPT-4.0 point to sophisticated social abilities. These findings provide a clear benchmark for AI empathy and a psychological foundation for optimizing the design of empathetic AI systems.