Skill but not Effort Drive GPT Overperformance over Humans in Cognitive Reframing of Negative Scenarios
Abstract
Recent advancements in large language models (LLMs), such as GPT, have led to their use in tasks involving emotional support. However, LLM performance has not been systematically compared to human performance in terms of both the quality and the type of content produced. We examined this question by focusing on the skill of reframing negative situations to reduce negative emotions, also known as cognitive reappraisal. We trained both humans (N = 601) and GPT-4 to reframe negative vignettes (N reappraisals = 4,195) and compared their performance using human raters (N = 1,744). GPT-4 outperformed humans on three of the four examined metrics. We investigated whether this gap was driven by effort or skill by incentivizing participants to produce better reappraisals; the incentives increased the time participants spent on reappraisals but did not narrow the gap between humans and GPT-4. Content analysis suggested that high-quality reappraisals produced by GPT-4 were more semantically similar to the emotional scenarios, indicating that GPT-4's success rests on tuning into the specific scenario. The pattern was reversed for humans, whose reappraisals were rated higher when they were more semantically distant from the emotional scenario, suggesting that human success rests on generalizing away from the specific situation. These results help us understand the nature of emotional support provided by LLMs and how it compares to support provided by humans.
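For readers wondering how the semantic similarity between a scenario and a reappraisal might be quantified, the minimal sketch below uses sentence embeddings and cosine similarity. The abstract does not state the authors' actual measure, so the library (sentence-transformers), the model name, and the example texts are illustrative assumptions only.

```python
# Hypothetical sketch: scoring how semantically close a reappraisal stays
# to the original negative scenario via sentence-embedding cosine similarity.
# Library, model, and example texts are assumptions, not the paper's method.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

scenario = "I was passed over for a promotion I had worked hard for."
reappraisal = "This setback gives me a chance to ask for feedback and grow."

# Encode both texts and compute cosine similarity in [-1, 1];
# higher values mean the reappraisal stays closer to the scenario.
emb_scenario, emb_reappraisal = model.encode([scenario, reappraisal])
similarity = util.cos_sim(emb_scenario, emb_reappraisal).item()
print(f"semantic similarity: {similarity:.3f}")
```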