Skill but not Effort Drive GPT Overperformance over Humans in Cognitive Reframing of Negative Scenarios



Abstract

Increasingly, people use language models for emotional support, and understanding the quality of that support is crucial. This project (N = 3,740) focused on cognitive reappraisal, one of the pillars of emotional support and among the most researched emotion regulation strategies: the ability to extract multiple meanings from an emotional situation and to choose a framing that reduces its emotional intensity. In a first, conservative test, we compared trained humans' and GPT-4's reappraisals of fictional vignettes using third-party human raters (Study 1), showing that GPT-4 outperformed humans. In Study 2, we investigated whether this gap was driven by effort or skill by incentivizing participants to produce better reappraisals; the incentive increased the time participants spent on reappraisals but did not narrow the gap between humans and GPT-4. To examine how the perceived source of support affected evaluations, in Study 3 we labeled reappraisals as AI-generated, which reduced participants' ratings of their effectiveness, though GPT-4 was still rated more effective. In Study 4, to verify that these differences held in real-time interactions, participants shared negative emotional situations with either a human or GPT-4 and received a reappraisal, replicating GPT-4's superior performance. A language analysis of human and GPT-4 reappraisals found differences in language complexity that explain some of the differences in evaluation. These results help us understand the nature of emotional support by LLMs and how it compares to that of humans.
