How well do Large Language Models perform as team members? Testing teamwork capabilities of LLMs
Abstract
We employed a modified battery of cognitive and social tests from the psychological sciences intended to reflect key components of teamwork. These tests were administered to five instruction-tuned large language models (LLMs) to examine their capacity to contribute to teamwork. Our results revealed concrete strengths and shortcomings of LLMs in engaging in teaming behaviours. Language models performed well on the action phases of teamwork, in which a team engages in specific work tasks such as monitoring and coordination to meet team goals. In contrast, LLMs lacked the abstract reasoning and planning required for the transition phases of teamwork. These phases are critical moments for teams to reflect on past performance and plan for upcoming tasks. Finally, LLMs exhibited distinctive behaviours in the interpersonal aspects of teamwork, predominantly applying an integrating motivational style but failing to acknowledge negative emotional states of teammates. These findings reveal how ‘off-the-shelf’ LLMs may contribute to contemporary human-AI teams and highlight the utility of experimental psychology in elucidating the behaviours of non-embodied agents.