Comparing a Human’s and a Multi-Agent System’s Thematic Analysis: Assessing Qualitative Coding Consistency
Abstract
Large Language Models (LLMs) have demonstrated fluency in text generation and reasoning tasks. Consequently, the field has probed the ability of LLMs to automate qualitative analysis, including inductive thematic analysis (iTA), previously achievable only through human reasoning. Studies using LLMs for iTA have yielded mixed results so far: LLMs have been used successfully for isolated steps of iTA in hybrid setups. With recent advances in multi-agent systems (MAS) enabling complex reasoning and task execution through multiple cooperating LLM agents, early results point towards the possibility of automating sequences of the iTA process. However, previous work notably lacks methodological standards for assessing the reliability and validity of LLM-derived iTA themes and outcomes. Therefore, in this paper, we propose a method for assessing the quality of automated iTA systems based on their consistency with human coding on a benchmark dataset. We present criteria for benchmark datasets and for the comparison of human- and AI-generated iTA analyses. We demonstrate both in an expert blind review of two iTA outputs: one iTA conducted by domain experts, and another fully automated by a MAS built on the Claude 3.5 Sonnet LLM. Results indicate a high level of consistency and contribute evidence that complex qualitative analysis methods common in AIED research can be carried out by MAS.