Machine-Assisted Topic Analysis of Large-Scale Health Experience Data: Identifying Sociodemographic Differences and Evaluating Bias in Large Language Models

Paulina Bondaronek
Emma Ward
Emma Beecham
Eric Zhang
Yuqing Huang
Julia Ive
Felix Naughton
Honghan Wu
Cecilia Vindrola-Padros


Abstract

Introduction

Large-scale free-text data with socio-demographic information can capture nuanced accounts of lived experience that are difficult to detect in structured measures. However, manual qualitative analysis is difficult to scale, while automated approaches may obscure subgroup variation or introduce bias. This is especially relevant for large language models (LLMs), whose use in qualitative health research is increasing despite limited evaluation in socio-demographically stratified analysis.

Objectives

This study examined how socio-demographic differences in health and wellbeing experiences were manifested in a large-scale free-text dataset, and evaluated how different AI-assisted analytic approaches identified these differences. Specifically, it aimed to: (1) identify socio-demographic differences using Machine-Assisted Topic Analysis (MATA); (2) compare MATA outputs with topic modelling combined with LLM-based topic interpretation; and (3) examine potential bias in LLM-based analysis.

Methods

We analysed 2,177 valid free-text responses from the UK COVID-19 Well-being Tracker, a longitudinal survey of adults recruited during the pandemic. Responses described factors influencing health behaviours, mood, and wellbeing over time. Data were preprocessed and stratified by gender, socioeconomic status (SES), and age. MATA combined topic modelling, using Latent Dirichlet Allocation, with human-led qualitative interpretation of topic keywords and representative responses. The same topic model outputs were then interpreted using an LLM for comparison. Potential LLM bias was assessed using a demographic label-swap crossover design, with bias evaluated through Jaccard lexical similarity, VADER sentiment, and NRC emotion analysis. Grounded Review and Assessment of Computational Evidence (GRACE) was used to evaluate the AI outputs.
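To make the label-swap comparison concrete, the following is a minimal Python sketch of how Jaccard lexical similarity between two LLM interpretations might be computed. It is an illustration under stated assumptions, not the authors' implementation: the tokenisation rule, the function names (`tokens`, `jaccard`, `swap_labels`), and the example strings are all hypothetical, and the abstract does not specify how prompts were constructed or how text was preprocessed.

```python
import re


def tokens(text: str) -> set[str]:
    """Return the set of lowercase word tokens in text (assumed tokenisation)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over the word sets of two texts."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


def swap_labels(prompt: str, label_a: str, label_b: str) -> str:
    """Exchange two demographic labels in a prompt (whole words, any case)."""
    pattern = re.compile(
        rf"\b({re.escape(label_a)}|{re.escape(label_b)})\b", re.IGNORECASE
    )

    def repl(match: re.Match) -> str:
        # Replace each label with its counterpart; output uses the given casing.
        return label_b if match.group(0).lower() == label_a.lower() else label_a

    return pattern.sub(repl, prompt)


# Hypothetical prompt and made-up interpretations, for illustration only.
original_prompt = "Interpret these topic keywords from responses written by men."
swapped_prompt = swap_labels(original_prompt, "men", "women")

interpretation_original = "work stress routine"
interpretation_swapped = "work stress family"
similarity = jaccard(interpretation_original, interpretation_swapped)
```

In a crossover design of this kind, the topic keywords are held constant and only the demographic label changes, so a low similarity between the two interpretations would indicate that the label itself is shaping the output. Sentiment and emotion scoring (VADER, NRC) would be applied to the same pair of outputs in a separate step.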

Results

MATA identified meaningful socio-demographic thematic differences in pandemic-related mood and wellbeing across gender, SES, and age. Common themes included disruption, adaptation, uncertainty, routine, and the influence of work, relationships, and health on wellbeing. Male-stratified topics emphasised routines, habits, and coping with external pressures, whereas female-stratified topics were more relational and reflective, focusing on connection, isolation, family wellbeing, and anxiety. Lower SES narratives included practical strain, financial pressure, and loss of control, while higher SES narratives more often reflected adjustment, autonomy, and meaning-making. Older adults described health, gratitude, and family connection, whereas younger adults emphasised work-related stress and competing demands. LLM-based interpretation broadly reproduced the high-level subgroup patterns identified through MATA, but outputs were more generalised, less conceptually differentiated, and showed greater thematic overlap. Bias analysis showed systematic shifts in vocabulary, sentiment, and emotional tone when demographic labels were swapped, suggesting a risk of representational bias.

Conclusions

MATA identified meaningful socio-demographic differences while retaining interpretative depth at scale. LLM-based topic interpretation showed utility for rapid thematic summarisation, but produced less conceptually differentiated outputs and was sensitive to demographic framing. The analysis also identified “LLM speak”, where outputs appeared coherent but relied on abstract, generalised, and overlapping interpretations. Human oversight, structured qualitative appraisal, and explicit bias evaluation are necessary when using LLMs to analyse socially stratified free-text health data.

Author summary

This study explores how Artificial Intelligence can help researchers analyse very large collections of free-text health experiences without losing important human meaning. We analysed responses from more than 2,000 adults in the United Kingdom who described how the COVID-19 pandemic affected their wellbeing, routines, relationships, and daily lives over time.

We applied a method called Machine-Assisted Topic Analysis, which combines computational topic modelling with human qualitative interpretation, to identify differences across gender, age, and socioeconomic groups. We then compared these findings with analyses generated using a large language model.

Both approaches identified broad patterns in the data, including disruption, uncertainty, adaptation, and the importance of work, family, and health. However, the large language model produced more generalised and overlapping interpretations, whereas the human-led approach retained greater nuance and clearer distinctions between social groups. We also found that changing demographic labels, such as “men” or “women”, systematically altered the language and emotional tone generated by the model, suggesting a risk of representational bias.

Our findings suggest that Artificial Intelligence may be useful for rapid summarisation of large-scale health experience data, but that human oversight remains necessary to preserve nuance and support fair interpretation of socially diverse experiences. As large language models become more widely used in health research and service evaluation, careful evaluation of potential bias will be important to avoid reproducing overly simplified or stereotyped accounts of different social groups.

Version published to 10.64898/2026.05.20.26353755 on medRxiv
May 22, 2026

