Using AI to support rapid qualitative data analysis of survey and interview data in public health: a proof-of-concept study

Abstract

Background

Large language models (LLMs) are increasingly used for analytical and operational tasks in public health, and show promise for assisting qualitative text analysis at scale. While they have demonstrated utility in structured natural language processing tasks, their role in more interpretive approaches such as thematic analysis remains less clear. Thematic analysis requires careful, often time-intensive engagement with qualitative data, which is difficult to sustain when datasets are large or when policy teams must deliver rapid insights. This study examines whether a pragmatic LLM-assisted workflow can support early-stage thematic analysis of large public health datasets in a way that is systematic, transparent, and compatible with human-led analytical oversight.

Methods

We developed and evaluated a proof-of-concept workflow for LLM-assisted semantic coding and theme identification in a public health case study involving qualitative survey and debrief data. The LLM generated codes and themes using a consistent prompting structure. We compared the LLM-generated themes with themes produced through human-only thematic analysis and with outputs from a topic-modelling-based approach. A convergence coding matrix was used to classify the alignment of each theme as agreement, complementarity, disagreement or silence. To evaluate the reliability of the LLM's theme classification, we manually labelled 500 codes and calculated accuracy metrics.
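
As an illustration of how a convergence coding matrix can be summarised, the following minimal Python sketch tallies the share of human-derived themes in each of the four alignment categories. The function name `summarise_convergence`, the dictionary layout, and the example theme names are hypothetical and are not taken from the study; the category assigned to each theme is assumed to have been decided by a human reviewer beforehand.

```python
from collections import Counter

# The four alignment categories named in the Methods.
CATEGORIES = ("agreement", "complementarity", "disagreement", "silence")


def summarise_convergence(matrix):
    """Return the proportion of human themes in each alignment category.

    `matrix` maps a human theme name to the category a reviewer assigned
    when comparing it with the LLM output (hypothetical data layout).
    """
    if not matrix:
        raise ValueError("matrix must contain at least one theme")
    for theme, category in matrix.items():
        if category not in CATEGORIES:
            raise ValueError(f"unknown category {category!r} for theme {theme!r}")
    counts = Counter(matrix.values())
    total = len(matrix)
    return {category: counts.get(category, 0) / total for category in CATEGORIES}


if __name__ == "__main__":
    # Invented theme names, for illustration only.
    example = {
        "access to services": "agreement",
        "communication": "agreement",
        "staff workload": "complementarity",
        "data sharing": "silence",
    }
    for category, share in summarise_convergence(example).items():
        print(f"{category}: {share:.1%}")
```

Keeping the categories in a fixed tuple means a category with no themes is still reported, as a zero share, rather than silently dropped from the summary.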

Results

The LLM-generated themes showed broad alignment with the human analysis at the level of higher-order themes, with agreement observed for 73% of manual themes. Complementary LLM themes were found for a further 18%, only one subtheme showed disagreement, and 8.3% had no clear LLM match (silence). Compared with the topic modelling approach, the LLM produced a broader and more detailed thematic structure. In the theme assignment task, the LLM achieved an overall F1 score of 0.68, with individual theme scores between 0.34 and 0.80, indicating moderate agreement with the human labels and stronger performance for clearly defined themes.
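
For readers unfamiliar with the metric, the sketch below shows one way per-theme and overall F1 scores can be computed from paired human and LLM theme labels. It assumes each code carries exactly one theme label, and it uses an unweighted (macro) average as the overall score; the abstract does not state which averaging scheme the study used, so that choice is an assumption. The function names are hypothetical.

```python
def per_theme_f1(human_labels, llm_labels):
    """Compute an F1 score for every theme, treating human labels as reference."""
    if len(human_labels) != len(llm_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(human_labels, llm_labels))
    scores = {}
    for theme in sorted(set(human_labels) | set(llm_labels)):
        true_pos = sum(1 for h, m in pairs if h == theme and m == theme)
        false_pos = sum(1 for h, m in pairs if h != theme and m == theme)
        false_neg = sum(1 for h, m in pairs if h == theme and m != theme)
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is zero.
        denominator = 2 * true_pos + false_pos + false_neg
        scores[theme] = 2 * true_pos / denominator if denominator else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean of per-theme F1 scores (assumed averaging scheme)."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    human = ["access", "access", "workload", "workload"]
    llm = ["access", "workload", "workload", "workload"]
    theme_scores = per_theme_f1(human, llm)
    print(theme_scores)
    print(round(macro_f1(theme_scores), 3))
```

A macro average weights every theme equally, so a poorly recognised but rare theme lowers the overall score as much as a common one; a support-weighted or micro average would give a different overall figure from the same labels.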

Conclusions

These findings suggest that LLM-assisted thematic analysis has promise as a pragmatic proof-of-concept approach for rapid, higher-level qualitative sensemaking in public health, particularly when datasets are large and timely insight is needed. Its use should remain bounded to surface-level exploratory analysis and embedded within human-led workflows.
