Development and Validation of Large Language Model Rating Scales for Automatically Transcribed Psychological Therapy Sessions

Abstract

Traditional rating scales have shaped psychological research, but they are resource-intensive and can burden participants. Large Language Models (LLMs) offer a tool to assess latent constructs in text. This study introduces LLM rating scales, a rating method that uses LLM responses in place of human ratings. Its application is demonstrated through the development and validation of an LLM rating scale measuring patient engagement in transcripts of psychological therapy. Automatically transcribed videos of 1,131 sessions from 155 patients were analyzed using DISCOVER, a software framework for local multimodal human behavior analysis. The Llama 3.1 (8B) LLM rated 120 items on patient engagement, and the top eight items were averaged to create a total engagement score ranging from 0 (low) to 100 (high). Psychometric properties were evaluated using the original sample, bootstrap resampling, and test folds. The LLM rating scale demonstrated a normal score distribution, strong reliability (ω = .953), and acceptable model fit (CFI = .968, SRMR = .022), with the exception of RMSEA = .108. Validity was supported by significant correlations with engagement determinants (e.g., motivation, r = .413), processes (e.g., between-session efforts, r = .390), and outcomes (e.g., symptoms, r = −.304). Results remained robust across bootstrap and cross-validation analyses accounting for the hierarchical data structure. The LLM rating scale exhibited strong psychometric properties for measuring patient engagement, demonstrating the potential of the LLM rating scale approach as a psychological assessment tool. Importantly, this automated approach uses interpretable items, ensuring a clear understanding of the measured constructs, while supporting local implementation and protecting confidential data.
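The scoring step described above (averaging the eight retained items into a 0–100 total score) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, item labels, and example ratings are hypothetical, and the real pipeline would obtain each item's 0–100 rating from the Llama 3.1 (8B) model.

```python
def engagement_score(item_ratings, selected_items):
    """Average the selected items' 0-100 LLM ratings into a total engagement score."""
    values = [item_ratings[item] for item in selected_items]
    return sum(values) / len(values)

# Hypothetical ratings for the eight retained items (in the study, produced by the LLM)
ratings = {
    "item_01": 80, "item_02": 75, "item_03": 90, "item_04": 60,
    "item_05": 70, "item_06": 85, "item_07": 65, "item_08": 95,
}
total = engagement_score(ratings, sorted(ratings))
print(total)  # 77.5
```

Because each item is scored on the same 0–100 scale, the unweighted mean keeps the total score on that scale and directly interpretable.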