TrAIngles: an LLM-Based Automatic Scoring Tool for Theory of Mind Assessments Across the Life Span
Abstract
Understanding how Theory of Mind (ToM), the ability to attribute mental states to others, develops and differs across individuals requires robust and easily applicable measures. Tasks involving open-ended verbal responses, such as the Triangles Task, offer valid indicators of ToM, but their manual scoring is time-consuming and demands substantial training. In this study, we test the validity and reliability of TrAIngles, an automatic scoring software for the Triangles Task developed by fine-tuning large language models (LLMs). TrAIngles was trained on a dataset of N ~ 11,000 responses collected from 581 Italian-speaking participants aged between 8 and 81 years. LLM-based ToM scores showed accuracy and reliability (Cohen’s kappa, ICC, Krippendorff’s alpha, Spearman’s rho) comparable to human ratings, consistently exceeding commonly accepted thresholds for inter-rater agreement (i.e., > .80). Performance remained robust across age groups (children, adolescents, young adults, adults, and older adults). Our findings demonstrate that reliable and valid ToM indices can be obtained from the Triangles Task using this new automated scoring system based on LLMs. TrAIngles thus provides an innovative tool for lifespan ToM assessments relying on open-ended responses, highlighting the potential of training automated scoring systems on archival, human-rated datasets. TrAIngles is openly available for use both as code and via a user-friendly graphical interface.
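As an aside, the kind of agreement analysis reported above can be reproduced with standard scientific-Python tooling. The sketch below is illustrative only: the ratings are hypothetical made-up data, not from the study, and it shows two of the four reported metrics (quadratic-weighted Cohen's kappa and Spearman's rho) computed between a column of human ratings and a column of model scores.

```python
# Sketch: comparing LLM-based scores against human ratings with two
# agreement metrics. Data below are hypothetical, not from the paper.
from sklearn.metrics import cohen_kappa_score
from scipy.stats import spearmanr

human = [0, 1, 2, 2, 1, 0, 2, 1]   # hypothetical human ToM ratings
model = [0, 1, 2, 1, 1, 0, 2, 1]   # hypothetical LLM-based scores

# Quadratic weighting penalizes larger disagreements more, which suits
# ordinal scoring scales.
kappa = cohen_kappa_score(human, model, weights="quadratic")
rho, _ = spearmanr(human, model)
print(f"weighted kappa = {kappa:.2f}, Spearman rho = {rho:.2f}")
```

In practice one would run this over the full set of item-level scores and report the metrics per age group, alongside ICC and Krippendorff's alpha, to mirror the analyses described in the abstract.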