Wearable and Interview-based Assessment of Psychological Risk in Alzheimer’s Caregivers: Machine Learning vs. Large Language Models
Abstract
Spousal caregivers of individuals with Alzheimer’s disease and related dementias frequently experience elevated perceived stress, caregiver burden, and loneliness, which are associated with adverse health outcomes. Early identification is therefore critical for timely intervention. Existing approaches commonly rely on wearable sensor data and standardized psychological questionnaires, while recent multimodal methods aim to improve prediction by integrating behavioral and linguistic information.
In this study, we explored three modality configurations (wearable-derived features, interview-based text, and their combination) to classify caregiver psychological risk as measured by the Perceived Stress Scale (PSS), Zarit Burden Interview, and UCLA Loneliness Scale. We compared traditional machine learning models and large language models (LLMs) (Gemini 2.0, Llama 4, and GPT-4o) under psychometrician-centered and caregiver-centered prompting strategies.
Traditional machine learning models performed better under multimodal settings, while LLMs achieved stronger performance with interview-only input. We further found that perceived stress (PSS) was the most predictable construct and that prompting strategies substantially influenced LLM performance.
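As an illustration of the three modality configurations described above, the following minimal Python sketch shows one way a feature set could be assembled per configuration and a scale score turned into a binary risk label. It is not the authors' pipeline: the function names, the feature names (`sleep_hours`, `step_count`, `negative_word_ratio`), and the cutoff value are hypothetical placeholders, and the cutoff is not a clinical threshold.

```python
from typing import Dict

# The three modality configurations named in the abstract.
CONFIGURATIONS = ("wearable_only", "interview_only", "multimodal")


def build_features(
    wearable: Dict[str, float],
    interview: Dict[str, float],
    configuration: str,
) -> Dict[str, float]:
    """Return the feature dictionary for one modality configuration.

    In the multimodal case, keys are prefixed so that features from the
    two sources cannot collide (an assumption made for this sketch).
    """
    if configuration == "wearable_only":
        return dict(wearable)
    if configuration == "interview_only":
        return dict(interview)
    if configuration == "multimodal":
        merged = {f"wearable__{k}": v for k, v in wearable.items()}
        merged.update({f"interview__{k}": v for k, v in interview.items()})
        return merged
    raise ValueError(f"unknown configuration: {configuration!r}")


def binarize_score(score: float, cutoff: float) -> int:
    """Map a questionnaire score to 1 (at risk) or 0 using a caller-supplied cutoff."""
    return int(score >= cutoff)


# Hypothetical example values, for illustration only.
wearable_example = {"sleep_hours": 6.5, "step_count": 4200.0}
interview_example = {"negative_word_ratio": 0.12}

for name in CONFIGURATIONS:
    print(name, build_features(wearable_example, interview_example, name))
```

The same label produced by `binarize_score` could then be paired with each of the three feature sets, so that any difference in classifier performance reflects the input modality rather than the target.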
Author summary
People caring for spouses with Alzheimer’s disease and related dementias often experience high levels of stress, caregiver burden, and loneliness, all of which are linked to adverse psychological and physical health outcomes. Early identification of caregivers at heightened psychological risk is essential for timely support. We evaluated three data modalities (wearable-derived features, interview-based text, and their combination) to classify caregiver risk as measured by the Perceived Stress Scale (PSS), Zarit Burden Interview, and UCLA Loneliness Scale. Traditional machine learning models and large language models (LLMs) (Gemini 2.0, Llama 4, and GPT-4o) were compared under multiple prompting strategies.
Our findings showed that traditional machine learning approaches performed best when combining wearable-derived behavioral features with interview-derived linguistic features, while LLMs were more effective for analyzing interview-based text. Perceived stress (PSS) was the most predictable construct, whereas caregiver burden and loneliness were more difficult to detect. Prompting choices significantly influenced LLM performance, and Gemini 2.0 showed the most stable overall results. These findings highlight the importance of aligning model choice with data modality when developing digital health tools for caregiver risk identification.