Distinguishing the Language of Mental and Physical Health: A Sequential Evaluation with Model Preregistration of Automated Clinical Visit Interviews

Oscar Nils Erik Kjell
Scott Feltman
H. Andrew Schwartz
Adithya V Ganesan
Whitney R. Ringwald
Sean Clouston
Melissa Anne Carr
Benjamin Luft
Roman Kotov

Abstract

Background. Mental and physical health are correlated, yet are described differently in natural language. Standard symptom checklists may limit individuals from fully describing how they experience health problems. This study examines the distinction between how individuals linguistically describe mental and physical health.

Method. We used the Sequential Evaluation with Model Pre-registration (SEMP) design to analyze language from automated, open-ended clinical interviews with 9/11 first responders. We developed language-based models assessing Mental and Physical Component Summary (M/PCS) scores from automated interview transcripts (N = 1290). We then evaluated these models on held-out prospective data (N = 310) against M/PCS, medical record diagnoses, and healthcare expenditures.

Results. Language-based assessment correlated significantly with M/PCS scores (r = .36 and r = .37, respectively, all p < .001), exceeding SEMP pre-registered thresholds (r > .315 and r > .348, respectively). Analyses revealed both shared and distinct linguistic markers for mental and physical health, with language reflecting emotional, somatic, and functional themes. The language-based assessments demonstrated external validity through significant convergence with clinical diagnoses (AUCs = .74 and .65 for mental and physical health, respectively). Furthermore, language-based assessments exhibited incremental validity; when added to models containing traditional self-reports, they explained significant unique variance in both mental (ΔR² = .063, p < .001) and physical (ΔR² = .032, p = .010) healthcare costs.

Conclusions. Natural language from automated interviews captures general health through emotional, somatic, and functional themes. Validated against clinical diagnoses and healthcare costs, these models demonstrate that open-ended language provides interpretable insights beyond traditional rating scales, potentially augmenting clinical evaluations and understanding.
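
The two quantities reported in the Results, a held-out correlation compared against a pre-registered threshold and the incremental variance explained (ΔR²) when a language-based score is added to a self-report model, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`pearson_r`, `r_squared`, `incremental_r2`, `meets_threshold`) are hypothetical, the data are synthetic, and ordinary least squares is assumed only for illustration.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def meets_threshold(r, threshold):
    """True if an observed correlation exceeds a pre-registered threshold."""
    return r > threshold


def r_squared(X, y):
    """R^2 of an ordinary least squares fit with an intercept."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def incremental_r2(baseline, added, y):
    """Delta R^2: gain in R^2 when `added` joins a model containing `baseline`."""
    r2_base = r_squared(baseline, y)
    r2_full = r_squared(np.column_stack([baseline, added]), y)
    return r2_full - r2_base


# Synthetic illustration only; these values are NOT the study's data.
rng = np.random.default_rng(0)
n = 310  # same size as the held-out sample, chosen purely for illustration
self_report = rng.normal(size=n)
language_score = 0.4 * self_report + rng.normal(size=n)
cost = 0.5 * self_report + 0.3 * language_score + rng.normal(size=n)

r = pearson_r(language_score, self_report)
delta_r2 = incremental_r2(self_report, language_score, cost)
print(f"r = {r:.3f}, exceeds .315: {meets_threshold(r, 0.315)}")
print(f"Delta R^2 from adding the language score: {delta_r2:.3f}")
```

Because the baseline model is nested in the full model, ΔR² computed this way cannot be negative; whether it is statistically significant requires a separate test (for example, an F-test on the nested models), which this sketch does not perform.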

Version published to 10.31234/osf.io/wx6ca_v1 on OSF Preprints
Mar 18, 2026