Demonstrating High Validity of a New AI-Language Assessment of PTSD: A Sequential Evaluation with Model Pre-registration
Abstract
BACKGROUND: Modern Artificial Intelligence (AI) has shown promise in identifying psychopathology based on the language used by patients, providing a scalable method for obtaining relevant behavioral markers. However, no existing models for assessing posttraumatic stress disorder (PTSD) have successfully demonstrated out-of-sample replicability. We develop a language-based AI model for PTSD and rigorously evaluate replicability in a prospective sample.

METHODS: Participants from the Stony Brook World Trade Center (WTC) Health and Wellness Program described their lives in an automated interview during a clinical monitoring visit. The language was analysed using AI to assess the PTSD CheckList (PCL) total symptom severity score and four symptom subscales, and the assessments were validated against PTSD diagnosis in medical records. To yield realistic accuracy estimates in this cross-sectional study, we propose the Sequential Evaluation with Model Pre-registration design, consisting of an iterative, two-phase pre-registration paradigm. The first pre-registration specifies the data split, the model development, and the initial hypotheses. The second pre-registration specifies the exact pre-trained models, data cleaning procedures, and the refined hypotheses.

RESULTS: The data split included a development (N=1437) and a prospective (N=346) dataset. Within the prospective sample, the pre-registered models produced scores that significantly correlated with their targets: PCL total (r=.38, p-value<.001) and the four subscales (r=.28–.37, p-value<.001). The pre-registered model for PCL total showed a robust association with PTSD diagnosis (AUC=.76), significantly outperforming demographics (AUC=.61, p-value=.006), WTC attack exposures (AUC=.61, p-value=.007) and a validated depression language model (AUC=.60, p-value<.001).

CONCLUSIONS: We developed new AI-language assessments of PTSD symptom severity. Within a clinical setting and over prospectively collected participant data, the assessments replicated with high convergent validity with self-report and high external validity against diagnosis in medical records. Analyses of observable behavioral markers in automated clinical interview language can produce robust psychiatric assessments, overcoming limitations found in traditional assessments.
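The two validity statistics reported above, the correlation with self-reported PCL scores and the AUC against recorded diagnosis, can be illustrated with a minimal sketch. It is not the study's code: the function names `pearson_r` and `roc_auc` and all data values are hypothetical, and the sketch assumes only that a frozen, pre-registered model has already scored each participant in a held-out prospective sample.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical prospective-sample values, invented purely for illustration.
predicted_pcl = [20.0, 35.0, 28.0, 50.0, 41.0, 23.0]    # frozen model output
self_report_pcl = [18.0, 40.0, 25.0, 55.0, 30.0, 27.0]  # questionnaire score
ptsd_diagnosis = [0, 1, 0, 1, 1, 0]                      # medical-record label

r = pearson_r(predicted_pcl, self_report_pcl)
auc = roc_auc(predicted_pcl, ptsd_diagnosis)
print(f"convergent validity r = {r:.2f}")
print(f"diagnostic AUC = {auc:.2f}")
```

The key design point mirrored here is that the model is fixed before the prospective data are scored, so both statistics are computed on participants the model never saw during development.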