Uncovering the structure of psychopathology through large scale text analysis

Abstract

The Hierarchical Taxonomy of Psychopathology (HiTOP) is thought to be scientifically and clinically useful because it aligns with the natural structure of psychopathology. However, because HiTOP is based on analyses of standardized symptom criteria and close-ended survey items, it may simply reflect the structure of available measures. This study provides a novel test of HiTOP’s generalizability by comparing it to the structure of psychopathology derived from the natural language people use when talking about their mental health. For our analysis, we obtained 87,007 naturalistic descriptions of mental health from >55,625 unique users on 27 online Reddit forums (i.e., subreddits), each dedicated to a different diagnosis or maladaptive trait. We used text embeddings from large language models to represent common themes of posts in each subreddit. Semantic similarity of posts across subreddits was quantified with a cosine similarity matrix. We then fit exploratory factor models to this matrix and examined the hierarchical structure with a bass-ackwards approach. Results showed that the natural language structure of psychopathology closely corresponds to the HiTOP model, with factors resembling internalizing, detachment, thought disorder, substance use, antagonism, and somatoform spectra at the lowest level of the hierarchy. The higher-order factors diverged from HiTOP, including a linguistic p-factor that reflects variation in attribution of problems to internal vs. external sources. Overall, the emergence of HiTOP spectra from natural language offers unique evidence for its validity and clinical utility.
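
The similarity step described in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: each forum is summarized by the mean of its post embeddings (the abstract does not state how posts were aggregated), and the forum names and three-dimensional vectors are hypothetical toy data rather than real model output. The factor-analysis step is not shown.

```python
import math
import random


def mean_vector(vectors):
    """Average a list of equal-length embedding vectors component-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(centroids):
    """Return sorted forum names and their pairwise cosine similarity matrix."""
    names = sorted(centroids)
    matrix = [[cosine(centroids[a], centroids[b]) for b in names] for a in names]
    return names, matrix


def toy_posts(center, count, rng):
    """Hypothetical stand-in for post embeddings: Gaussian noise around a center."""
    return [[rng.gauss(c, 0.1) for c in center] for _ in range(count)]


rng = random.Random(0)

# Assumption: forum_a and forum_b discuss related themes, forum_c does not.
posts = {
    "forum_a": toy_posts([1.0, 0.0, 0.0], 5, rng),
    "forum_b": toy_posts([1.0, 0.2, 0.0], 5, rng),
    "forum_c": toy_posts([0.0, 1.0, 0.0], 5, rng),
}

# Assumption: one mean embedding per forum summarizes its posts.
centroids = {name: mean_vector(vectors) for name, vectors in posts.items()}
names, sim = similarity_matrix(centroids)

for name, row in zip(names, sim):
    print(name, [round(value, 3) for value in row])
```

In a setup like this, the resulting symmetric matrix plays the role that a correlation matrix usually plays in exploratory factor analysis, which is how the abstract describes its use.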
