Automated Detection Of Clinical High Risk Population Of Schizophrenia: Assessing The Generalizability Of NLP And LLM-Based Methods
Abstract
Background and Hypothesis: Research has indicated that linguistic features can be used for the early detection of schizophrenia. Because traditional clinician-based assessment can be labor-intensive and time-consuming, research has increasingly turned to automated means of extracting and analyzing the linguistic features of schizophrenia. However, most existing studies have focused chiefly on deploying large language models (LLMs) without comparison against more well-established natural-language-processing (NLP) methods. As a result, there is little insight into the utility of LLMs and whether the benefits of using LLMs for analysis outweigh the costs. Moreover, given LLMs' prompt sensitivity, there is also a lack of research investigating how different prompt-engineering methods affect model output across different settings. Another longstanding open question in the field is how best to objectively assess prodromal psychotic symptoms and how best to analyze the resulting transcripts. In this study, we systematically assess the efficacy of LLMs and NLP methods for automated linguistic analysis of clinical high risk (CHR) psychotic symptoms. We seek to understand the reliability of using LLMs to analyze patient transcripts for the early identification of CHR individuals, in comparison with more established NLP-based methods.
Study Design: We trained models using a large international dataset of 374 patients, of whom 331 are CHR and 43 are community controls (CC). Two types of interview were conducted: an open-ended interview and a semi-structured interview based on the Positive SYmptoms and Diagnostic Criteria for the CAARMS [73] Harmonized with the SIPS [74] (PSYCHS) protocol [32]. Trained research assistants carried out these interviews, which were audio- and video-recorded across different sites prior to October 13, 2024.
We used two different feature-extraction methods, principal component analysis (PCA) and feature selection (FS), and conducted experiments using four different machine learning (ML) models and two large language models (LLMs), namely Llama and Qwen. For each LLM, we used three different prompting strategies: a neutral prompt, an NLP-based prompt, and a PSYCHS interview-based prompt, to better understand each LLM's performance under different reasoning settings.
Results: Across both the open-ended and PSYCHS-based transcripts, the NLP-plus-ML methods, which rely on objective, quantifiable metrics, demonstrated fairly consistent results in the range 0.60–0.90. This is in contrast to the LLM-based methods, which produced highly variable results depending on the interview format and prompt used, with the lowest being 0.320 and the highest 0.880 across all experimental settings. In general, both categories of methods produced more accurate results on the PSYCHS-based transcripts. Llama generally performed better on tasks requiring semantic reasoning (e.g., the PSYCHS-based prompt) and yielded the highest accuracy and F1 of 0.880 and 0.930 when used on the PSYCHS-based interview transcripts. On the other hand, Qwen generally performed better on numerical-reasoning tasks (e.g., the NLP-based prompt) and performed best on the PSYCHS-based interview transcripts, with an accuracy and F1 of 0.880 and 0.930.
Conclusions: Overall, we find that NLP-based methods are more reliable and consistent. LLM-based methods are highly variable and do not demonstrate sufficient reliability: their output differs greatly depending on the input transcript and prompt type provided. We suggest that more emphasis should be placed on developing interpretable and clinically grounded methods for automating the linguistic analysis of schizophrenia.
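The two feature-extraction routes named above (PCA and feature selection, each feeding a downstream classifier) can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data: the actual linguistic features, feature counts, component/feature numbers, and the four ML models used in the study are not specified here, and logistic regression is a placeholder classifier.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real dataset: 374 participants, each described
# by a vector of linguistic features (the feature count of 20 is arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(374, 20))
y = rng.integers(0, 2, size=374)  # 1 = CHR, 0 = community control (illustrative labels)

# Route 1: dimensionality reduction via PCA before classification.
pca_pipe = Pipeline([
    ("pca", PCA(n_components=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Route 2: univariate feature selection (keep the k most informative features).
fs_pipe = Pipeline([
    ("fs", SelectKBest(f_classif, k=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

for name, pipe in [("PCA", pca_pipe), ("FS", fs_pipe)]:
    pipe.fit(X, y)
    print(name, "training accuracy:", pipe.score(X, y))
```

In practice, either route plugs into the same downstream classifier, so the comparison isolates the effect of the feature-extraction step itself.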
Further experiments are needed before such models can be deployed in high-stakes use cases, and to identify more precise and automated methods for understanding how clinical features of schizophrenia are expressed linguistically.