Exploring the Generalizability and Explainability of LLMs in Detecting Suicidal Ideation: The Impact of Data Heterogeneity

Rong Huang
Longdi Xian
Christopher Chi Wai Cheng
Jie Chen
Kit Ying Chan
Calvin Lam
Joey W Y Chan
Steven W H Chau
Ngan Yin Chan
Bei Huang
Yun-Kwok Wing
Tim M H Li

Abstract

Objectives: With recent advances in artificial intelligence (AI) and large language models (LLMs), text analysis is a promising tool for detecting suicidal ideation. However, the performance of such detection systems could be influenced by differences in language use arising from individuals’ alexithymic characteristics (difficulty expressing emotion, accompanied by distinctive language patterns), resulting in subgroup disparity. The current study explores the capability of a detection system in a clinical sample with heterogeneous language use (i.e., systematic differences in language use shaped by patient characteristics and the language context).
Methods: AI models (classifiers) were trained with 5-fold cross-validation on clinical transcripts from 299 individuals (193 with major depressive disorder and 106 controls without psychiatric problems) to detect suicidal ideation. Specifically, the topic-general classifier was trained on full clinical transcripts, whereas the topic-specific classifiers (i.e., factorization models) were trained on specific sections of the transcripts, focusing on either mood-related or suicide-specific topics. Classifier performance was assessed in the alexithymia group, the non-alexithymia group, and the whole sample. Mediation analyses were conducted to further investigate the role of language features in explaining the subgroup disparity.
Results: The topic-general classifier showed a subgroup disparity between the alexithymia and non-alexithymia groups: the alexithymia group was associated with a decreased likelihood of true detection of suicidal ideation (OR = 0.31, p < .001), and unique language features, such as family-related words (p = .02), played a mediating (explanatory) role. Furthermore, the topic-specific classifiers demonstrated superior performance (AUC = 0.96) compared with the topic-general classifier (AUC = 0.83), and the subgroup disparity was largely reduced.
Conclusion: Models trained on a heterogeneous clinical population may not be equally effective at detecting suicidal ideation in patient groups with and without alexithymia. Developing a factorization model is pertinent to enhancing generalizability and equity, especially when patient characteristics are inaccessible or confidential for model training. Meanwhile, clinicians should interpret model predictions with caution, because patient characteristics may influence model performance.
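
The two quantities reported in the abstract, a per-subgroup AUC and an odds ratio for true detection, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names `auc`, `auc_by_group`, and `detection_odds_ratio` are hypothetical, the numbers are toy values rather than study data, and the odds ratio is an unadjusted 2x2 estimate that may differ from how the authors estimated theirs.

```python
from typing import Dict, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Returns the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_group(
    labels: Sequence[int], scores: Sequence[float], groups: Sequence[str]
) -> Dict[str, float]:
    """Compute the AUC separately within each subgroup label."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = auc([labels[i] for i in idx], [scores[i] for i in idx])
    return result


def detection_odds_ratio(tp_a: int, fn_a: int, tp_b: int, fn_b: int) -> float:
    """Unadjusted odds of true detection in group A relative to group B.

    tp_* are true cases the classifier detected, fn_* are true cases it missed.
    """
    if 0 in (fn_a, tp_b, fn_b):
        raise ValueError("odds ratio is undefined when a required cell is zero")
    return (tp_a / fn_a) / (tp_b / fn_b)


# Toy data only (not from the study): group "A" is separated perfectly,
# group "B" has one positive case scored below a negative case.
toy_labels = [1, 1, 0, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.2, 0.1, 0.6, 0.3, 0.5, 0.2]
toy_groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = auc_by_group(toy_labels, toy_scores, toy_groups)
toy_or = detection_odds_ratio(tp_a=10, fn_a=10, tp_b=30, fn_b=10)
```

An odds ratio below 1 in this sketch means true cases in group A are less likely to be detected than true cases in group B, which is the direction of the disparity the abstract describes for the alexithymia group.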

Version published to 10.21203/rs.3.rs-7657467/v1 on Research Square
Nov 12, 2025

Related articles

Inter-rater Reliability of an LLM in Predicting Depression Among Indian Adults

This article has 3 authors:
1. Shivangi Verma
2. Ashwani Pundeer
3. Soniya Vats
This article has no evaluations. Latest version: Nov 7, 2025

Using Generative AI for the Objective Assessment of Language in Healthcare

This article has 7 authors:
1. James O'Sullivan
2. Pilar Garces
3. Eduardo A. Aponte
4. Julian Tillmann
5. Christopher Chatham
6. Florian Lipsmeier
7. David Nobbs
This article has no evaluations. Latest version: Nov 4, 2025

Predictive models for active suicidal ideation in cognitive decline: identifying risk factors

This article has 4 authors:
1. Eva Vidovič
2. Jernej Rudi Finžgar
3. Anja Kokalj Palandacic
4. Polona Rus Prelog
This article has no evaluations. Latest version: Sep 30, 2025
