INSIGHTFUL: Insight Generation through Clinical Annotation, Analysis, and Modeling of Suicide-Related Factors towards Understanding and Lifesaving
Abstract
Objective
Suicide is a critical medical and public health challenge, particularly among individuals with mental illnesses in safety-net hospitals. To uncover insights about suicidality embedded in unstructured clinical notes, we annotate, analyze, and model a corpus of clinical notes, with the aim of better understanding suicidality and supporting lifesaving efforts.
Methods
A multidisciplinary panel developed an annotation guideline to capture four key suicide-related factors: Suicidal Ideation (SI), Suicide Attempt (SA), Exposure to Suicide (ES), and Non-Suicidal Self-Injury (NSSI). We created an annotated corpus of 500 notes through a clinically validated annotation process and performed cohort analysis to characterize the distributions of demographics and suicide-related factors. A large language model was deployed for automatic classification.
Results
The annotated corpus was created with a Cohen’s Kappa of 0.95 and then de-identified for data sharing. Most notes (79.4%) contained at least one suicide-related label: 34.4% carried exactly one label and 45.0% carried more than one. The co-occurrence of SI and SA was the most frequent combination (35.6%), indicating substantial overlap between the two factors. The cohort had a mean age of 33.4 years; 51.7% were male and 75.8% were single. Prevalent stressors included unemployment (24.2%), homelessness (12.0%), limited healthcare access (5.4%), and legal challenges (5.0%). We identified four key insights for improving the documentation of suicidality: implicitness, conflict, ambiguity, and incomplete definition coverage. The baseline model achieved a micro-averaged F1 score of 0.70, demonstrating satisfactory performance in multi-label classification.
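As a rough illustration of how a micro-averaged F1 score can be computed for multi-label classification over the four factors, the sketch below pools true positives, false positives, and false negatives across all notes and labels before computing a single score. The function name `micro_f1` and the toy gold and predicted label sets are assumptions made for illustration; they are not the study's evaluation code or data.

```python
# Minimal sketch of micro-averaged F1 for multi-label note classification.
# The label names come from the abstract; everything else is hypothetical.
LABELS = ["SI", "SA", "ES", "NSSI"]


def micro_f1(gold, pred, labels=LABELS):
    """Pool TP/FP/FN over every (note, label) pair, then compute F1 once."""
    tp = fp = fn = 0
    for gold_set, pred_set in zip(gold, pred):
        for label in labels:
            in_gold = label in gold_set
            in_pred = label in pred_set
            if in_gold and in_pred:
                tp += 1
            elif in_pred:
                fp += 1
            elif in_gold:
                fn += 1
    denom = 2 * tp + fp + fn
    # With no positive labels anywhere, F1 is undefined; return 0.0 by convention.
    return 2 * tp / denom if denom else 0.0


# Toy example (invented): each set holds the labels assigned to one note.
gold = [{"SI", "SA"}, {"SI"}, set(), {"NSSI"}]
pred = [{"SI"}, {"SI", "ES"}, set(), {"NSSI"}]
score = micro_f1(gold, pred)  # 3 TP, 1 FP, 1 FN
```

Because counts are pooled before averaging, frequent labels such as SI weigh more heavily in the micro-averaged score than rare ones; a macro average would instead weight each label equally.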
Conclusion
The near-perfect inter-annotator agreement supports the reliability of the proposed annotation process and the quality of the resulting data. Cohort analysis highlights how suicidality is distributed across the cohort and how it is documented. Data modeling demonstrates the potential of AI-powered methods to generate insights from large-scale clinical notes.
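For readers unfamiliar with the agreement statistic, Cohen's Kappa corrects the observed agreement between two annotators for the agreement expected by chance. The sketch below is a minimal, generic implementation; the function name `cohen_kappa` and the toy annotations are hypothetical, and the abstract does not state whether agreement was computed per label, per note, or in some other way.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters labeling the same items (generic sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty item list.")
    n = len(rater_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used a single identical label throughout.
    return (observed - expected) / (1 - expected)


# Toy example (invented): 1 = factor present in the note, 0 = absent.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
kappa = cohen_kappa(annotator_1, annotator_2)  # observed 0.9, expected 0.5
```

In this toy case the raters agree on 9 of 10 items, yet Kappa is about 0.8 rather than 0.9, because half of that agreement would be expected by chance alone.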