Life Events Extraction From Healthcare Notes for Veteran Acute Suicide Prediction

Abstract

Background/Aims

Predictive models of suicide risk have focused on predictors extracted from structured data found in electronic health records (EHR), with limited consideration of predisposing life events (LE), such as housing instability and marital troubles, that are expressed in unstructured clinical text. Additionally, there has been limited work on large-scale analysis of natural language processing (NLP)-derived predictors of suicide risk and on the integration of extracted LE into longitudinal models of suicide risk. This study aims to expand upon previous research by demonstrating how high-performance computing (HPC) and machine learning technologies such as language models (LM) can be used to annotate and integrate 8 LE across all Veterans Health Administration (VHA) unstructured clinical text data with improved performance metrics.

Materials/Methods

VHA-wide clinical text from January 2000 to January 2022 was pre-processed and analyzed using HPC. Data-driven lexicon curation was performed for each LE by scaling a nearest-neighbor search over a precomputed index of LM embeddings. Data parallelism was applied to a rule-based annotator to extract LE, followed by a random forest classifier to improve positive predictive value (PPV). NLP results were analyzed and then integrated into, and compared against, a baseline statistical model predicting risk of a combined outcome (suicide death, suicide attempt, and overdose).
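As a rough illustration of the lexicon-curation step, the Python sketch below expands a seed lexicon for a single LE category via nearest-neighbor search over precomputed LM embeddings. The seed terms, candidate phrases, embedding model, and index implementation are all illustrative assumptions; the abstract does not specify which LM or index the study used.

# Minimal sketch, assuming a generic sentence-embedding LM and a cosine
# nearest-neighbor index; all names below are hypothetical.
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

# Hypothetical seed terms for one LE category (e.g., housing instability).
seed_terms = ["evicted", "homeless", "lost apartment", "couch surfing"]
# Hypothetical candidate phrases mined from the clinical-note vocabulary.
candidates = ["living in a shelter", "sleeping in car", "stable housing", "renewed lease"]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder, not the study's LM

# Precompute the candidate index once, then reuse it for every seed query.
cand_vecs = model.encode(candidates, normalize_embeddings=True)
index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(cand_vecs)

# Candidates closest to a seed term become lexicon additions pending review.
seed_vecs = model.encode(seed_terms, normalize_embeddings=True)
distances, neighbor_ids = index.kneighbors(seed_vecs)
for term, ids, dists in zip(seed_terms, neighbor_ids, distances):
    print(term, "->", [(candidates[i], round(1 - d, 2)) for i, d in zip(ids, dists)])

Precomputing and reusing the candidate index is what makes this kind of search scale to a VHA-sized vocabulary; an approximate nearest-neighbor library could stand in for the exact search shown here.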

Results

First-time LE mentions, with a PPV of 0.8 or higher, showed a temporal correlation to suicide-related events (SRE; suicide ideation, attempt, and/or death). A significant increase in LE occurrences was observed starting 2.5 months prior to an SRE. Predictive models integrating NLP-derived LE showed an improved AUC of 0.81 versus 0.79 for the baseline, with novel patient identification of up to 57%.

Discussion

Our analysis shows that: 1) performance metrics, specifically PPV, improved significantly over previous work and outperform related studies; 2) mentions of LE in the unstructured data increase as an SRE approaches; 3) LE identified from notes in the weeks prior to an SRE were not associated with administrative bias caused by outreach; and 4) LE improved the AUC of predictive models and identified novel patients at risk for suicide.

Conclusion

The resulting person-period longitudinal data demonstrated that NLP-derived LE served as acute predictors of suicide-related events. Integrating NLP-derived predictors into predictive models may help improve clinician decision support. Future work is necessary to better define these LE.
