Portability of an artificial intelligence model for self-harm detection across hospital settings
Abstract
Background
Adequate self-harm surveillance is a key part of suicide prevention efforts. Our prior work demonstrated the efficacy of an artificial intelligence model for detecting self-harm in emergency department triage notes. That model was developed using data from a single hospital, which raises the question of how robust it is in other contexts. Here, we aim to validate the model prospectively and externally to understand its portability across hospital settings.
Methods
Our self-harm classification model was developed and tested using triage notes collected from 2012 to 2017 at a large metropolitan hospital in Melbourne, Australia. The model combined extensive text pre-processing with a Gradient Boosting classifier that used 644 selected features. In this study, we assessed the portability of both model components. We performed prospective validation using 329,655 triage notes from the same hospital collected over the following four years. For external validation, we used 316,877 triage notes collected from 2012 to 2021 at a regional hospital located 150 km outside Melbourne.
Results
On the initial test set, the model achieved an area under the precision-recall curve (PR AUC) of 0.86, a positive predictive value (PPV) of 0.81, and a sensitivity of 0.80. Prospectively, performance remained stable, with a PR AUC of 0.84, a PPV of 0.76, and a sensitivity of 0.76. Externally, the model was less able to discern self-harm cases, with a PR AUC of 0.77, a PPV of 0.57, and a sensitivity of 0.83. The text normalisation component of the model was equally effective across the datasets.
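For readers less familiar with the three metrics reported above, the following minimal Python sketch shows how PPV, sensitivity, and a PR AUC estimate can be computed from labels and classifier scores. It is illustrative only: the toy labels, scores, and the 0.5 decision threshold are invented for the example and are not study data, the function names are hypothetical, and average precision is assumed here as one common step-wise estimator of PR AUC (the abstract does not state which estimator the authors used).

```python
def ppv_and_sensitivity(y_true, y_pred):
    """Return (PPV, sensitivity) for binary labels, where 1 marks a positive case."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    ppv = tp / (tp + fp) if (tp + fp) else 0.0          # precision
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall
    return ppv, sensitivity


def average_precision(y_true, scores):
    """Step-wise PR AUC estimate: mean precision at the rank of each positive.

    Assumes scores have no ties; tied scores would need grouped handling.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        return 0.0
    # Rank cases from highest to lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            ap += tp / rank  # precision at this recall step
    return ap / total_pos


# Toy example (invented values, not study data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

ppv, sensitivity = ppv_and_sensitivity(y_true, y_pred)
pr_auc = average_precision(y_true, scores)
print(f"PPV={ppv:.2f} sensitivity={sensitivity:.2f} PR AUC={pr_auc:.2f}")
```

PPV and sensitivity depend on the chosen decision threshold, whereas PR AUC summarises performance across all thresholds, which is why the three values can move in different directions between datasets.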
Conclusions
At the metropolitan hospital, the self-harm detection model is sufficiently performant for both epidemiological and potential clinical uses. At the regional hospital, the text normalisation pipeline is effective, but the machine learning classifier may need to be re-trained locally to produce more accurate results.