Portability of an artificial intelligence model for self-harm detection across hospital settings

Vlada Rozova
Katrina Witt
Mike Conway
Jo Robinson
Karin Verspoor


Abstract

Background

Adequate self-harm surveillance is a key part of suicide prevention efforts. Our prior work demonstrated the efficacy of an artificial intelligence model for detecting self-harm in emergency department triage notes. That model was developed using data from a single hospital, which raises the question of how robust it is in other contexts. Here, we validate the model prospectively and externally to understand its portability across hospital settings.

Methods

Our self-harm classification model was developed and tested using triage notes collected from 2012 to 2017 at a large metropolitan hospital in Melbourne, Australia. The model combined extensive text pre-processing with a Gradient Boosting classifier that used 644 selected features. In this study, we assessed the portability of both model components. We performed prospective validation using 329,655 triage notes from the same hospital collected over the following four years. For external validation, we used 316,877 triage notes from 2012 to 2021 from a regional hospital located 150 km outside Melbourne.
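The two-component design described above (text pre-processing followed by a Gradient Boosting classifier over selected features) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, and the `normalise` function, the bag-of-words representation, the chi-squared feature selection, and the toy notes are all hypothetical stand-ins, since the abstract does not specify them.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline


def normalise(note: str) -> str:
    """Hypothetical stand-in for the text pre-processing component:
    lowercases the note and collapses runs of whitespace."""
    return " ".join(note.lower().split())


def build_model(n_features: int) -> Pipeline:
    """Text normalisation -> bag of words -> feature selection -> classifier.

    The abstract reports 644 selected features; the selection method used
    here (chi-squared) is an assumption made for illustration only.
    """
    return Pipeline([
        ("bow", CountVectorizer(preprocessor=normalise)),
        ("select", SelectKBest(chi2, k=n_features)),
        ("clf", GradientBoostingClassifier(random_state=0)),
    ])


# Invented toy notes (1 = self-harm, 0 = other presentation).
train_notes = [
    "Intentional overdose, low mood",
    "Self inflicted laceration to forearm",
    "Deliberate overdose after argument",
    "Intentional self harm, laceration to wrist",
    "Fall from ladder, wrist pain",
    "Chest pain radiating to left arm",
    "Accidental laceration to finger while cooking",
    "Fever and cough for three days",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Train at the "development" site, then apply unchanged to notes from
# another site, which mirrors the external validation setting.
model = build_model(n_features=5)
model.fit(train_notes, train_labels)

external_notes = [
    "INTENTIONAL   overdose of medication",
    "Ankle pain after fall at football",
]
preds = model.predict(external_notes)
```

Keeping normalisation and classification as separate pipeline steps makes it possible to retain one component and re-train the other at a new site.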

Results

On the initial test set, the model achieved an area under the precision-recall curve (PR AUC) of 0.86, positive predictive value (PPV) of 0.81, and sensitivity of 0.80. Prospectively, performance remained stable, with PR AUC of 0.84, PPV of 0.76, and sensitivity of 0.76. Externally, the model was less able to discern self-harm cases, with PR AUC of 0.77, PPV of 0.57, and sensitivity of 0.83. The text normalisation component of the model was equally effective across the datasets.
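For readers less familiar with the three metrics reported above, the following sketch shows how they can be computed from labels and model scores. It is an illustration under stated assumptions: the data are invented, the 0.5 decision threshold is arbitrary, and PR AUC is estimated with average precision (one common estimator, ignoring tied scores), which may differ from the method used in the study.

```python
def ppv_and_sensitivity(y_true, y_pred):
    """PPV = TP / (TP + FP); sensitivity = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return ppv, sensitivity


def average_precision(y_true, scores):
    """Average precision: the mean of the precision values obtained at the
    rank of each positive case, with cases sorted by descending score."""
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Invented example: 1 = self-harm presentation, 0 = other presentation.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # arbitrary threshold

ppv, sensitivity = ppv_and_sensitivity(y_true, y_pred)
pr_auc = average_precision(y_true, scores)
```

PPV and sensitivity depend on the chosen threshold, whereas average precision summarises the ranking across all thresholds, so the two kinds of metric can move differently between sites.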

Conclusions

At the metropolitan hospital, the self-harm detection model performs well enough for epidemiological use and, potentially, for clinical use. At the regional hospital, the text normalisation pipeline is effective, but the machine learning classifier may need to be re-trained locally to produce more accurate results.

Version published to 10.1101/2025.07.10.25331160 on medRxiv
Jul 11, 2025

Related articles

Considerations for evaluating the practical utility of machine learning in suicide risk estimation: the role of cost and equity

This article has 5 authors:
1. Christopher Kitchen
2. Anas Belouali
3. Paul S Nestadt
4. Holly C Wilcox
5. Hadi Kharrazi
Latest version Dec 30, 2025

Machine learning models for predicting work-related sickness absence due to mental disorders using national surveillance data in Brazil

This article has 2 authors:
1. Beatriz Queiroz Reis
2. Letícia Martins Raposo
Latest version Dec 23, 2025

Development and Deployment of a Machine Learning–Based Predictive Model for COVID-19 Infection Using Patient Demographic and Symptom Data in Nigeria

This article has 10 authors:
1. Olanrewaju Eniade
2. Ezekiel Ukwenga
3. Uchenna Akuka
4. Opeyemi Adeniyi
5. Elonna Obak
6. Omolola Adeagbo
7. Peter Babatunde Olaitan
8. Rita Ayanbolade Olowe
9. Tolulope Opakunle
10. Olugbenga Adekunle Olowe
Latest version Jan 25, 2026
