Investigating Algorithmic Bias in Machine Learning Prediction Models of Suicide Attempts in Multiple Clinical Settings by Race/Ethnicity and Gender


Abstract

Importance: Machine learning models reflect their training data and may thus learn and perpetuate healthcare disparities.

Objective: To evaluate whether the performance of a validated machine learning model predicting suicide attempts from electronic health records (EHRs) varies by race/ethnicity or gender.

Design: In this prognostic study, we re-analyzed previously validated landmark prediction models predicting suicide attempts in the 18 months after a healthcare visit. Prediction models were estimated with regularized Cox regression in three cohorts: (1) general outpatient; (2) psychiatric emergency department (ED); and (3) psychiatric inpatient. Model performance (area under the curve [AUC], sensitivity, and positive predictive value [PPV]) was evaluated separately by race/ethnicity and by gender in all three cohorts, and at the intersection of race/ethnicity and gender in the general outpatient cohort.

Setting: EHR data were drawn from the Research Patient Data Registry at Mass General Brigham.

Participants: Individuals ages 15–85 years seen in at least one of three clinical settings from January 1, 2016 to December 31, 2018: general outpatient (N=1,210,222), psychiatric ED (N=13,098), and psychiatric inpatient (N=7,825).

Main Outcomes and Measures: The primary outcome was suicide attempt, determined by validated ICD codes, during the 18 months after a randomly sampled "landmark visit" in one of the three settings.

Results: When considering gender alone, models showed consistently stronger performance for male than for female patients. When considering race/ethnicity alone, results were equivocal: in the general outpatient cohort, models had higher AUC for White than for Hispanic patients, whereas in the psychiatric ED, AUC was highest for Asian patients. When considering the intersection of race/ethnicity and gender in the general outpatient cohort, models performed better for White men than for Hispanic and White women across all metrics. There were also gender differences within racial/ethnic groups, with higher PPV for Black men than Black women and for Hispanic men than Hispanic women, suggesting that gender differences largely drove these patterns.

Conclusions and Relevance: We observed modest evidence of disparities in suicide prediction models by gender and limited evidence of disparities by race/ethnicity alone. More consistent patterns of bias emerged at the intersection of race/ethnicity and gender. Future work should replicate these findings in larger, more diverse samples to ensure fair deployment of models.
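The subgroup evaluation described above can be illustrated with a minimal sketch: compute AUC, sensitivity, and PPV separately within each demographic group and compare. The data, score threshold, and group labels below are hypothetical; the original study used regularized Cox regression on EHR data, which is not reproduced here.

```python
def auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation: the probability
    that a randomly chosen positive outranks a randomly chosen negative."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_ppv(y_true, scores, threshold):
    """Sensitivity (recall) and positive predictive value at a threshold."""
    tp = sum(y == 1 and s >= threshold for y, s in zip(y_true, scores))
    fn = sum(y == 1 and s < threshold for y, s in zip(y_true, scores))
    fp = sum(y == 0 and s >= threshold for y, s in zip(y_true, scores))
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sens, ppv


def evaluate_by_group(y_true, scores, groups, threshold=0.5):
    """Compute the three metrics separately within each subgroup."""
    results = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        yt = [y_true[i] for i in idx]
        sc = [scores[i] for i in idx]
        sens, ppv = sensitivity_ppv(yt, sc, threshold)
        results[g] = {"AUC": auc(yt, sc), "sensitivity": sens, "PPV": ppv}
    return results


# Toy illustration with made-up outcomes, predicted risks, and two groups:
y = [1, 0, 1, 0, 1, 0, 0, 1]
s = [0.9, 0.2, 0.6, 0.4, 0.8, 0.7, 0.1, 0.3]
g = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(evaluate_by_group(y, s, g))
```

Gaps between groups in any of these metrics (as reported in the Results, e.g. higher AUC for male than female patients) are the kind of disparity this per-group evaluation surfaces; it does not by itself explain their cause.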
