Performance Drift in a Mortality Prediction Algorithm during the SARS-CoV-2 Pandemic

Abstract

Research Objective

Health systems use clinical predictive algorithms to allocate resources to high-risk patients. Such algorithms are trained using historical data and are later implemented in clinical settings. During this implementation period, predictive algorithms are prone to performance changes (“drift”) due to exogenous shocks in utilization or shifts in patient characteristics. Our objective was to examine the impact of sudden utilization shifts during the SARS-CoV-2 pandemic on the performance of an electronic health record (EHR)-based prognostic algorithm.

Study Design

We studied changes in the performance of Conversation Connect, a validated machine learning algorithm that predicts 180-day mortality among outpatients with cancer receiving care at medical oncology practices within a large academic cancer center. Conversation Connect generates a mortality risk prediction before each encounter using 159 EHR variables collected during the preceding six months. Since January 2019, Conversation Connect has been used as part of a behavioral intervention to prompt clinicians to consider early advance care planning conversations among patients with ≥10% mortality risk. First, we descriptively compared encounter-level characteristics in the following periods: January 2019–February 2020 (“pre-pandemic”), March–May 2020 (“early-pandemic”), and June–December 2020 (“later-pandemic”). Second, we quantified changes in high-risk patient encounters using interrupted time series analyses that controlled for pre-pandemic trends and demographic, clinical, and practice covariates. Our primary metric of performance drift was the false negative rate (FNR). Third, we assessed contributors to performance drift by comparing distributions of key EHR inputs across periods and by predicting later-pandemic utilization from pre-pandemic inputs.
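
To make the drift metric and the interrupted time series design concrete, here is a minimal Python sketch. The column names (`died_180d`, `risk_score`, `period`, `month_index`, `pandemic`, `high_risk_share`) are illustrative assumptions, not the authors' actual schema; the ≥10% threshold is the intervention threshold described above, and the segmented-regression form is one standard way to operationalize the analysis, not necessarily the authors' exact specification:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Encounters with >=10% predicted 180-day mortality risk are flagged high-risk.
THRESHOLD = 0.10

def false_negative_rate(df: pd.DataFrame) -> float:
    """FNR = FN / (FN + TP): the share of patients who died within 180 days
    of the encounter but were not flagged as high-risk beforehand."""
    deaths = df[df["died_180d"] == 1]
    return float((deaths["risk_score"] < THRESHOLD).mean())

# FNR per period, e.g. "pre", "early", "later":
#   fnr = encounters.groupby("period").apply(false_negative_rate)

def fit_its(monthly: pd.DataFrame):
    """Segmented (interrupted time series) regression on monthly aggregates:
    `month_index` captures the pre-pandemic trend, `pandemic` the level shift
    at onset, and the interaction the post-onset slope change. Demographic,
    clinical, and practice covariates would be added to the formula."""
    return smf.ols(
        "high_risk_share ~ month_index + pandemic + pandemic:month_index",
        data=monthly,
    ).fit()
```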

Population Studied

The study population comprised 237,336 in-person and telemedicine medical oncology encounters.

Principal Findings

Age, race, average patient encounters per month, insurance type, comorbidity counts, laboratory values, and overall mortality were similar among encounters in the pre-, early-, and later-pandemic periods. Relative to the pre-pandemic period, the later-pandemic period was characterized by a 6.5-percentage-point decrease (28.2% vs. 34.7%) in high-risk encounters (p<0.001). FNR increased from 41.0% (95% CI 38.0–44.1%) in the pre-pandemic period to 57.5% (95% CI 51.9–63.0%) in the later-pandemic period. Compared to the pre-pandemic period, the early- and later-pandemic periods had higher proportions of telemedicine encounters (0.01% pre-pandemic vs. 20.0% early-pandemic vs. 26.4% later-pandemic) and of encounters with no preceding laboratory draws (17.7% pre-pandemic vs. 19.8% early-pandemic vs. 24.1% later-pandemic). In the later-pandemic period, observed laboratory utilization was lower than predicted (76.0% vs. 81.2%, p<0.001). Also in the later-pandemic period, mean 180-day mortality risk scores were lower for telemedicine encounters than for in-person encounters (10.3% vs. 11.2%, p<0.001) and for encounters with no preceding laboratory draws than for those with any (1.5% vs. 14.0%, p<0.001).
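
The observed-versus-predicted laboratory utilization comparison can be implemented by training a utilization model on pre-pandemic encounters and scoring later-pandemic ones. A minimal sketch, assuming hypothetical feature columns and a logistic regression as the utilization model (the abstract does not state the model class):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Illustrative feature columns only; the study used pre-pandemic EHR inputs.
FEATURES = ["age", "comorbidity_count", "visits_per_month"]

def observed_vs_expected_lab_use(pre: pd.DataFrame, later: pd.DataFrame):
    """Fit a lab-utilization model on pre-pandemic encounters, then compare
    the expected later-pandemic lab-draw rate with the rate actually observed.
    A gap (here, 76.0% observed vs. 81.2% expected) suggests a utilization
    shift not explained by the patient mix."""
    model = LogisticRegression(max_iter=1000)
    model.fit(pre[FEATURES], pre["any_lab_draw"])
    expected = model.predict_proba(later[FEATURES])[:, 1].mean()
    observed = later["any_lab_draw"].mean()
    return float(observed), float(expected)
```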

Conclusions

During the SARS-CoV-2 pandemic, the performance of a machine learning prognostic algorithm used to prompt advance care planning declined substantially. Increases in telemedicine and declines in laboratory utilization contributed to this decline.

Implications for Policy or Practice

This is the first study to show algorithm performance drift due to SARS-CoV-2 pandemic-related shifts in telemedicine and laboratory utilization. These mechanisms of performance drift could apply to other EHR clinical predictive algorithms. Pandemic-related decreases in care utilization may negatively impact the performance of clinical predictive algorithms and warrant assessment and possible retraining of such algorithms.

Article activity feed

  1. SciScore for 10.1101/2022.02.28.22270996:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethics (IRB): The University of Pennsylvania Institutional Review Board approved this study with a waiver of informed consent, classifying this study as quality improvement.
    Ethics (Consent): The study was conducted under a waiver of informed consent (same IRB statement as above).
    Sex as a biological variable: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex as a biological variable and investigator blinding.