A NOVEL METHOD FOR HANDLING PRE-EXISTING CONDITIONS IN PREDICTION MODELS FOR COVID-19 DEATH

Glen H. Murata
Allison E. Murata
Heather M. Campbell
Benjamin H. McMahon
Jenny T. Mao


Abstract

Objective

To derive a predicted probability of death (PDeathDx) based upon complete sets of ICD-10 codes assigned to patients prior to their diagnosis of COVID-19. PDeathDx is intended for use as a summary metric for pre-existing conditions in multivariate models for COVID-19 death.

Methods

Cases were identified through the COVID-19 Shared Data Resource (CSDR) of the Department of Veterans Affairs. The diagnosis required at least one positive nucleic acid amplification test (NAAT). The primary outcome was death within 60 days of the first positive test. We retrieved all diagnoses entered into the electronic medical record for visits, on problem lists, and at the time of hospital discharge, provided they were recorded at least 14 days before the NAAT. ICD-9 codes were converted to ICD-10 equivalents using a crosswalk provided by the Centers for Medicare & Medicaid Services. ICD-10 codes were then reduced to their category diagnoses, defined as all characters to the left of the decimal point. Each patient was classified as having or not having each category diagnosis prior to the NAAT. For each category diagnosis, a computer program calculated the number of cases, the relative risk (RR) of death, and its confidence interval (CI), using a Bonferroni adjustment for multiple comparisons. RRs were re-centered by subtracting 1 so that high-risk conditions had a positive value and protective conditions a negative one. Diagnoses found to be significant were entered into a logistic model for death in a stepwise fashion. For each category diagnosis, a patient was assigned the value (RR-1) if they had the condition and 0 otherwise. The resulting model was used to derive PDeathDx for each patient, and the area under its receiver operating characteristic (ROC) curve was calculated. Single-variable logistic models were also derived for age at diagnosis, the Charlson 2-year (Charl2Yr) and lifetime (CharlEver) scores, and the Elixhauser 2-year (Elix2Yr) and lifetime (ElixEver) scores. Stata was used to compare the ROCs for PDeathDx and each of the other metrics.
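
The per-diagnosis steps described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' program: the function names are hypothetical, and the abstract does not say how the CI was computed, so the Wald interval on the log scale used here is an assumption. The sketch also assumes no zero cell counts.

```python
import math
from statistics import NormalDist


def icd10_category(code: str) -> str:
    """Return the category diagnosis: everything left of the decimal point."""
    return code.strip().upper().split(".")[0]


def relative_risk(deaths_with, n_with, deaths_without, n_without,
                  n_tests=1, alpha=0.05):
    """RR of death with a Bonferroni-adjusted CI.

    Assumption: a Wald interval on log(RR); the source does not name
    the interval method. Requires non-zero death counts in both groups.
    """
    rr = (deaths_with / n_with) / (deaths_without / n_without)
    se = math.sqrt(1 / deaths_with - 1 / n_with
                   + 1 / deaths_without - 1 / n_without)
    # Bonferroni: split alpha across all category diagnoses tested.
    z = NormalDist().inv_cdf(1 - alpha / (2 * n_tests))
    low = math.exp(math.log(rr) - z * se)
    high = math.exp(math.log(rr) + z * se)
    return rr, low, high


def encode_patient(patient_categories, rr_by_category):
    """Assign (RR - 1) for each category the patient has, 0 otherwise."""
    return {
        cat: (rr - 1.0 if cat in patient_categories else 0.0)
        for cat, rr in rr_by_category.items()
    }
```

With made-up counts, `relative_risk(40, 100, 20, 100, n_tests=1890)` gives an RR of 2 with a much wider interval than the unadjusted call, which is the effect of the Bonferroni correction across many diagnoses. The encoded values would then serve as predictors in the logistic model, which is not reproduced here.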

Results

On September 30, 2021, there were 347,220 COVID-19 patients in the CSDR, of whom 18,120 (5.33%) died within 60 days of their diagnosis. After ICD-9 and ICD-10 codes were consolidated, the subjects had received 29,162,710 separate diagnoses representing 41,341 ICD-10 codes. This set was reduced to 1,890 category diagnoses assigned to the group for the first time on 19,184,437 occasions. Of the 1,890 category diagnoses, 425 involved >= 100 subjects and had a lower boundary for the CI >= 1.50 (a high-risk condition) or an upper boundary <= 0.80 (a protective condition). Stepwise logistic regression showed that 153 were statistically significant, independent predictors of death. PDeathDx was slightly less powerful than age as a discriminator (ROC = 0.811 +/- 0.002 vs 0.812 +/- 0.001, respectively; P < 0.001) but was superior to the Charl2Yr (ROC = 0.727 +/- 0.002; P < 0.001), CharlEver (ROC = 0.753 +/- 0.002; P <= 0.001), Elix2Yr (ROC = 0.694 +/- 0.002; P < 0.001), and ElixEver (ROC = 0.731 +/- 0.002; P < 0.001). Univariate analysis and multivariate modeling showed that many of the highest-risk conditions are under-represented in or absent from the Charlson Index. These include hypertension, dementia, degenerative neurologic disease, and diagnoses associated with severe physical disability.
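
The screening thresholds and the ROC areas reported above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the area is computed with the pairwise (Mann-Whitney) definition rather than with Stata, so the standard errors and P values in the Results are not reproduced.

```python
def classify_category(n_subjects, ci_low, ci_high,
                      min_subjects=100, high_cut=1.50, protective_cut=0.80):
    """Apply the screening rule from the Results to one category diagnosis.

    Returns "high-risk", "protective", or None if the category is dropped.
    """
    if n_subjects < min_subjects:
        return None
    if ci_low >= high_cut:
        return "high-risk"
    if ci_high <= protective_cut:
        return "protective"
    return None


def roc_area(scores, died):
    """Area under the ROC curve as the probability that a patient who died
    has a higher score than one who survived (ties count one half)."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of patients, so it suits small illustrations only; a rank-based computation would be preferable for a cohort of this size.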

Conclusions

Our method for handling pre-existing conditions in multivariate analysis has many advantages over conventional comorbidity indices. The approach can be applied to any condition or outcome, can use any categorical predictors including medications, creates its own condition weights, handles rare as well as protective conditions, and returns actionable information to providers. This information includes the specific ICD-10 groups, their contribution to the risk, and their rank order of importance. Finally, PDeathDx is nearly equivalent to age as a discriminator of outcomes and outperforms 4 other comorbidity scores. If validated by others, this method provides a more robust alternative for handling comorbidities in multivariate models.

ScreenIT
Jan 28, 2022
SciScore for 10.1101/2022.01.22.22269694:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Ethics not detected.
Sex as a biological variable not detected.
Randomization not detected.
Blinding not detected.
Power Analysis not detected.
Table 2: Resources
No key resources detected.
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
The major limitation of our approach is that it handles all pre-existing diagnoses – not just the most recent ones. Thus, a person with chronic renal failure (CRF) who undergoes a transplant and regains normal renal function will still be included in the analysis of CRF. Of course, our conclusions are limited to patients with characteristics like the veteran population. Further studies should be done on other populations and disease states before the method should be widely applied. If validated by others, our method could provide a more robust alternative to comorbidity scores for handling pre-existing conditions in multivariate models.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.
About SciScore
SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
Version published to 10.1101/2022.01.22.22269694 on medRxiv
Jan 24, 2022
Version published to 10.1093/biomethods/bpac017
Jan 1, 2022

