Development and validation of a clinical and genetic model for predicting risk of severe COVID-19

Abstract

Clinical and genetic risk factors for severe COVID-19 are often considered independently and without knowledge of the magnitudes of their effects on risk. Using SARS-CoV-2-positive participants from the UK Biobank, we developed and validated a clinical and genetic model to predict risk of severe COVID-19. We used multivariable logistic regression on a 70% training dataset and used the remaining 30% for validation. We also validated a previously published prototype model. In the validation dataset, our new model was associated with severe COVID-19 (odds ratio per quintile of risk=1.77, 95% confidence interval [CI]=1.64, 1.90) and had excellent discrimination (area under the receiver operating characteristic curve=0.732, 95% CI=0.708, 0.756). We assessed calibration using logistic regression of the log odds of the risk score, and the new model showed no evidence of over- or under-estimation of risk (α=−0.08; 95% CI=−0.21, 0.05) and no evidence of over- or under-dispersion of risk (β=0.90, 95% CI=0.80, 1.00). Accurate prediction of individual risk is possible and will be important in regions where vaccines are not widely available or where people refuse or are disqualified from vaccination, especially given uncertainty about the extent of infection transmission among vaccinated people and the emergence of SARS-CoV-2 variants of concern.
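
The discrimination and calibration measures reported in the abstract can be illustrated with a minimal sketch. It runs on simulated data, not UK Biobank data, and is not the authors' code: the function names (`auc`, `calibration`, `simulate`) and the simulated risk range are assumptions made for illustration only. The sketch computes the area under the ROC curve by pairwise comparison, then regresses the outcome on the log odds of the predicted risk, so that α is the calibration intercept and β is the calibration slope.

```python
import math
import random


def auc(y, p):
    """Area under the ROC curve: the probability that a randomly chosen
    case has a higher predicted risk than a randomly chosen non-case
    (ties count as one half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration(y, p, iters=25):
    """Fit logit(P(y=1)) = alpha + beta * logit(p) by Newton-Raphson.
    alpha near 0 suggests no overall over- or under-estimation of risk;
    beta near 1 suggests no over- or under-dispersion of risk."""
    x = [math.log(pi / (1.0 - pi)) for pi in p]
    alpha, beta = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
            w = mu * (1.0 - mu)
            g0 += yi - mu          # gradient with respect to alpha
            g1 += (yi - mu) * xi   # gradient with respect to beta
            h00 += w               # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        alpha += (h11 * g0 - h01 * g1) / det
        beta += (h00 * g1 - h01 * g0) / det
    return alpha, beta


def simulate(n=2000, seed=0):
    """Hypothetical validation set: predicted risks drawn uniformly from
    (0.05, 0.60), and outcomes drawn from those same risks, so the
    predictions are well calibrated by construction."""
    rng = random.Random(seed)
    p = [rng.uniform(0.05, 0.60) for _ in range(n)]
    y = [1 if rng.random() < pi else 0 for pi in p]
    return y, p


y_sim, p_sim = simulate()
alpha_sim, beta_sim = calibration(y_sim, p_sim)
print("AUC:", round(auc(y_sim, p_sim), 3))
print("alpha:", round(alpha_sim, 3), "beta:", round(beta_sim, 3))
```

Because the simulated outcomes are drawn from the predicted risks themselves, the fitted α should fall close to 0 and β close to 1. Confidence intervals like those quoted in the abstract would additionally require standard errors, for example from the inverse of the information matrix or from a bootstrap, which this sketch omits.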

Key results

  • Accurate prediction of the risk of severe COVID-19 can inform public health interventions and empower individuals to make informed choices about their day-to-day activities.

  • Age and sex alone do not accurately predict risk of severe COVID-19.

  • Our clinical and genetic model to predict risk of severe COVID-19 performs extremely well in terms of discrimination and calibration.

Article activity feed

  1. SciScore for 10.1101/2021.03.09.21253237:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: We used Plink version 1.9 [20,21] to extract SNP data from the UK Biobank imputation dataset that we had previously downloaded.
    Resource: Plink
    Suggested: (PLINK, RRID:SCR_001757)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    A limitation of this study is that, through necessity, we used hospitalization as a proxy for COVID-19 severity and the outcome measure may have been misclassified for some participants. This would have attenuated the observed associations and it is possible that some risk factors have been omitted unnecessarily. Nevertheless, we are confident in the variables retained. We were also unable to develop models for other important endpoints such as intensive care admission or death. The progression of the COVID-19 pandemic has seen people experience chronic symptoms, and some of these people will have had only a mild original infection [5]. Identifying people who are at increased risk of chronic disease is an obvious direction for future research. Another direction for future research is to investigate whether our model for the prediction of severe COVID-19 is applicable for the new SARS-CoV-2 variants of concern, which have been reported to have increased transmissibility, virulence and antigenicity and cause more severe disease [3,4]. Further validation of our new model is required in independent datasets, especially those in which the SARS-CoV-2 variant has been characterized. Clear benefits of our new model for predicting risk of severe COVID-19 are that the required clinical data is simple to collect and that the genetic information is amenable to high-throughput genotyping, with rapid turnaround that is essential for the present pandemic. In the light of the uncertainty of the fu...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.