Using Machine Learning Algorithms to Develop a Clinical Decision-Making Tool for COVID-19 Inpatients

Abstract

Background: Within the UK, COVID-19 has contributed to over 103,000 deaths. Although multiple risk factors for COVID-19 have been identified, using these data to improve clinical care has proven challenging. The main aim of this study is to develop a reliable, multivariable predictive model for COVID-19 inpatient outcomes, thus enabling risk stratification and earlier clinical decision-making.

Methods: Anonymised data comprising 44 independent predictor variables from 355 adults diagnosed with COVID-19 at a UK hospital were manually extracted from electronic patient records for retrospective, case–control analysis. Primary outcomes were inpatient mortality, requirement for ventilatory support, and duration of inpatient treatment. Pulmonary embolism as a sequela was the only secondary outcome. After the data were balanced, key variables were selected for each outcome using random forests. Predictive models were then learned and constructed using Bayesian networks.

Results: Using the selected risk factors, the proposed probabilistic models were able to predict the probability of each of these outcomes. Overall, our findings demonstrate reliable, multivariable, quantitative predictive models for four outcomes, which use readily available clinical information for COVID-19 adult inpatients. Further research is required to externally validate our models and demonstrate their utility as risk-stratification and clinical decision-making tools.
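The balancing and Bayesian network steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy rows, the variable names (`age_over_70`, `low_spo2`, `died`) and the network structure (outcome as the single parent of every risk factor, i.e. a naive Bayes network) are all assumptions made for illustration. Random oversampling stands in for whatever balancing method the study used, and the random forest feature selection step is omitted entirely.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy data: 3 deaths and 9 survivors (not taken from the study).
TOY_ROWS = (
    [{"age_over_70": 1, "low_spo2": 1, "died": 1}] * 2
    + [{"age_over_70": 1, "low_spo2": 0, "died": 1}]
    + [{"age_over_70": 0, "low_spo2": 0, "died": 0}] * 6
    + [{"age_over_70": 1, "low_spo2": 0, "died": 0}] * 2
    + [{"age_over_70": 0, "low_spo2": 1, "died": 0}]
)

def oversample(rows, label_key, seed=0):
    """Balance classes by randomly duplicating rows of the smaller classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

def fit_network(rows, label_key, features, alpha=1.0):
    """Estimate P(outcome) and each P(feature | outcome) with Laplace smoothing."""
    classes = sorted({row[label_key] for row in rows})
    class_counts = Counter(row[label_key] for row in rows)
    prior = {c: class_counts[c] / len(rows) for c in classes}
    cpts = {}
    for f in features:
        values = sorted({row[f] for row in rows})
        counts = Counter((row[label_key], row[f]) for row in rows)
        cpts[f] = {
            c: {
                v: (counts[(c, v)] + alpha) / (class_counts[c] + alpha * len(values))
                for v in values
            }
            for c in classes
        }
    return prior, cpts

def posterior(model, evidence):
    """Return P(outcome | evidence); evidence values must have been seen in training."""
    prior, cpts = model
    scores = {}
    for c, p in prior.items():
        for f, v in evidence.items():
            p *= cpts[f][c][v]
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

balanced = oversample(TOY_ROWS, "died")
model = fit_network(balanced, "died", ["age_over_70", "low_spo2"])
high_risk = posterior(model, {"age_over_70": 1, "low_spo2": 1})
low_risk = posterior(model, {"age_over_70": 0, "low_spo2": 0})
print("P(died | both risk factors):", round(high_risk[1], 3))
print("P(died | neither risk factor):", round(low_risk[1], 3))
```

Because the classes are balanced before fitting, the learned prior is 0.5 for each outcome, so the printed probabilities reflect the balanced sample rather than the true inpatient mortality rate. A learned network structure, as described in the abstract, could also link risk factors to one another, which this fixed structure does not.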

Article activity feed

  1. SciScore for 10.1101/2021.02.15.21251752:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: Healthcare staff who had historically recorded patient information on the EPR during clinical assessment were, of course at the time, blind to the outcomes and hypotheses of this study.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentence: After removing the duplicates, the CT positive and RT-PCR swab positive cases were populated to a Microsoft Excel spreadsheet.
    Resource: Microsoft Excel
    Suggested: (Microsoft Excel, RRID:SCR_016137)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    The biggest limitation to our study is that since the study was conducted in a single centre, not only is the n number limited, but the data only reflects the demographics of the surrounding population. Within the UK, geographical location and socio-economic factors are heavily influencing death rates [53] and, thus, our data may have limited generalizability to the wider UK population by not accounting for these factors. Furthermore, our data has only been collected from hospital inpatients and therefore our model is not generalizable to the wider community either. Secondly, data was not always available, or accurate, for all patients. This was sometimes due to a lack of documentation, usually if the attending physicians at the time did not deem the information relevant, or if the information was not available, especially in patients who were cognitively impaired without any next of kin to provide collateral histories. Moreover, not all investigations, such as CT scans, were required for every patient and may have not been done due to the limited resources available in the NHS. Subsequently, only patients deemed to have abnormal results would have been the patients to receive the investigation. Also, with regards to palliation, some patients clearly had different treatment goals to others. For example, this would suggest that patients with severe COVID-19 who could have had ventilatory support may have not had it because their treatment goal was palliation instead. All these...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.
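As a rough illustration of what checking for the presence of RRIDs can look like, the sketch below scans text for identifiers shaped like the one in Table 2. The regular expression is a simplifying assumption (a prefix of letters, an underscore, then letters or digits); it is not SciScore's actual method, it does not cover every RRID format, and it does not verify that an identifier exists in any registry.

```python
import re

# Assumed, simplified shape: "RRID:" + letters + "_" + letters/digits.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9]+)")

def find_rrids(text):
    """Return every RRID-like identifier found in the text, in order."""
    return RRID_PATTERN.findall(text)

sample = "suggested: (Microsoft Excel, RRID:SCR_016137)"
print(find_rrids(sample))  # ['SCR_016137']
```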