A predictive internet-based model for COVID-19 hospitalization census

Abstract

The COVID-19 pandemic has strained hospital resources and created a need for predictive models that forecast patient care demands, so that staffing and resources can be allocated adequately. Other recent studies have examined associations between Google Trends data and the number of COVID-19 cases. Expanding on this approach, we propose a vector error correction model (VECM) for the number of COVID-19 patients in a healthcare system (Census) that incorporates Google search term activity and healthcare chatbot scores. The VECM provided a good fit to Census and very good forecasting performance, as assessed by hypothesis tests and mean absolute percentage prediction error. Although our study and model have limitations, we conducted a broad search for candidate Internet variables and employed rigorous statistical methods. We have demonstrated that the VECM can potentially be a valuable component of a COVID-19 surveillance program in a healthcare system.
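For readers unfamiliar with the forecast metric named in the abstract, the following is a minimal sketch of how a mean absolute percentage error could be computed. It is not the authors' code: the function name `mape` and the census numbers are made up purely for illustration, and the abstract does not state the exact variant of the metric the authors used.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned in percent.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Average the absolute errors, each scaled by the observed value.
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up daily census counts and forecasts (illustrative only).
census_actual = [100, 110, 120, 125]
census_forecast = [95, 112, 126, 120]

print(f"MAPE: {mape(census_actual, census_forecast):.2f}%")
```

A lower value indicates forecasts that are closer, in relative terms, to the observed census.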

Article activity feed

  1. SciScore for 10.1101/2020.11.15.20231845:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: We performed twelve different queries from 02/21/20 to 08/01/20 for Google Trends’ “Charlotte NC” metro designation (county-level data is unavailable) using a list of terms obtained based on our prior beliefs and expertise.
    Resource: Google Trends’ (suggested: None)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    This study had several limitations. First, in terms of data collection, Google’s designation of the Charlotte NC metro area does not perfectly spatially align with Atrium Health’s core market. Also, Facebook and Apple Map are biased towards users who have enabled their location history on their mobile devices in order to be detected. Second, the time series in this study were not collected using any probabilistic sampling design; rather, they were collected using convenience sampling. Hence, we should be cautious about the generalizability of our results. Third, when working with data pulled from the Internet, there is always the chance that the data could be made unavailable or be altered in some way, thus threatening the durability of such models. We were fortunate in that one of our two important Internet variables was from Atrium Health’s own public-facing Microsoft Azure HealthBot, at least in part mitigating this risk for our model. Lastly, perhaps the biggest limitation is that the relationships we have observed in this research could change at any point in the future so that our model is no longer predictive. Stated another way, because these time series are nonstationary, they might not stay in sync over long periods of time as their cross-correlations change. We initially considered other simpler time series regression models (e.g., autoregressive distributed lag model). However, this approach requires time series under consideration to all be stationary, which ours were...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.