Distributional challenges regarding data on death and incidences during the SARS-CoV-2 pandemic up to July 2020

This article has been Reviewed by the following groups


Abstract

COVID-19 is a major global crisis with unpredictable consequences, and many scientists have struggled to forecast its impact. In particular, appropriate preparations for a second wave are needed to avoid falling back into a costly panic mode. This requires an idea of worst-case scenarios regarding incidences, hospitalization, or use of ICU resources. Such scenarios can be described in terms of extreme quantiles (95%, 99%, 99.9%) of specific distributions that are assumed to formalize the data-generating mechanism behind future observations.
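As a minimal illustration of what such extreme quantiles look like, the sketch below computes the 95%, 99% and 99.9% quantiles of a Poisson distribution. The Poisson choice and the rate of 100 daily cases are assumptions made purely for illustration; they are not values taken from the paper.

```python
import math


def poisson_quantile(rate: float, prob: float) -> int:
    """Return the smallest k with P(X <= k) >= prob for X ~ Poisson(rate)."""
    if rate <= 0.0:
        raise ValueError("rate must be positive")
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1")
    pmf = math.exp(-rate)  # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < prob:
        k += 1
        pmf *= rate / k  # recurrence: P(X = k) = P(X = k - 1) * rate / k
        cdf += pmf
    return k


if __name__ == "__main__":
    assumed_daily_rate = 100.0  # hypothetical planning value, not from the paper
    for level in (0.95, 0.99, 0.999):
        q = poisson_quantile(assumed_daily_rate, level)
        print(f"{level:.1%} quantile: {q} cases")
```

Under a slim-tailed model like this one, even the 99.9% quantile stays fairly close to the mean, which is why the choice of distribution matters so much for worst-case planning.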

Therefore, distributional issues do matter. Cirillo and Taleb argue that a natural and empirically correct framework for assessing and managing real risk in pandemics is provided by extreme value theory, which deals with extrema rather than averages. We explore this idea in more detail.

In this paper we discuss the fat-tail patterns in the distribution of global COVID-19 data by analyzing data from 66 countries worldwide. We also explore their relevance at a lower, regional scale (national, federal state), which is in our opinion more relevant for planning measures against the epidemic spread. For this we analyze data from the German federal state of Bavaria.

We conclude that fat-tail patterns are seen in global data, possibly reflecting the heterogeneity between countries regarding incidences and fatalities during the ongoing epidemic. However, the disease activity at the regional level seems to be better described by classical Poisson-based models. To bridge the gap between regional and global phenomena we refer to mixtures of slim-tail distributions that may create fat-tail features.
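A rough sketch of how mixing can inflate the tail: the code below compares the exact probability of exceeding a threshold under a single Poisson distribution with that under a two-component Poisson mixture of the same mean. The rates (5 and 50), weights (0.9 and 0.1) and threshold (30) are hypothetical. A finite Poisson mixture is not fat-tailed in the strict asymptotic sense, so this only illustrates how heterogeneity between units drives up extreme quantiles; it is not the authors' model.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """P(X = k) for X ~ Poisson(rate), computed on the log scale for stability."""
    return math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))


def mixture_tail(threshold: int, rates: list[float], weights: list[float]) -> float:
    """P(X > threshold) for a finite mixture of Poisson distributions."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    cdf = sum(
        w * sum(poisson_pmf(k, r) for k in range(threshold + 1))
        for r, w in zip(rates, weights)
    )
    return 1.0 - cdf


if __name__ == "__main__":
    # Hypothetical setting: 90% low-activity regions, 10% high-activity regions.
    rates, weights = [5.0, 50.0], [0.9, 0.1]
    mean = sum(r * w for r, w in zip(rates, weights))  # common mean of both models
    print("single Poisson tail:", mixture_tail(30, [mean], [1.0]))
    print("mixture tail:       ", mixture_tail(30, rates, weights))
```

Both models share the same mean, yet the mixture assigns several orders of magnitude more probability to counts above the threshold.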

Especially at the beginning of a pandemic, acting according to the “better safe than sorry” principle and taking extreme forecasts as the basis for decisions might be justified. However, as the pandemic continues and control measures are partially lifted, there is a need for a careful discussion of how to choose relevant distributions and their respective quantiles for future resource planning, so as not to cause more harm than the pandemic itself.

Article activity feed

  1. SciScore for 10.1101/2020.07.24.20161257:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Our analysis suffers from limitations regarding the data quality, of which underreporting of cases and deaths might be the most prominent one. We do not apply nowcasting to correct for reporting artefacts in the data [21]. We also do not consider testing strategies to adjust incidence reporting. Our model does not allow to implement complex intervention strategies. We simply assume that the next wave follows the same dynamics as the first wave and ignore alternative approaches like analysing excess deaths during the epidemic [22]. Therefore, the predictions we are providing are illustrative and may not reflect future scenarios. It is not our goal to provide a new prediction model but to emphasize that MCMC based Bayesian prediction is a very suitable tool to address the probability of extreme outcomes. To conclude, in this paper we investigated both global and regional COVID-19 data. We found fat-tail properties in the global data but within a country or even smaller scale of federal states we believe that this phenomenon is currently not seen. The Gini index provides an additional tool to investigate heterogeneity in incidences and deaths between countries, which was found to be high. We showed an example for a prediction of a second wave using a Poisson based model combined with Bayesian methods, where we emphasized to take a look at the extreme quantiles of the predictive distribution and the duration of time, during which a critical amount of incidences or deaths might be report...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.