An improved methodology for estimating the prevalence of SARS-CoV-2

Abstract

Since the identification of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), in China in December 2019, there have been more than 17 million cases of the disease in 216 countries worldwide. Comparisons of prevalence estimates between different communities can inform policy decisions regarding safe travel between countries, help to assess when to implement (or remove) disease control measures, and identify the risk of over-burdening healthcare providers. Estimating the true prevalence can, however, be challenging because officially reported figures are likely to be significant underestimates of the true burden of COVID-19 within a community. Previous methods for estimating the prevalence fail to incorporate differences between populations (such as younger populations having higher rates of asymptomatic cases), so comparisons between, for example, countries can be misleading. Here, we present an improved methodology for estimating COVID-19 prevalence. We take the reported numbers of cases and deaths, together with population size, to give a raw prevalence for the population. We then apply an age adjustment, which allows the age distribution of that population to influence the case-fatality rate and the proportion of asymptomatic cases. Finally, we calculate the likely underreporting factor for the population and use this to adjust our prevalence estimate further. We use our method to estimate the prevalence for 166 countries (or states of the United States of America, hereafter referred to as US states) where sufficient data were available. Our estimates show that, as of 30th July 2020, the three countries with the highest estimated prevalence are Brazil (1.26%, 95% CI: 0.96 – 1.37), Kyrgyzstan (1.10%, 95% CI: 0.82 – 1.19) and Suriname (0.58%, 95% CI: 0.44 – 0.63). Brazil is predicted to have the largest proportion of all current global cases (30.41%, 95% CI: 27.52 – 30.84), followed by the USA (14.52%, 95% CI: 14.26 – 16.34) and India (11.23%, 95% CI: 11.11 – 11.24). Amongst the US states, the highest prevalence is predicted to be in Louisiana (1.07%, 95% CI: 1.02 – 1.12), Florida (0.90%, 95% CI: 0.86 – 0.94) and Mississippi (0.77%, 95% CI: 0.74 – 0.81), whereas amongst European countries the highest prevalence is predicted to be in Montenegro (0.47%, 95% CI: 0.42 – 0.50), Kosovo (0.35%, 95% CI: 0.29 – 0.37) and Moldova (0.28%, 95% CI: 0.23 – 0.30). Our results suggest that Kyrgyzstan (0.04 tests per predicted case), Brazil (0.04 tests per predicted case) and Suriname (0.29 tests per predicted case) have the highest underreporting among the 25 countries with the highest estimated prevalence. In comparison, Israel (34.19 tests per predicted case), Bahrain (19.82 tests per predicted case) and Palestine (9.81 tests per predicted case) have the least underreporting. The results of this study may be used to understand the risk between different geographical areas and highlight regions where the prevalence of COVID-19 is increasing most rapidly. The method described is quick and easy to implement. Prevalence estimates should be updated on a regular basis to allow for rapid fluctuations in disease patterns.

Article activity feed

  1. SciScore for 10.1101/2020.08.04.20168187:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.