Tracking excess mortality across countries during the COVID-19 pandemic with the World Mortality Dataset

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This is an important and timely manuscript looking at excess mortality across 89 countries and territories over the course of the COVID-19 pandemic. This manuscript will be of interest to demographers and epidemiologists, and also more broadly to the public health community.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1, Reviewer #2 and Reviewer #3 agreed to share their names with the authors.)

Abstract

Comparing the impact of the COVID-19 pandemic between countries or across time is difficult because the reported numbers of cases and deaths can be strongly affected by testing capacity and reporting policy. Excess mortality, defined as the increase in all-cause mortality relative to the expected mortality, is widely considered a more objective indicator of the COVID-19 death toll. However, there has been no global, frequently updated repository of all-cause mortality data across countries. To fill this gap, we have collected weekly, monthly, or quarterly all-cause mortality data from 103 countries and territories, openly available as the regularly updated World Mortality Dataset. We used this dataset to compute the excess mortality in each country during the COVID-19 pandemic. We found that in several of the worst-affected countries, the excess mortality was above 50% of the expected annual mortality (Peru, Ecuador, Bolivia, Mexico) or above 400 excess deaths per 100,000 population (Peru, Bulgaria, North Macedonia, Serbia). At the same time, in several other countries (e.g. Australia and New Zealand) mortality during the pandemic was below the usual level, presumably due to social distancing measures decreasing the non-COVID infectious mortality. Furthermore, we found that while many countries have been reporting their COVID-19 deaths very accurately, some countries have been substantially underreporting them (e.g. Nicaragua, Russia, Uzbekistan), by up to two orders of magnitude (Tajikistan). Our results highlight the importance of open and rapid all-cause mortality reporting for pandemic monitoring.
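
The headline quantities in the abstract can be made concrete with a small sketch. It assumes that observed deaths, baseline-expected deaths, population and reported COVID-19 deaths have already been aggregated for one country; the helper `summarize_excess` and all numbers are hypothetical, and the undercount ratio is taken here to be excess deaths divided by reported COVID-19 deaths.

```python
from dataclasses import dataclass


@dataclass
class ExcessSummary:
    excess_deaths: float     # observed minus expected deaths over the period
    pct_of_annual: float     # excess as a percentage of expected annual deaths
    per_100k: float          # excess deaths per 100,000 population
    undercount_ratio: float  # excess deaths / reported COVID-19 deaths


def summarize_excess(observed, expected, expected_annual, population, reported_covid):
    """Headline excess-mortality metrics for one country (hypothetical helper)."""
    excess = observed - expected
    pct_of_annual = 100.0 * excess / expected_annual
    per_100k = excess * 100_000 / population
    # The ratio is undefined (NaN) when no COVID-19 deaths were reported.
    ratio = excess / reported_covid if reported_covid > 0 else float("nan")
    return ExcessSummary(excess, pct_of_annual, per_100k, ratio)


# Made-up numbers for a single hypothetical country over one pandemic year.
summary = summarize_excess(
    observed=120_000,
    expected=100_000,
    expected_annual=100_000,
    population=10_000_000,
    reported_covid=8_000,
)
print(summary)
```

With these numbers the excess is 20,000 deaths, i.e. 20% of expected annual mortality, 200 per 100,000 population, and 2.5 times the reported COVID-19 toll. A negative excess, as in Australia and New Zealand, makes all three metrics negative, and the undercount ratio is then not meaningful.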

Article activity feed

  2. Reviewer #1 (Public Review):

    The manuscript describes the World Mortality Dataset and uses it to estimate the excess mortality attributable to the COVID-19 pandemic across 89 countries and territories around the globe. The method is clearly described and appropriately simple without being too simple, as it incorporates both time trends and period-specific baseline effects.
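
    The kind of baseline the reviewer describes, a time trend plus period-specific effects, can be sketched as an ordinary least-squares fit with one intercept per period of the year (week, month or quarter) and a common linear slope in year, fitted on pre-pandemic data and extrapolated to the pandemic period. This is a minimal illustration assuming NumPy and synthetic data; `fit_baseline` and `predict_baseline` are hypothetical names, not the authors' code.

```python
import numpy as np


def fit_baseline(years, periods, deaths, n_periods):
    """Least-squares fit of deaths ~ alpha[period] + beta * year.

    years, periods and deaths are 1-D sequences with one entry per week/month;
    periods are integers 0..n_periods-1 (e.g. month of year).
    """
    years = np.asarray(years, dtype=float)
    periods = np.asarray(periods, dtype=int)
    deaths = np.asarray(deaths, dtype=float)
    # One indicator column per period (its own intercept) plus a shared year column.
    X = np.zeros((len(deaths), n_periods + 1))
    X[np.arange(len(deaths)), periods] = 1.0
    X[:, -1] = years
    coef, *_ = np.linalg.lstsq(X, deaths, rcond=None)
    return coef[:n_periods], coef[-1]


def predict_baseline(alpha, beta, year, periods):
    """Expected deaths for the given periods of a (possibly future) year."""
    return alpha[np.asarray(periods, dtype=int)] + beta * year


# Synthetic monthly data for 2015-2019: a seasonal pattern plus a rising trend.
seasonal = np.array([100, 80, 60, 40, 20, 0, 0, 20, 40, 60, 80, 100])
years = np.repeat(np.arange(2015, 2020), 12)
months = np.tile(np.arange(12), 5)
deaths = 1000 + seasonal[months] + 10 * (years - 2015)

alpha, beta = fit_baseline(years, months, deaths, n_periods=12)
expected_2020 = predict_baseline(alpha, beta, 2020, np.arange(12))
```

    Excess deaths for a given month of 2020 would then be the observed count minus the corresponding entry of `expected_2020`.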

    I have few specific comments on this paper, which is mainly descriptive but very valuable.

    My main comment is on the interpretation of excess deaths. From a causal perspective, the deaths observed during the COVID period can be decomposed as follows:

    Observed deaths in COVID period =
    Expected deaths in COVID period (a)
    - Deaths averted due to COVID (e.g. less flu due to NPIs (non-pharmaceutical interventions), fewer traffic deaths) (b)
    + Deaths directly caused by COVID (i.e. in people who were infected) (c)
    + Deaths indirectly caused by COVID (e.g. starvation from lockdown, untreated cancer) (d)
    + Net deaths from confounders (other events particular to that time period that caused or prevented deaths, e.g. wars) (e)
    + Random variation.
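
    Rearranging this decomposition shows what an excess-death estimate actually captures. Using the labels (a) to (e) above, and writing O for observed deaths, E for excess deaths, R for reported COVID-19 deaths and ε for random variation (these symbols are our notation, not the reviewer's):

```latex
\begin{aligned}
O &= a - b + c + d + e + \varepsilon,\\
E \equiv O - a &= c + (d - b) + e + \varepsilon,\\
\frac{E}{R} &= \frac{c}{R} + \frac{(d - b) + e + \varepsilon}{R}.
\end{aligned}
```

    The undercount ratio E/R therefore reflects underreporting of direct COVID-19 deaths (c/R) only to the extent that indirect deaths, averted deaths, confounding events and random variation roughly cancel out; a large number of averted deaths (b), for example, pulls the ratio down.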

    The main thing I would like to see is more contextualization of the "undercount" to note something like this conceptual structure, explain what should make us think that the very few examples of (e) that are in the analysis really are the main ones, and perhaps some seasonal comparisons of the undercounts so that plausible hypotheses can be proposed for which factors are at play. Otherwise, a very helpful piece of work that will likely generate many others.

  3. Reviewer #2 (Public Review):

    The authors set out to estimate excess mortality in a large set of countries globally, and this has generated a unique picture of the mortality impact of the pandemic, which in some countries was missed in the official counts. In the process they have generated a central, frequently updated repository of all-cause mortality data across countries that is a wonderful tool for all epidemiologists to follow developments in near real time. Such data have long been available in Europe (EuroMoMo), but worldwide the publication of weekly or monthly all-cause mortality data has been scarce. So, all in all, this work is incredibly important and rather extraordinary, and a great tool for researchers in the field. The authors truly fill a gap with their collection of weekly, monthly, or quarterly all-cause mortality data from 89 countries and territories, which are openly available and will be regularly updated: the World Mortality Dataset. For this reason the paper is both original and of great importance for understanding the COVID-19 crisis at a global level, and should be published as soon as possible. The database is already in use by Our World in Data, the Economist and the Financial Times.

    The strength of the paper is the demonstration of very substantial excess mortality in several countries such as Peru, Russia, Brazil, Bolivia, and Bulgaria. This had so far been missed at the country level, although such reports had been seen from selected cities like Manaus, Brazil. The paper also provides several interesting metrics: the incidence of excess deaths (per 100,000 population), the elevation above a baseline of expected deaths, and finally the undercount ratio of these estimates compared to official data. That the top countries underreport by a factor of 10 to 100 is nicely documented. Finally, it is commendable that the authors demonstrate in Figure 4 the time-series coincidence of reported and excess deaths.

    The authors also discuss the finding of undercount ratios as low as 0.5 in some countries such as France. The ensuing discussion is interesting: it concerns the meaning of excess mortality estimates when both reductions (fewer accidents and suicides due to lockdowns) and increases (poor care due to overfilled hospitals during large epidemics) may be expected, as well as other effects such as heat waves and disappeared influenza epidemics. I think the authors should also discuss their thinking in light of what IHME has put out in this regard very recently; see here:
    IHME on Excess Mortality http://www.healthdata.org/special-analysis/estimation-excess-mortality-due-covid-19-and-scalars-reported-covid-19-deaths

    I have a few critical points about the methodology for assessing and reporting excess mortality from these data. The conclusion reached in the paper is nevertheless solid: some countries like Peru, Russia and Brazil have gone through a particularly deadly experience with COVID-19, with as much as 0.5% of their entire population dying over a couple of pandemic waves. Much of this mortality is not always reflected in the official reports: the true death toll may be 1.6x greater than the reported number of deaths, and in some countries the official reporting captures only about 1/10 of the excess mortality. Unfortunately, many countries do not have national vital statistics data with weekly, monthly or quarterly detail, and are not represented in the mortality database.

    Now to the criticism:

    1. The work is not connected to the vast literature on the topic. The authors are out-of-field statisticians and seem unaware of the literature in this domain. They generated a baseline of expected mortality based on past years' time-series data, as one would do when estimating excess mortality for influenza. In this way their approach is somewhat similar to that used by Murray et al (Murray, Lancet 2006) to estimate the 1918 pandemic excess mortality above an annual baseline of surrounding years for a number of countries. The authors should consider at least including a reference for excess mortality estimation for each of the past influenza pandemics, and ponder whether it is possible to do what was done in those analyses, namely to create a baseline of expected deaths that does NOT include winter-seasonal epidemic diseases like influenza (see the collected works of Olson et al, Viboud et al, Chowell et al and Simonsen et al for the pandemics of 1918, 1957, 1968 and 2009). See also the latest thinking by IHME (see link above) on the problem of separating true excess deaths from the disappeared traffic accidents, increased mental health deaths, and other complications.

    2. No attempt to correct baselines for seasonal influenza. The authors use past years to generate a baseline that includes mid-winter seasonal influenza mortality. As a result, the excess mortality estimates in the present manuscript represent excess above what is normal in a season. Thus, as the authors comment, the excess mortality estimates are affected by a too-high baseline that includes mortality due to influenza, RSV and other respiratory viruses that have largely not been circulating during the COVID-19 pandemic. In particular, the "disappeared" influenza burden in 2020-2021 results in a meaningful underestimation of the true COVID-19 excess mortality. The problem of removing seasonal influenza from the baseline has been worked out by epidemiologists using various statistical approaches (sometimes harmonic terms, sometimes influenza virus data from the WHO as predictors) in the literature mentioned above, but this entire literature on excess mortality estimation is missing from the reference list. One that I am very familiar with (!) is Simonsen et al, Plos Med 2014, but there are many more published papers computing excess mortality for seasonal and recent pandemic influenza (look for Viboud, Chowell, Goldstein, Paget, Olson...). I suggest you simply discuss this situation and make reference to this literature, and suggest that others work out ways to remove influenza from the baseline, for example by incorporating data from the WHO seasonal influenza time-series database (FluNet.org) in the excess mortality regression models (to identify and remove excess mortality during influenza periods).
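
    One of the approaches the reviewer mentions, harmonic terms plus an influenza activity covariate, can be sketched as follows. The model has a linear trend, annual sine and cosine terms and a weekly influenza indicator (in practice this might be derived from surveillance data such as FluNet; here it is simulated), and the influenza-free baseline is the model's prediction with the influenza term set to zero. The function names and data are hypothetical, and this illustrates the general idea rather than the model of any cited paper.

```python
import numpy as np


def harmonic_design(t, flu, period=52.0):
    """Columns: intercept, linear trend, annual sine/cosine, influenza covariate."""
    t = np.asarray(t, dtype=float)
    w = 2 * np.pi * t / period
    return np.column_stack([np.ones_like(t), t, np.sin(w), np.cos(w),
                            np.asarray(flu, dtype=float)])


def flu_free_baseline(t, deaths, flu, period=52.0):
    """Fit the full model, then predict with the influenza covariate set to zero."""
    X = harmonic_design(t, flu, period)
    coef, *_ = np.linalg.lstsq(X, np.asarray(deaths, dtype=float), rcond=None)
    X_no_flu = harmonic_design(t, np.zeros(len(t)), period)
    return X_no_flu @ coef, coef


# Five years of simulated weekly deaths with a winter influenza bump.
t = np.arange(5 * 52)
flu = ((t % 52) < 8).astype(float)  # hypothetical 0/1 "influenza season" indicator
w = 2 * np.pi * t / 52.0
deaths = 1000 + 0.5 * t + 80 * np.sin(w) + 30 * np.cos(w) + 200 * flu

baseline, coef = flu_free_baseline(t, deaths, flu)
```

    Compared with a model without the influenza covariate, whose seasonal terms would absorb typical winter influenza deaths, this baseline is not inflated by influenza that did not circulate during the pandemic.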

    3. Varying COVID-19 study time for different countries. Another problem with the way excess mortality is reported is the difference in follow-up time. Some countries have data up to March 2021, while others only until summer 2020. This should be dealt with in the estimates, for example by comparing countries with complete data for the year 2020. It probably cannot be helped that some countries publish their data late, but the authors should highlight these issues of comparability between countries in the text.
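
    A minimal sketch of one way to act on this suggestion: truncate every country's series at the latest period available for all countries before summing excess deaths, so that the totals cover the same time window. It assumes all series start at the same pandemic start date; the function name, country names and values are hypothetical.

```python
from datetime import date


def cumulative_excess_to_common_cutoff(series):
    """Sum each country's excess deaths up to the latest period covered by every country.

    series maps a country name to a list of (period_start, excess_deaths) pairs,
    all assumed to start at the same pandemic start date.
    """
    # Latest date reported by each country; the common cutoff is the earliest of these.
    cutoff = min(max(d for d, _ in rows) for rows in series.values())
    totals = {
        country: sum(x for d, x in rows if d <= cutoff)
        for country, rows in series.items()
    }
    return cutoff, totals


example = {
    "Country A": [(date(2020, 12, 1), 10), (date(2021, 1, 1), 20),
                  (date(2021, 2, 1), 30), (date(2021, 3, 1), 40)],
    "Country B": [(date(2020, 12, 1), 5), (date(2021, 1, 1), 5),
                  (date(2021, 2, 1), 5)],
}
cutoff, totals = cumulative_excess_to_common_cutoff(example)
```

    Here Country A's March 2021 value is dropped so that both totals end in February 2021.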

    4. About the finding of 1.6x higher excess mortality than reported deaths: it seems important to say that this finding applies to countries with national vital statistics in near real time, so things may be very different in countries where such data do not exist.

    5. Figure 4. Can you explain the time shift between the reported and excess deaths in the United States? It must be a data issue. Also, it would be better to choose line colors or widths so that the two can be distinguished in black and white.
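
    One simple way to quantify such a shift, offered here as an illustrative check rather than something the reviewer or the authors propose, is to find the lag that maximizes the correlation between weekly reported COVID-19 deaths and weekly excess deaths. The function `best_lag` and the simulated series are hypothetical.

```python
import numpy as np


def best_lag(reported, excess, max_lag=8):
    """Lag k (in weeks) maximizing corr(reported[t], excess[t + k]).

    A positive k means excess deaths trail reported COVID-19 deaths by k weeks.
    Both series must be the same length and cover the same weeks.
    """
    r = np.asarray(reported, dtype=float)
    e = np.asarray(excess, dtype=float)
    best_k, best_corr = 0, -np.inf
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            a, b = r[:len(r) - k], e[k:]
        else:
            a, b = r[-k:], e[:len(e) + k]
        corr = np.corrcoef(a, b)[0, 1]
        if corr > best_corr:
            best_k, best_corr = k, corr
    return best_k, best_corr


# Simulated single wave in which excess deaths peak three weeks after reported deaths.
weeks = np.arange(60)
reported = np.exp(-((weeks - 20) / 4.0) ** 2)
excess = np.exp(-((weeks - 23) / 4.0) ** 2)
lag, corr = best_lag(reported, excess)
```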

  4. Reviewer #3 (Public Review):

    This manuscript introduces the World Mortality Dataset and provides estimates of 'excess' mortality for 89 countries and territories across the world over the course of the COVID-19 pandemic. These data are crucial for tracking the 'true' burden of the pandemic, and the dataset represents a monumental effort on the part of the authors in collating data from many different sources. It fills a gap in this field by adding countries to several existing sources of mortality data such as the Human Mortality Database.

    While the conclusions of the paper are generally supported by the data and analysis, there are a few major concerns that need to be addressed, particularly when making comparisons across countries:

    1. The main metric used in the paper is excess mortality, defined as the difference between observed mortality in 2020 and the baseline expected mortality for 2020 based on historical data from 2015–2019. The model adequately controls for known seasonal trends in mortality as well as a longer-term time trend. One of the main concerns in comparing excess mortality rates across countries is that countries have substantially different population age distributions, and age is strongly associated with COVID-19 and other mortality; thus, age-adjusted measures are superior for comparing mortality risk across countries. Comparing 'crude' excess mortality rates can be misleading. While the authors may not be able to collect age-specific data for all countries, age-adjusted mortality rates should be estimated at least for the subset of countries for which such data are available (such as the majority of European countries). The authors do address this limitation and compute P-scores. However, showing age-adjusted rates for comparison across countries, where possible, would greatly improve the conclusions of the paper.
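
    Direct age standardization is one common way to produce the age-adjusted rates the reviewer asks for: each country's age-specific death rates are applied to a shared standard population. The sketch below assumes age-grouped deaths and population counts are available for a country; the function names and numbers are hypothetical. Applying it to observed and to baseline-expected deaths with the same standard population and taking the difference would give an age-standardized excess death rate.

```python
def age_standardized_rate(deaths, population, standard_population, per=100_000):
    """Direct age standardization: weight age-specific rates by a standard population."""
    if not (len(deaths) == len(population) == len(standard_population)):
        raise ValueError("all inputs must have one entry per age group")
    weighted = sum(d / p * s for d, p, s in zip(deaths, population, standard_population))
    return weighted / sum(standard_population) * per


def crude_rate(deaths, population, per=100_000):
    """Deaths per `per` people, ignoring age structure."""
    return sum(deaths) / sum(population) * per


# Two hypothetical countries with identical age-specific death rates
# (0.1% in the young group, 10% in the old group) but different age structures.
standard = [5_000, 5_000]
young_country = {"deaths": [9, 100], "population": [9_000, 1_000]}
old_country = {"deaths": [5, 500], "population": [5_000, 5_000]}

for name, c in [("young", young_country), ("old", old_country)]:
    print(name,
          round(crude_rate(c["deaths"], c["population"])),
          round(age_standardized_rate(c["deaths"], c["population"], standard)))
```

    The crude rate of the older country is more than four times that of the younger one, while their age-standardized rates are identical, which is the reviewer's point about comparing crude rates across countries with different age structures.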

    2. The second major concern related to the comparability of data across countries is data quality, which varies across countries, as the authors acknowledge in Section 2.2. The consequences of varying levels of data quality, however, are not clear, particularly when making comparisons across countries. At the very least, a discussion of what undercounting of deaths in general might mean for cross-country comparisons would be helpful.
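
    A one-line calculation shows why the answer depends on the metric. Suppose, as a simplifying assumption, that a country registers a constant fraction c (0 < c ≤ 1) of all deaths, both in the baseline years and during the pandemic, and write O* and B* for true observed and baseline deaths and O and B for the registered ones:

```latex
O = c\,O^{*}, \quad B = c\,B^{*}
\;\Longrightarrow\;
O - B = c\,(O^{*} - B^{*}),
\qquad
\frac{O - B}{B} = \frac{O^{*} - B^{*}}{B^{*}}.
```

    Under this assumption, relative measures such as the P-score are unaffected, whereas absolute excess deaths (and excess per 100,000) are scaled down by c, which also lowers undercount ratios if reported COVID-19 deaths are not subject to the same incompleteness. If completeness changes during the pandemic, both kinds of measures are biased.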

  5. SciScore for 10.1101/2021.01.27.21250604:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences: The number for Taiwan was absent in the World Bank dataset, so we used the 23,568,378 value from Wikipedia.
    Resources: Wikipedia (suggested: Wikipedia, RRID:SCR_004897)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    The main limitation of our work is that up until now, we were only able to collect data from 77 nations out of ∼200, with particularly sparse coverage in Africa (Figure 1). Furthermore, for some of these countries the latest data points are from June 2020. The COVID-19 pandemic has shown that extrapolating mortality data into the future can be risky: for example, excess mortality in Czechia stayed around zero up until September 2020, but has risen very quickly since then (Figure 2). Another caveat is that excess mortality can have non-covid contributions if a country experienced drastic events in 2020 or 2021 unrelated to the pandemic. Three examples in our data are the August 2020 heat wave in Belgium (excess mortality of ∼1500) and the 2020 Nagorno-Karabakh war between Armenia and Azerbaijan, which cost, by official counts, ∼3000 lives on each side. We made a correction to our excess mortality estimates to account for that (see Methods), but could have possibly missed some other similar events in other countries. Some countries may possibly report incomplete mortality numbers (e.g. covering only part of the country) which would make the excess mortality estimate during the COVID-19 outbreak incomplete. Importantly, the early pre-outbreak 2020 data for all countries in our dataset matched well to the baseline obtained from the historic 2015–2019 data, indicating that the data are self-consistent and the excess mortality estimates are not inflated. The World Mortality Dataset i...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.