Predictors of human-infective RNA virus discovery in the United States, China and Africa, an ecological study




The variation in the pathogen type as well as the spatial heterogeneity of predictors make the generality of any associations with pathogen discovery debatable. Our previous work confirmed that the association of a group of predictors differed across different types of RNA viruses, yet there have been no previous comparisons of the specific predictors for RNA virus discovery in different regions. The aim of the current study was to close the gap by investigating whether predictors of discovery rates within three regions—the United States, China and Africa—differ from one another and from those at the global level.


Based on a comprehensive list of human-infective RNA viruses, we collated published data on the first discovery of each species in each region. We used a Poisson boosted regression tree (BRT) model to examine the relationship between virus discovery and 33 predictors representing climate, socio-economics, land use, and biodiversity within each region separately. The discovery probability in the three regions in 2010–2019 was mapped using the fitted models and historical predictors.
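To make the modelling step concrete, the following is a minimal sketch of Poisson gradient boosting with one-split trees (stumps) and a log link, written in plain Python. It is not the authors' implementation. The function names, the toy two-predictor dataset, the number of trees and the learning rate are all assumptions chosen for illustration. The split-count "influence" is only a crude stand-in for the relative contribution that BRT software usually reports.

```python
import math

def best_split(X, resid):
    """Find the (feature, threshold) stump that best fits the residuals."""
    best_gain, best_j, best_t = -1.0, None, None
    for j in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest,
        # so that both sides of the split are non-empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, resid) if row[j] <= t]
            right = [r for row, r in zip(X, resid) if row[j] > t]
            # Maximising this quantity is equivalent to minimising squared error.
            gain = sum(left) ** 2 / len(left) + sum(right) ** 2 / len(right)
            if gain > best_gain:
                best_gain, best_j, best_t = gain, j, t
    return best_j, best_t

def fit_poisson_brt(X, y, n_trees=50, learning_rate=0.1):
    """Boost stumps on the Poisson log-likelihood; returns (intercept, stumps)."""
    f0 = math.log(sum(y) / len(y))  # start from the log of the mean count
    F = [f0] * len(y)               # current prediction on the log scale
    stumps = []
    for _ in range(n_trees):
        mu = [math.exp(f) for f in F]
        resid = [yi - mi for yi, mi in zip(y, mu)]  # gradient of the log-likelihood
        j, t = best_split(X, resid)
        steps = {}
        for side in (True, False):
            idx = [i for i, row in enumerate(X) if (row[j] <= t) == side]
            # Shrunken Newton step for the leaf: sum(y - mu) / sum(mu).
            steps[side] = learning_rate * sum(resid[i] for i in idx) / sum(mu[i] for i in idx)
        stumps.append((j, t, steps[True], steps[False]))
        F = [f + steps[row[j] <= t] for f, row in zip(F, X)]
    return f0, stumps

def predict(model, row):
    """Expected count for one row of predictors."""
    f0, stumps = model
    return math.exp(f0 + sum(lo if row[j] <= t else hi for j, t, lo, hi in stumps))

def poisson_deviance(y, mu):
    """Poisson deviance; the y*log(y/mu) term is taken as 0 when y == 0."""
    return 2 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                   for yi, mi in zip(y, mu))

def split_share(model, n_features):
    """Share of stumps that split on each feature (crude influence measure)."""
    counts = [0] * n_features
    for j, *_ in model[1]:
        counts[j] += 1
    return [c / len(model[1]) for c in counts]

# Hypothetical toy data: 40 grid cells, two predictors. Only the first
# predictor carries signal; the second is a shuffled, uninformative column.
X = [[i / 40, (7 * i % 40) / 40] for i in range(40)]
y = [1 if row[0] < 0.5 else 5 for row in X]

model = fit_poisson_brt(X, y)
fitted = [predict(model, row) for row in X]
print("null deviance:", poisson_deviance(y, [sum(y) / len(y)] * len(y)))
print("fitted deviance:", poisson_deviance(y, fitted))
print("split share per predictor:", split_share(model, 2))
```

In practice, an established BRT implementation with cross-validated tuning would be used rather than hand-rolled stumps. The sketch only shows how a count outcome, a log link and additive shrunken trees fit together.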


The numbers of human-infective virus species discovered in the United States, China and Africa up to 2019 were 95, 80 and 107 respectively, with China lagging behind the other two regions. In each region, discoveries were clustered in hotspots. BRT modelling suggested that in all three regions RNA virus discovery was best predicted by land use and socio-economic variables, followed by climatic variables and biodiversity, though the relative importance of these predictors varied by region. The map of virus discovery probability in 2010–2019 indicated several new hotspots outside historical high-risk areas. Most new virus species since 2010 in each region (6/6 in the United States, 19/19 in China, 12/19 in Africa) were discovered in high-risk areas as predicted by our model.


The drivers of spatiotemporal variation in virus discovery rates vary in different regions of the world. Within regions, virus discovery is driven mainly by land-use and socio-economic variables; climate and biodiversity variables are consistently less important predictors than at a global scale. Potential new discovery hotspots in 2010–2019 are identified. Results from the study could guide active surveillance for new human-infective viruses in local high-risk areas.


Darwin Trust of Edinburgh; European Union.

Article activity feed

  1. Evaluation Summary:

    This study will be of interest to readers in the field of virus discovery. This study attempts to identify predictors of human-infective RNA virus discovery and predict high-risk areas in a recent period in the United States, China and Africa using an ecological modelling framework. The study has potential to inform future discovery efforts for human-infective viruses. However, it is not clear that key claims of the manuscript are currently fully supported.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

  2. Reviewer #1 (Public Review):

    This study attempts to identify predictors of human-infective RNA virus discovery and predict high-risk areas in a recent period in the United States, China and Africa using an ecological modelling framework.

    According to findings from their previous study published in 2020, the main predictors for virus discovery at the global scale were GDP-related, and they concluded that this may largely have been driven by research effort rather than the underlying biology. In the current study, they focus on more restricted and homogeneous regions, where they suspect research effort is less heterogeneous, in an attempt to identify predictors more associated with virus biology. The study is relevant in the current context and to the identification of areas at threat of emerging viral pathogens. However, I'm uncertain whether the design and data (and inherent biases in virus discovery) may impact these findings/predictions, and also whether the more distal covariates/predictors used truly capture viral biology and emergence in space-time.

  3. Reviewer #2 (Public Review):

    The last two decades have seen considerable research efforts in identifying global hotspots and drivers of RNA virus emergence for guiding surveillance and control efforts. A recent study by the same authors (Zhang et al. 2020) used machine learning methods and a well-curated list of discovery sites of human RNA viruses to show that previously learnt patterns of virus discoveries at a global scale may be driven by socio-economic factors (GDP/research effort-related) rather than underlying biology. In this manuscript, the authors extend this work through a separate analysis of three relatively homogeneous regions (US, China and Africa) to identify variation in virus discovery rates between regions, and find that the leading variables (land-use and socio-economics) were consistent across all three regions. They also identify potential new discovery hotspots in 2010–2019. This paper is in line with a series of data-driven studies that aimed to identify variables that can be useful for improving surveillance and control against emerging viruses.

    I have no particular concerns with the data, analysis and results presented in this manuscript. It appears to follow their recent work performed on a global scale.