Variation in the ACE2 receptor has limited utility for SARS-CoV-2 host prediction

Curation statements for this article:
  • Curated by eLife

    Evaluation Summary:

    This important study shows that methods currently used to predict which animal species might be at risk of infection by SARS-CoV-2, by looking at features of the host cell receptor the virus binds to, are fundamentally flawed, with exceptionally strong support for this conclusion. Much work on the potential host range of SARS-CoV-2 has focused on measuring the susceptibility of different species' ACE2 receptors to sarbecovirus entry and extending predictions to other unmeasured species based on ACE2 sequence features. Mollentze and colleagues show that ACE2 sequences are no more than a proxy for generic species relationships. In other words, species phylogeny alone can provide equivalent predictive power, allowing for predictions of mammalian susceptibility to sarbecovirus infection for the many species for which ACE2 sequences are not yet known.

    (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 agreed to share their name with the authors.)

Abstract

Transmission of SARS-CoV-2 from humans to other species threatens wildlife conservation and may create novel sources of viral diversity for future zoonotic transmission. A variety of computational heuristics have been developed to pre-emptively identify susceptible host species based on variation in the angiotensin-converting enzyme 2 (ACE2) receptor used for viral entry. However, the predictive performance of these heuristics remains unknown. Using a newly compiled database of 96 species, we show that, while variation in ACE2 can be used by machine learning models to accurately predict animal susceptibility to sarbecoviruses (accuracy = 80.2%, binomial confidence interval [CI]: 70.8–87.6%), the sites informing predictions have no known involvement in virus binding and instead recapitulate host phylogeny. Models trained on host phylogeny alone performed equally well (accuracy = 84.4%, CI: 75.5–91.0%) and at a level equivalent to retrospective assessments of accuracy for previously published models. These results suggest that the predictive power of ACE2-based models derives from strong correlations with host phylogeny rather than processes which can be mechanistically linked to infection biology. Further, biased availability of ACE2 sequences misleads projections of the number and geographic distribution of at-risk species. Models based on host phylogeny reduce this bias, but identify a very large number of susceptible species, implying that model predictions must be combined with local knowledge of exposure risk to practically guide surveillance. Identifying barriers to viral infection or onward transmission beyond receptor binding and incorporating data which are independent of host phylogeny will be necessary to manage the ongoing risk of establishment of novel animal reservoirs of SARS-CoV-2.
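
The accuracies and binomial confidence intervals quoted above can be checked with a short calculation. The sketch below assumes that the intervals are exact (Clopper-Pearson) 95% intervals and that the two accuracies correspond to 77 and 81 correct predictions out of 96 species; both the interval type and the counts are inferred from the reported percentages and are not stated in the abstract. The function names are illustrative only.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root(g, iters=60):
    """Bisection root on [0, 1] of a function g that rises from below zero to above zero."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root(lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root(lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


# Counts below are assumptions inferred from the reported percentages.
for label, correct in [("ACE2-based", 77), ("phylogeny-only", 81)]:
    lo, hi = clopper_pearson(correct, 96)
    print(f"{label}: accuracy {correct / 96:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
```

Under these assumptions the two intervals overlap substantially, which is consistent with the abstract's statement that the phylogeny-only models performed equally well.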

Article activity feed

  2. Reviewer #1 (Public Review):

    Mollentze et al. seek to understand how models that predict species susceptibility to sarbecovirus infection/spillover could be improved and extended. The authors assemble a useful dataset identifying and prioritizing evidence of susceptibility for all animals tested thus far across four classes of experimental study. They appropriately address many questions on data inclusion and bias. Some details of data inclusion and weighting could be considered more carefully, for example the inclusion of native reservoir Rhinolophus bats, in which virus:ACE2 arms races drive different dynamics of susceptibility/exclusion than in other species where sarbecovirus spillover is more novel or transient. The training dataset also conflates ACE2-based metrics, such as heterologous susceptibility in cell culture, with true susceptibility, which is the outcome the models are intended to predict; this may further confound model performance. However, as the authors point out in a nicely written Discussion, data availability (e.g., of ACE2 sequences) is a much greater limitation in light of observations of widespread animal susceptibility, and models of individual species susceptibility (especially those based on ACE2 sequence alone) will perhaps be better complemented by more careful consideration of ecological and epidemiological relevance.

  3. Reviewer #2 (Public Review):

    Through a dissection of machine learning approaches to predict potential host species, the authors find that the ACE2 receptor used by the virus to enter host cells can be useful for prediction, but is actually recapitulating the host phylogeny. As the host phylogeny is often more readily available than information on host receptor sequences or properties, the authors suggest it can be used to predict potential zoonotic sources of SARS-CoV-2. They go on to suggest that, when used in conjunction with data on species distributions, this can be a powerful and efficient way to identify potential host species, although further data on virus shedding and sustained transmission are needed to increase the practical value of this work.

    I found the work to be of high quality, and several questions I had were answered by the subsequent paragraph, which to me shows that the study is carefully thought through and well written.

    I felt the main strength of the paper was the dissection of the machine learning approach, and in particular the demonstration that ACE2 information seemingly recapitulates the host phylogeny. This suggests that although binding to and entering host cells via interactions with ACE2 is of clear importance to CoVs infecting host cells, other factors (which are captured by the host phylogeny) are as good, if not better, at predicting overall susceptibility. The discussion is thoughtful, covering the limitations, future directions and implications of this work.

    The study has few weaknesses. There is sound rationale for combining data on natural, experimental and cell culture infections, but this could be better justified. Capturing suitable features of ACE2 is challenging, and the authors have done a solid job of this, but more description and justification of the ACE2 data used in the models would improve the clarity of the manuscript.

  4. SciScore for 10.1101/2022.05.16.492068:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    Sentence: Data on ACE2 orthologs were obtained from the NCBI Gene database (GeneID: 59272; list of all orthologs downloaded 16 March 2022).
    Resources:
      • NCBI Gene (suggested: None)
      • GeneID (suggested: GeneID, RRID:SCR_021639)

    Sentence: Evaluating phylogenetic clustering: Amino acid sequences for all ACE2 orthologs were downloaded from the NCBI protein database and aligned using the E-INS-i option of MAFFT version 7.471 (Katoh and Standley, 2013).
    Resources:
      • NCBI (suggested: NCBI, RRID:SCR_006472)
      • MAFFT (suggested: MAFFT, RRID:SCR_011811)

    Sentence: We also obtained a phylogeny reflecting the estimated divergence dates of all amniotes from TimeTree version 4 (Kumar et al., 2017).
    Resources:
      • TimeTree (suggested: TimeTree, RRID:SCR_021162)

    Sentence: We also summarised the amino acids at each variable alignment position using the physicochemical properties hydrophobicity, polarity, net charge, and Van der Waals volume (‘AA properties’, 1841 features), with values for individual amino acids obtained from the AAindex database (accessions JURD980101, ZIMJ680103, KLEP840101, FAUJ880103) (Fauchère et al., 1988; Juretić et al., 1998; Kawashima et al., 1999; Klein et al., 1984; Zimmerman et al., 1968).
    Resources:
      • AAindex (suggested: None)
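
The 'AA properties' representation quoted above (amino acids at variable alignment positions summarised by physicochemical properties) can be illustrated with a minimal sketch. This is not the authors' pipeline: the three aligned sequences are hypothetical toy strings, the function names are invented for illustration, and the charge table is a simplified stand-in (K and R as +1, D and E as -1, everything else 0) rather than values taken from AAindex.

```python
# Simplified net charge at neutral pH; an illustrative stand-in, not AAindex values.
NET_CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def variable_columns(alignment):
    """Indices of alignment columns that contain more than one distinct character."""
    length = len(next(iter(alignment.values())))
    return [
        i for i in range(length)
        if len({seq[i] for seq in alignment.values()}) > 1
    ]


def encode(alignment, prop, default=0.0):
    """Map each variable column to a property value per sequence (None for gaps)."""
    cols = variable_columns(alignment)
    features = {
        name: [None if seq[i] == "-" else prop.get(seq[i], default) for i in cols]
        for name, seq in alignment.items()
    }
    return cols, features


# Hypothetical toy alignment; real input would be the MAFFT-aligned ACE2 orthologs.
toy = {
    "species_a": "MKDE-L",
    "species_b": "MRDQAL",
    "species_c": "MKEQAL",
}

cols, features = encode(toy, NET_CHARGE)
print("variable columns:", cols)
for name, values in features.items():
    print(name, values)
```

In this toy case the K/R and D/E substitutions receive identical charge values, so a property-based encoding treats them as equivalent even though the residues differ. A full encoding along the lines described would repeat this for each of the four properties.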

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    By cataloguing available data on sarbecovirus host range, we show that while a variety of ACE2-based approaches produce relatively sensitive and specific predictions, these predictions largely derive from strong correlations with host phylogeny, and limitations and biases in their input data limit their actionability. Models based on host phylogeny alone perform equivalently, enabling scalable prediction across nearly all mammals, but imply that in the absence of additional metrics of inter-species exposures, vast numbers of species would need to be surveyed with limited geographic or taxonomic focus. ACE2 sequences showed surprising accuracy for predicting sarbecovirus host range, both in the formally-trained models produced here and in earlier heuristics. However, various lines of evidence suggest the predictive power of ACE2-based models derives primarily from phylogenetic correlation. First, susceptibility to sarbecovirus infection was highly conserved, clustering in patterns consistent with the evolutionary history of potential host species. Second, ACE2 is also evolutionarily conserved, and a phylogeny derived from ACE2 sequences was highly congruent with one reflecting broader evolutionary history. Third, commonly-used approaches for representing ACE2 sequence differences are largely equivalent in the amount of susceptibility information carried (Figure 2 & Supplementary figure S4), despite measuring very different aspects of ACE2 sequence variation and its interaction...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.