Predicting hosts based on early SARS-CoV-2 samples and analyzing the 2020 pandemic

This article has been reviewed by the following groups


Abstract

The SARS-CoV-2 pandemic has raised concerns about identifying the hosts of the virus since the early stages of the outbreak. To address this problem, we proposed a deep learning method, DeepHoF, which automatically extracts viral genomic features to predict host likelihood scores for novel viruses on five host types: plant, germ, invertebrate, non-human vertebrate and human. DeepHoF made up for the lack of an accurate tool, reaching a satisfactory AUC of 0.975 in the five-class classification, and could make reliable predictions for novel viruses without close phylogenetic neighbors. Additionally, to fill the gap left by existing tools in efficiently inferring the host species of SARS-CoV-2, we conducted an in-depth analysis of the host likelihood profile calculated by DeepHoF. Using the isolates sequenced in the earliest stage of the COVID-19 pandemic, we inferred that minks, bats, dogs and cats were potential hosts of SARS-CoV-2, and that minks might be one of the most noteworthy hosts. Several genes of SARS-CoV-2 proved significant in determining the host range. Furthermore, a large-scale genome analysis based on DeepHoF’s computations for the later stage of the 2020 pandemic revealed a uniform host range among SARS-CoV-2 samples and a strong SARS-CoV-2 association between humans and minks.
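The abstract reports an AUC of 0.975 for the five-class task but does not say here how the per-class AUCs were combined. The following is a minimal sketch that assumes a one-vs-rest, macro-averaged AUC computed from per-host likelihood scores. This is an assumption and not necessarily the authors' procedure. The host ordering, the function names `binary_auc` and `macro_ovr_auc`, and the toy scores are all hypothetical.

```python
from typing import Sequence

# Host types named in the abstract; this column order is a hypothetical choice.
HOST_TYPES = ["plant", "germ", "invertebrate", "non-human vertebrate", "human"]


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative, counting ties as half a win (the Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    # Quadratic pairwise comparison: fine for a sketch, slow for large datasets.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(true_hosts: Sequence[str],
                  score_rows: Sequence[Sequence[float]]) -> float:
    """Assumed scheme: average the one-vs-rest AUC of each host type.
    Each row of score_rows holds one virus's scores in HOST_TYPES order."""
    aucs = []
    for k, host in enumerate(HOST_TYPES):
        labels = [1 if h == host else 0 for h in true_hosts]
        scores = [row[k] for row in score_rows]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)
```

For example, `binary_auc([1, 0, 1, 0], [0.8, 0.4, 0.4, 0.2])` returns 0.875, because three of the four positive/negative pairs are ranked correctly and the one tie counts as half. Averaging per-class AUCs weights every host type equally, regardless of how many viruses it contains. A weighted average would instead favor well-represented host types.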

Article activity feed

  1. SciScore for 10.1101/2021.03.21.436312

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to this paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: Phylogenetic analysis and single nucleotide polymorphisms analysis: In this study, we applied Clustal Omega software [40] (version 1.2.4) for multiple sequence alignment and RAxML software [41
    Resources:
    • Clustal Omega, suggested: (Clustal Omega, RRID:SCR_001591)
    • RAxML, suggested: (RAxML, RRID:SCR_006086)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Overcoming the limitation of sequence similarity-based methods in disclosing the host information of novel viruses, DeepHoF demonstrated its practicality for SARS-CoV-2 in the 2020 pandemic. Using 17 SARS-CoV-2 isolates sequenced in the earliest stage of COVID-19 detection, DeepHoF evaluated the host likelihood scores of SARS-CoV-2 on humans and non-human vertebrates. To fill the gap in predicting the host species of novel viruses, which state-of-the-art tools had left unsolved, we further analyzed the host likelihood score profile to infer the specific hosts of SARS-CoV-2. The hosts determined by DeepHoF can be either reservoirs or susceptible intermediate hosts, which are not discriminated in this study. We found that minks, bats, dogs and cats could be potential hosts of SARS-CoV-2, and that minks might be one of the most noteworthy animal hosts. Due to mutations, the host likelihood score profiles of the isolates over the long period of the later pandemic varied slightly but followed a normal distribution, with the profiles of the 17 early isolates located at the center. As a consequence, the host range inferred from the profiles of the isolates during the pandemic was consistent with the inference from the early samples. Additionally, based on the model, we found that three genes (S gene, ORF7b and ORF1ab) were significant in determining the host likelihood score on human, and two genes (ORF1ab and ORF8) were significant in determining the host range of SARS-CoV-2. The gene...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.

  2. SciScore for 10.1101/2020.01.21.914044

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: not detected.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.
