Predicting hosts based on early SARS-CoV-2 samples and analyzing the 2020 pandemic
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
The SARS-CoV-2 pandemic has raised concerns in the identification of the hosts of the virus since the early stages of the outbreak. To address this problem, we proposed a deep learning method, DeepHoF, based on extracting viral genomic features automatically, to predict the host likelihood scores on five host types, including plant, germ, invertebrate, non-human vertebrate and human, for novel viruses. DeepHoF made up for the lack of an accurate tool, reaching a satisfactory AUC of 0.975 in the five-classification, and could make a reliable prediction for the novel viruses without close neighbors in phylogeny. Additionally, to fill the gap in the efficient inference of host species for SARS-CoV-2 using existing tools, we conducted a deep analysis on the host likelihood profile calculated by DeepHoF. Using the isolates sequenced in the earliest stage of the COVID-19 pandemic, we inferred that minks, bats, dogs and cats were potential hosts of SARS-CoV-2, while minks might be one of the most noteworthy hosts. Several genes of SARS-CoV-2 demonstrated their significance in determining the host range. Furthermore, a large-scale genome analysis, based on DeepHoF’s computation for the later pandemic in 2020, disclosed the uniformity of host range among SARS-CoV-2 samples and the strong association of SARS-CoV-2 between humans and minks.
Article activity feed
-
-
SciScore for 10.1101/2021.03.21.436312: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.Table 2: Resources
Software and Algorithms Sentences Resources Phylogenetic analysis and single nucleotide polymorphisms analysis: In this study, we applied Clustal Omega software [40] Clustal Omegasuggested: (Clustal Omega, RRID:SCR_001591)(version 1.2.4) for multiple sequence alignment and RAxML software [41 RAxMLsuggested: (RAxML, RRID:SCR_006086)Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:Overcoming the limitation of sequence similarity-based methods to disclose the host information of novel viruses, DeepHoF demonstrated the …
SciScore for 10.1101/2021.03.21.436312: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.Table 2: Resources
Software and Algorithms Sentences Resources Phylogenetic analysis and single nucleotide polymorphisms analysis: In this study, we applied Clustal Omega software [40] Clustal Omegasuggested: (Clustal Omega, RRID:SCR_001591)(version 1.2.4) for multiple sequence alignment and RAxML software [41 RAxMLsuggested: (RAxML, RRID:SCR_006086)Results from OddPub: Thank you for sharing your code and data.
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:Overcoming the limitation of sequence similarity-based methods to disclose the host information of novel viruses, DeepHoF demonstrated the practicality to SARS-CoV-2 in the 2020 pandemic. Using 17 SARS-CoV-2 isolates sequenced in the earliest stage of COVID-19 detection, DeepHoF evaluated the host likelihood scores on humans and non-human vertebrates for SARS-CoV-2. Filling the gap in predicting the host species for any novel virus that remained unsolved using the tools which were state of the art, we further analyzed the host likelihood score profile to further infer the specific hosts of SARS-CoV-2. The hosts determined by DeepHoF can be either reservoirs or susceptible middle hosts, which are not discriminated in this study. We found minks, bats, dogs and cats could be potential hosts of SARS-CoV-2, while minks might be one of the most noteworthy animal hosts. Due to mutations, the host likelihood score profiles of the isolates in the long period of the later pandemic had slightly varied, but followed normal distribution where those of the early 17 isolates locate in the center. As a consequence, the host range inferred with the profiles of the isolates during the pandemic was consistent with the inference using the early samples. Additionally, based on the model, we further found three genes (S gene, ORF7b and ORF1ab) and two genes (ORF1ab and ORF8) were significant in determining the host likelihood score on human and the host range for SARS-CoV-2, respectively. The gene...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
-
-
SciScore for 10.1101/2020.01.21.914044: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement not detected. Randomization not detected. Blinding not detected. Power Analysis not detected. Sex as a biological variable not detected. Table 2: Resources
No key resources detected.
Results from OddPub: Thank you for sharing your data.
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:- Than…
SciScore for 10.1101/2020.01.21.914044: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement not detected. Randomization not detected. Blinding not detected. Power Analysis not detected. Sex as a biological variable not detected. Table 2: Resources
No key resources detected.
Results from OddPub: Thank you for sharing your data.
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- No funding statement was detected.
- No protocol registration statement was detected.
-
-
-
-