Single-nucleotide conservation state annotation of the SARS-CoV-2 genome
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Given the global impact and severity of COVID-19, there is a pressing need for a better understanding of the SARS-CoV-2 genome and mutations. Multi-strain sequence alignments of coronaviruses (CoV) provide important information for interpreting the genome and its variation. We apply a comparative genomics method, ConsHMM, to the multi-strain alignments of CoV to annotate every base of the SARS-CoV-2 genome with conservation states based on sequence alignment patterns among CoV. The learned conservation states show distinct enrichment patterns for genes, protein domains, and other regions of interest. Certain states are strongly enriched or depleted of SARS-CoV-2 mutations, which can be used to predict potentially consequential mutations. We expect the conservation states to be a resource for interpreting the SARS-CoV-2 genome and mutations.
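The enrichment idea in the abstract can be illustrated with a small sketch. It is not the authors' pipeline or the ConsHMM API. It assumes each genome position already carries one conservation-state label, and it computes, for each state, the share of observed mutations falling in that state divided by the share of the genome the state covers. The function name `state_enrichment`, its parameters, and the toy data are all hypothetical. Positions excluded as unreliable are passed in `masked_positions`.

```python
from collections import Counter


def state_enrichment(states, mutated_positions, masked_positions=()):
    """Fold enrichment of mutated positions in each conservation state.

    states            -- one state label per genome position (0-based index)
    mutated_positions -- positions where at least one mutation was observed
    masked_positions  -- positions to exclude from the analysis entirely
    (All names are illustrative assumptions, not taken from the paper.)
    """
    masked = set(masked_positions)
    kept = [i for i in range(len(states)) if i not in masked]
    mutated = set(mutated_positions) - masked

    # How many unmasked bases, and how many mutated bases, fall in each state.
    state_counts = Counter(states[i] for i in kept)
    mutation_counts = Counter(states[i] for i in mutated)

    enrichment = {}
    for state, n_bases in state_counts.items():
        frac_genome = n_bases / len(kept)
        frac_mutations = mutation_counts[state] / len(mutated) if mutated else 0.0
        # > 1 means enriched for mutations, < 1 means depleted.
        enrichment[state] = frac_mutations / frac_genome
    return enrichment


# Toy data: 10 positions, 3 states, 4 mutated positions, 1 masked position.
toy_states = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3]
toy_result = state_enrichment(toy_states, {0, 4, 5, 6}, masked_positions={9})
print(toy_result)
```

In this toy input, state 2 holds three of the four mutations while covering four of the nine unmasked bases, so its value is above 1 (enriched). State 3 has no mutations and scores 0 (depleted). A real analysis would also need a significance test, which this sketch omits.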
Article activity feed
SciScore for 10.1101/2020.07.13.201277: (What is this?)
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
- Sentence: We additionally used the PHAST software [29] to learn PhastCons and PhyloP scores from the vertebrate CoV sequence alignment that we generated from the 119-way alignment as described above.
  Resource: PHAST, suggested: (PHAST, RRID:SCR_003204)
- Sentence: Masking bases: For all but one downstream analysis, we masked problematic genomic positions listed in UCSC Genome Browser track ‘Problematic Sites’ (accessed on Sept 7, 2020) as they are likely affected by sequencing errors, low coverage, contamination, homoplasy, or hypermutability [16,30,31].
  Resource: UCSC Genome Browser, suggested: (UCSC Genome Browser, RRID:SCR_005780)
Results from OddPub: Thank you for sharing your code.
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.