A missense variant effect prediction and annotation resource for SARS-CoV-2

Abstract

The COVID-19 pandemic is a global crisis severely impacting many people across the world. An important part of the response is monitoring viral variants and determining the impact they have on viral properties, such as infectivity, disease severity and interactions with drugs and vaccines. In this work we generate and make available computational variant effect predictions for all possible single amino acid substitutions to SARS-CoV-2 in order to complement and facilitate experiments and expert analysis. The resulting dataset contains predictions from evolutionary conservation and protein and complex structural models, combined with viral phosphosites, experimental results and variant frequencies. We demonstrate the predictions’ effectiveness by comparing them with expectations from variant frequency and prior experiments. We then identify higher-frequency variants with significant predicted effects and find the variants measured to impact antibody binding that are least likely to impact other viral functions. A web portal is available at sars.mutfunc.com, where the dataset can be searched and downloaded.

SciScore for 10.1101/2021.02.24.432721:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
A custom reference database was generated based on the NCBI virus coronavirus genomes dataset (NCBI Resource Coordinators, 2018), which includes sequences from a large range of coronaviruses.	NCBI Resource Coordinators suggested: None
Models were examined in turn and any position not covered by a higher priority model was added to the FoldX analysis pipeline.	FoldX suggested: (FoldX, RRID:SCR_008522)
It was filtered to exclude problematic sites using VCFTools, based on the annotation at https://github.com/W-L/ProblematicSites_SARS-CoV2/blob/master/problematic_sites_sarsCov2.vcf.	VCFTools suggested: (VCFtools, RRID:SCR_001235)
The SARS-CoV-2 genome was sourced from Ensembl (Yates et al., 2020) and Tabix indexed.	Ensembl suggested: (Ensembl, RRID:SCR_002344)
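
The FoldX sentence quoted above describes a priority-ordered coverage rule: models are examined in turn, and a position is only taken from a model if no higher priority model already covers it. A minimal Python sketch of that rule follows. It is an illustration only, not the authors' pipeline: the function name `assign_positions`, the model names and the residue ranges are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def assign_positions(models: List[Tuple[str, Iterable[int]]]) -> Dict[int, str]:
    """Map each residue position to the highest priority model covering it.

    `models` must be ordered from highest to lowest priority. Each entry is
    (model name, positions covered by that model). Names and ranges used
    with this function here are hypothetical examples.
    """
    assigned: Dict[int, str] = {}
    for name, positions in models:
        for pos in positions:
            # Keep the first (highest priority) model seen for a position.
            assigned.setdefault(pos, name)
    return assigned


# Hypothetical input: an experimental structure covering residues 10-20
# takes priority over a homology model covering residues 1-30.
models = [
    ("experimental", range(10, 21)),
    ("homology", range(1, 31)),
]
assignment = assign_positions(models)
```

In this sketch, residues 10 to 20 are assigned to the experimental model and the remaining residues in 1 to 30 fall through to the homology model, so each position would be analysed exactly once.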

Results from OddPub: Thank you for sharing your code.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
No funding statement was detected.
No protocol registration statement was detected.
