Epistatic models predict mutable sites in SARS-CoV-2 proteins and epitopes

Abstract

During the COVID-19 pandemic, new severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants have emerged and spread, some of major concern because of their increased infectivity or their capacity to reduce vaccine efficacy. Anticipating mutations that might give rise to new variants would therefore be of great interest. We construct sequence models that predict how mutable SARS-CoV-2 positions are, using a single SARS-CoV-2 sequence and databases of other coronaviruses. Predictions are tested against available mutagenesis data and the observed variability of SARS-CoV-2 proteins. Interestingly, the predictions agree increasingly well with observations as more SARS-CoV-2 sequences become available. Combining predictions with immunological data, we find an overrepresentation of mutations in current variants of concern. The approach may become relevant for potential outbreaks of future viral diseases.
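
To make the idea of a sequence model for mutability concrete, here is a minimal sketch of the simplest non-epistatic case: an independent-site profile built from a multiple sequence alignment, with per-site Shannon entropy used as a stand-in mutability score. Everything in it is an assumption for illustration: the toy alignment, the pseudocount, the names `site_frequencies` and `site_entropy`, and the choice of entropy as the score. It is not the authors' pipeline, and it ignores the couplings between sites that an epistatic (DCA) model adds.

```python
import math
from collections import Counter

# 20 amino acids plus the alignment gap symbol (assumed alphabet).
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def site_frequencies(msa, pseudocount=0.5):
    """Per-column symbol frequencies with a uniform pseudocount.

    `msa` is a list of equal-length aligned sequences. Symbols outside
    ALPHABET (e.g. 'X') are ignored when counting.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(seq[i] for seq in msa)
        observed = sum(counts.get(a, 0) for a in ALPHABET)
        total = observed + pseudocount * len(ALPHABET)
        freqs.append({a: (counts.get(a, 0) + pseudocount) / total for a in ALPHABET})
    return freqs


def site_entropy(freq):
    """Shannon entropy (in nats) of one column's frequency distribution."""
    return -sum(p * math.log(p) for p in freq.values())


# Hypothetical toy alignment: column 0 is conserved, column 2 is variable.
toy_msa = ["MKVA", "MKIA", "MRLA", "MKVA"]
frequencies = site_frequencies(toy_msa)
entropies = [site_entropy(f) for f in frequencies]
```

In this toy case the variable column receives a higher entropy than the conserved one, which is the qualitative behaviour a mutability score should have. A real application would need a curated alignment, sequence reweighting, and a score defined relative to the SARS-CoV-2 reference residue.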

SciScore for 10.1101/2021.12.11.472202:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Protein domains were detected using the HMMER suite ((49), version 3.1b2) and the HMM profiles from Pfam.	HMMER suggested: (Hmmer, RRID:SCR_005305) Pfam suggested: (Pfam, RRID:SCR_004726)
A global database including distant species was built by combining Uniref90, ViPR, NCBI viral genomes and MERS coronavirus database, and used to train the DCA and IND models.	ViPR suggested: (vipR, RRID:SCR_010685)

Results from OddPub: Thank you for sharing your code and data.

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

It also has its limitations, most importantly its dependence on the availability of sufficiently large and diverged sequence ensembles. In fact, we observe that a greater number of sequences usually increases the performance of the approach (Fig. 4C and S10). However, it is important to note that including more divergent sequences might not always be the best strategy, as the model might capture constraints that are not relevant for the specific SARS-CoV-2 context. This trade-off will be explored in future work. Our approach can be extended in several ways. One is to include how different domains might constrain the variability of other domains. However, according to our analysis in the previous section, inter-domain epistasis seems to play only a minor role, even if more sequence data might be needed to better estimate the influence of inter-domain or inter-protein epistasis. Another is to model constraints due to specific virus-host interactions, which is currently out of our scope, as we do not consider host sequences in the MSAs. Indeed, we observe that the correlation between experimentally measured ACE2 binding and our predictions (Pearson’s r = 0.27) can be fully explained by protein expression (partial Pearson correlation controlling for expression, r = −0.02). In an attempt to explore this issue, we built co-alignments of receptor-binding domains with homologs of ACE2 present in the hosts of other coronaviruses. Since the binding mechanism between RBD and ACE2 homologs ...
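
The partial-correlation argument in the excerpt above can be illustrated with a short sketch. It computes the Pearson correlation between two variables after linearly regressing a third variable out of both. The data are synthetic and the names `binding`, `prediction`, and `expression` are placeholders chosen to mirror the text; none of the numbers come from the study, and the residual-based estimator is one standard way to compute a partial correlation, not necessarily the one the authors used.

```python
import numpy as np


def pearson_r(x, y):
    """Plain Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def partial_corr(x, y, z):
    """Pearson correlation of x and y after regressing z out of both."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    # Design matrix with an intercept column and the control variable.
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return pearson_r(res_x, res_y)


# Synthetic illustration: both variables depend on a shared confounder
# (here called `expression`) but are otherwise independent.
rng = np.random.default_rng(0)
expression = rng.normal(size=2000)
binding = 0.6 * expression + rng.normal(size=2000)
prediction = 0.5 * expression + rng.normal(size=2000)

raw_r = pearson_r(binding, prediction)
partial_r = partial_corr(binding, prediction, expression)
```

In this construction the raw correlation is clearly positive while the partial correlation is close to zero, which is the pattern the excerpt describes: the apparent link between binding and predictions disappears once expression is controlled for.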

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.
