A graph-based approach identifies dynamic H-bond communication networks in spike protein S of SARS-CoV-2

Abstract

Coronavirus spike protein S is a large homotrimeric protein embedded in the membrane of the virion particle. Binding of protein S to angiotensin-converting enzyme 2 (ACE2) of the host cell is followed by proteolysis of the spike protein, a drastic conformational change that exposes the fusion peptide of the virus, and entry of the virion into the host cell. The structural elements that govern conformational plasticity of the spike protein are largely unknown. Here, we present a methodology that relies on graph and centrality analyses, augmented by bioinformatics, to identify and characterize large H-bond clusters in protein structures. We apply this methodology to the protein S ectodomain and find that, in the closed conformation, the three protomers of protein S contribute equally to an extensive central network of H-bonds, and that protein S has a relatively large H-bond cluster at the receptor-binding domain and a cluster near a protease cleavage site. Markedly different H-bonding at these three clusters in the open and pre-fusion conformations suggests that dynamic H-bond clusters could facilitate structural plasticity and the selection of a protein S protomer for binding to the host receptor and for proteolytic cleavage. From analyses of spike protein sequences, we identify patches of histidine and carboxylate groups that could be involved in transient proton binding.
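As a rough illustration of what a graph-based H-bond analysis can look like, the sketch below treats residues as nodes and H-bonds as edges, groups them into clusters (connected components), and ranks residues by degree centrality. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which centrality measures or H-bond criteria were used, so degree centrality is chosen here only for brevity, the input is assumed to be a precomputed list of H-bonded residue pairs, and the residue labels and function names are hypothetical.

```python
from collections import defaultdict, deque


def build_hbond_graph(hbonds):
    """Build an undirected graph: nodes are residues, edges are H-bonds."""
    graph = defaultdict(set)
    for res_a, res_b in hbonds:
        graph[res_a].add(res_b)
        graph[res_b].add(res_a)
    return dict(graph)


def hbond_clusters(graph):
    """Return connected components (H-bond clusters), largest first."""
    seen = set()
    clusters = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        queue = deque([start])
        seen.add(start)
        # Breadth-first search collects every residue reachable via H-bonds.
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(component)
    return sorted(clusters, key=len, reverse=True)


def degree_centrality(graph):
    """Fraction of the other residues each residue is directly H-bonded to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


# Hypothetical input: "chain:residue" labels invented for illustration only.
hbonds = [
    ("A:R1", "A:R2"),
    ("A:R2", "B:R3"),
    ("B:R3", "C:R4"),
    ("A:R5", "A:R6"),
]

graph = build_hbond_graph(hbonds)
clusters = hbond_clusters(graph)
centrality = degree_centrality(graph)

for cluster in clusters:
    print(sorted(cluster))
```

In this toy input the first cluster spans all three chains, which is the kind of inter-protomer network the abstract describes. A fuller analysis would likely add a path-based centrality and average the graphs over several structures.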

SciScore for 10.1101/2020.06.23.164947:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
The resulting sequences were realigned with MAFFT using SARS-CoV-2 as reference.	MAFFT suggested: (MAFFT, RRID:SCR_011811)
Set-D contains human sequences of ACE2 from the 1000 human Genome project (Auton and Brooks, 2015); for this set, we used the Ensembl project (Hunt et al., 2018) and extracted the different protein haplotypes existing in the 1000 Genome Project for ACE2 using the GRCh38 human genome assembly as reference.	Ensembl suggested: (Ensembl, RRID:SCR_002344)
Computations of the electrostatic potential surface were performed with the Adaptive Poisson Boltzmann Solver, APBS (Baker et al., 2001), in PyMol 2.0 (Schrödinger, 2015).	PyMol suggested: (PyMOL, RRID:SCR_000305)
As computations of average H-bond graphs require the same number of amino acid residues in the graphs to be averaged, where needed we used Modeller 9.21 (Marti-Renom et al., 2000) to construct coordinates for missing amino acid residues.	Modeller suggested: (MODELLER, RRID:SCR_008395)

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on pages 19, 20, 29 and 23. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
No funding statement was detected.
No protocol registration statement was detected.
