Evolutionary velocity with protein language models predicts evolutionary dynamics of diverse proteins

Version published to 10.1016/j.cels.2022.01.003

Apr 1, 2022

SciScore for 10.1101/2021.06.07.447389:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
We also used the TAPE transformer model [14] (obtained from https://github.com/songlab-cal/tape) trained on the Pfam database release 32.0 [31].	Pfam suggested: (Pfam, RRID:SCR_004726)
In practice, x(a) and x(b) can disagree in length, so we first perform a global pairwise sequence alignment using the pairwise2 module in the Biopython Python package version 1.76 with a uniform substitution matrix and alignment parameters meant to discourage the introduction of sequence gaps (following the Biopython recommendations, we use a match score of 5, a mismatch penalty of −4, a gap-open penalty of −4, and a gap-extension penalty of −0.1).	Biopython suggested: (Biopython, RRID:SCR_007173)
We used the scipy version 1.4.1 Python package to compute correlations and statistical tests.	scipy suggested: (SciPy, RRID:SCR_008058) Python suggested: (IPython, RRID:SCR_001658)
Once these vectors are computed, we use the streamplot and quiver plot functionality of the matplotlib Python package version 3.3.3 to visualize evo-velocity.	matplotlib suggested: (MatPlotLib, RRID:SCR_008624)
To project evo-velocity into sequence space, we first construct a multiple sequence alignment of all M sequences using MAFFT version 7.475.	MAFFT suggested: (MAFFT, RRID:SCR_011811)
We obtained a phylogenetic tree of all NP sequences considered in the evo-velocity analysis by first aligning sequences with MAFFT followed by approximate maximum-likelihood tree construction using FastTree version 2.1 using a JTT+CAT model.	FastTree suggested: (FastTree, RRID:SCR_015501)
We obtained four SIVcpz Gag sequences with high-quality, manual annotation from UniProt (https://www.uniprot.org/) [54].	https://www.uniprot.org/ suggested: (Universal Protein Resource, RRID:SCR_002380)
We then performed a multiple sequence alignment with MAFFT and performed phylogenetic reconstruction on the alignment with PhyML using a JTT model with gamma-distributed among-site rate variation and empirical state frequencies.	PhyML suggested: (PhyML, RRID:SCR_014629)
Serpins evo-velocity analysis: We obtained 22,737 serpin sequences from UniProt.	UniProt suggested: (UniProtKB, RRID:SCR_004426)
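
The Biopython row in the table above quotes a global pairwise alignment with a match score of 5, a mismatch penalty of −4, a gap-open penalty of −4, and a gap-extension penalty of −0.1. As a rough illustration of what those parameters do, here is a minimal pure-Python sketch of global alignment with affine gaps. It is not the authors' code and does not call Biopython. The function name global_align is hypothetical, and the sketch assumes that a gap of length k costs gap_open + (k − 1) × gap_extend and that end gaps are penalized like internal gaps.

```python
NEG_INF = float("-inf")


def global_align(a, b, match=5.0, mismatch=-4.0, gap_open=-4.0, gap_extend=-0.1):
    """Global alignment with affine gaps (illustrative sketch, hypothetical helper).

    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_extend,
    and end gaps are penalized. Returns (aligned_a, aligned_b, score).
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)

    # Traceback: start from the best-scoring state in the bottom-right cell.
    tables = {"M": M, "X": X, "Y": Y}
    state = max(tables, key=lambda k: tables[k][n][m])
    score = tables[state][n][m]
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if state == "M":
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
            prev = {k: tables[k][i][j] for k in ("M", "X", "Y")}
        elif state == "X":
            out_a.append(a[i - 1])
            out_b.append("-")
            prev = {"M": M[i - 1][j] + gap_open,
                    "X": X[i - 1][j] + gap_extend,
                    "Y": Y[i - 1][j] + gap_open}
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            prev = {"M": M[i][j - 1] + gap_open,
                    "Y": Y[i][j - 1] + gap_extend,
                    "X": X[i][j - 1] + gap_open}
            j -= 1
        state = max(prev, key=prev.get)
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score


# Toy example: one deleted residue gives three matches and one gap opening.
aligned_a, aligned_b, score = global_align("ACGT", "AGT")
print(aligned_a, aligned_b, score)  # ACGT A-GT 11.0
```

With these values a single gap opening costs as much as a mismatch, so the alignment only introduces a gap where it recovers at least one additional match, which is consistent with the quoted intent of discouraging sequence gaps. The near-zero extension penalty means that, once opened, a long gap costs little more than a short one.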

Results from OddPub: Thank you for sharing your code and data.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.
