Evolutionary velocity with protein language models predicts evolutionary dynamics of diverse proteins


Article activity feed

  1. SciScore for 10.1101/2021.06.07.447389:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    We also used the TAPE transformer model [14] (obtained from https://github.com/songlab-cal/tape) trained on the Pfam database release 32.0 [31].
    Pfam
    suggested: (Pfam, RRID:SCR_004726)
    In practice, x(a) and x(b) can disagree in length, so we first perform a global pairwise sequence alignment using the pairwise2 module in the Biopython Python package version 1.76 with a uniform substitution matrix and alignment parameters meant to discourage the introduction of sequence gaps (following the Biopython recommendations, we use a match score of 5, a mismatch penalty of −4, a gap-open penalty of −4, and a gap-extension penalty of −0.1).
    Biopython
    suggested: (Biopython, RRID:SCR_007173)
    We used the scipy version 1.4.1 Python package to compute correlations and statistical tests.
    scipy
    suggested: (SciPy, RRID:SCR_008058)
    Python
    suggested: (IPython, RRID:SCR_001658)
    Once these vectors are computed, we use the streamplot and quiver plot functionality of the matplotlib Python package version 3.3.3 to visualize evo-velocity.
    matplotlib
    suggested: (MatPlotLib, RRID:SCR_008624)
    To project evo-velocity into sequence space, we first construct a multiple sequence alignment of all M sequences using MAFFT version 7.475.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    We obtained a phylogenetic tree of all NP sequences considered in the evo-velocity analysis by first aligning sequences with MAFFT followed by approximate maximum-likelihood tree construction using FastTree version 2.1 using a JTT+CAT model.
    FastTree
    suggested: (FastTree, RRID:SCR_015501)
    We obtained four SIVcpz Gag sequences with high-quality, manual annotation from UniProt (https://www.uniprot.org/) [54].
    https://www.uniprot.org/
    suggested: (Universal Protein Resource, RRID:SCR_002380)
    We then performed a multiple sequence alignment with MAFFT and performed phylogenetic reconstruction on the alignment with PhyML using a JTT model with gamma-distributed among-site rate variation and empirical state frequencies.
    PhyML
    suggested: (PhyML, RRID:SCR_014629)
    Serpins evo-velocity analysis: We obtained 22,737 serpin sequences from UniProt.
    UniProt
    suggested: (UniProtKB, RRID:SCR_004426)
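
The pairwise alignment sentence quoted in the table above gives concrete scoring parameters (match 5, mismatch −4, gap-open −4, gap-extension −0.1). As a rough illustration of what a global alignment score under those parameters means, here is a minimal pure-Python sketch of affine-gap dynamic programming (Gotoh-style). It is not the Biopython pairwise2 implementation and not the authors' code. The function name `global_affine_score` is hypothetical, and the sketch assumes that a gap of length k costs gap_open + (k − 1) × gap_extend.

```python
def global_affine_score(a, b, match=5.0, mismatch=-4.0,
                        gap_open=-4.0, gap_extend=-0.1):
    """Best global alignment score of strings a and b with affine gaps.

    Assumption (not stated in the source): a gap of length k costs
    gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)

    # Three score tables, indexed by prefix lengths (i, j):
    #   M: alignment ends with a[i-1] paired with b[j-1]
    #   X: alignment ends with a[i-1] paired with a gap
    #   Y: alignment ends with b[j-1] paired with a gap
    M = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    X = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    Y = [[neg_inf] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Uniform substitution scoring: one score for any match,
            # one for any mismatch.
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Either open a new gap or extend an existing one.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          Y[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          X[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend)

    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    # Two sequences of different length: the best alignment needs one gap.
    print(global_affine_score("ACGT", "ACT"))
```

Because the gap-extension penalty is much smaller than the gap-open penalty, this scoring favours one long gap over several short ones, consistent with the stated aim of discouraging the introduction of gaps. The sketch returns only the score; recovering the aligned sequences would require a traceback step that is omitted here.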

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding formulaic information scattered throughout a paper and presenting it in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) and for rigor criteria such as sex and investigator blinding. Details on the theoretical underpinning of the rigor criteria and the tools shown here, including references cited, are available from SciScore.