Tuning intrinsic disorder predictors for virus proteins

Abstract

Many virus-encoded proteins have intrinsically disordered regions that lack a stable, folded three-dimensional structure. These disordered proteins often play important functional roles in virus replication, such as down-regulating host defense mechanisms. With the widespread availability of next-generation sequencing, the number of new virus genomes with predicted open reading frames is rapidly outpacing our capacity for directly characterizing protein structures through crystallography. Hence, computational methods for structural prediction play an important role. A large number of predictors focus on the problem of classifying residues into ordered and disordered regions, and these methods tend to be validated on a diverse training set of proteins from eukaryotes, prokaryotes, and viruses. In this study, we investigate whether some predictors outperform others in the context of virus proteins and compare our findings with data from non-viral proteins. We evaluate the prediction accuracy of 21 methods, many of which are only available as web applications, on a curated set of 126 proteins encoded by viruses. Furthermore, we apply a random forest classifier to these predictor outputs. Based on cross-validation experiments, this ensemble approach confers a substantial improvement in accuracy, e.g., a mean 36 per cent gain in Matthews correlation coefficient. Lastly, we apply the random forest predictor to severe acute respiratory syndrome coronavirus 2 ORF6, an accessory gene that encodes a short (61 AA) and moderately disordered protein that inhibits the host innate immune response. We show that disorder prediction methods perform differently for viral and non-viral proteins, and that an ensemble approach can yield more robust and accurate predictions.
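
As a rough illustration of the ensemble strategy summarised in the abstract, the sketch below stacks per-residue scores from several disorder predictors into a feature matrix and trains a scikit-learn random forest, scored by cross-validated Matthews correlation coefficient. The arrays, dimensions, and hyperparameters are placeholders rather than the study's actual pipeline.

```python
# Minimal sketch (assumed setup, not the authors' exact pipeline): combine the
# per-residue scores of individual disorder predictors with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Placeholder data: rows are residues, columns are scores from 21 predictors;
# the labels stand in for DisProt annotations (1 = disordered, 0 = ordered).
n_residues, n_predictors = 500, 21
X = rng.random((n_residues, n_predictors))
y = (X.mean(axis=1) + rng.normal(0.0, 0.1, n_residues) > 0.5).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Cross-validated Matthews correlation coefficient, the headline metric
# mentioned in the abstract.
mcc = cross_val_score(forest, X, y, cv=5, scoring="matthews_corrcoef")
print(f"mean MCC across folds: {mcc.mean():.2f}")
```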

Article activity feed

  1. SciScore for 10.1101/2020.10.27.357954:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to this paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences and detected resources:

    Sentence: "In addition, we calculated the accuracy, specificity and sensitivity for each predictor from the contingency table of DisProt residue labels and dichotomized predictions."
    Resource: DisProt (suggested: None)

    Sentence: "Specifically, we used the random forest method implemented in the scikit-learn (version 0.23.1) Python module [38], which employs a set of de-correlated decision trees and averages their respective outputs to obtain an ensemble prediction [39]."
    Resource: scikit-learn (suggested: scikit-learn, RRID:SCR_002577)
    Resource: Python (suggested: IPython, RRID:SCR_001658)
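
    The first sentence quoted above describes deriving accuracy, specificity, and sensitivity from a contingency table of DisProt residue labels versus dichotomized predictions. A minimal sketch of that calculation, using placeholder label vectors rather than data from the study, follows.

    ```python
    # Sketch of the per-predictor metrics named above, computed from a 2x2
    # contingency table; the label vectors are illustrative placeholders.
    import numpy as np
    from sklearn.metrics import confusion_matrix

    y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])  # DisProt labels (1 = disordered)
    y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])  # dichotomized predictor output

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall on disordered residues
    specificity = tn / (tn + fp)   # recall on ordered residues

    print(accuracy, sensitivity, specificity)
    ```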

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.