Prediction of anti-freezing proteins from their evolutionary profile

Abstract

Prediction of antifreeze proteins (AFPs) holds significant importance due to their diverse applications in healthcare. An inherent limitation of current AFP prediction methods is their reliance on unreviewed proteins for evaluation. This study evaluates the proposed and existing methods on an independent dataset containing 81 AFPs and 73 non-AFPs obtained from UniProt, all of which have already been reviewed by experts. Initially, we constructed machine learning models for AFP prediction using selected composition-based protein features and achieved a peak AUC of 0.90 with an MCC of 0.69 on the independent dataset. Subsequently, we observed a notable enhancement in model performance, with the AUC increasing from 0.90 to 0.93, upon incorporating evolutionary information instead of relying solely on the primary sequence of proteins. Furthermore, we explored hybrid models integrating our machine learning approaches with BLAST-based similarity and motif-based methods. However, the performance of these hybrid models either matched or was inferior to that of our best machine learning model. Our best model, based on evolutionary information, outperforms all existing methods on the independent/validation dataset. To facilitate users, we developed a user-friendly web server and standalone package named “AFPropred” ( https://webs.iiitd.edu.in/raghava/afpropred ).
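
For illustration, composition-based features of the kind described above can be derived directly from the primary sequence. The sketch below is a minimal example assuming scikit-learn and a generic tree-ensemble classifier, evaluated with AUC and MCC; it is illustrative only and is not the AFPropred implementation (the actual feature selection and classifier may differ).

```python
# Illustrative sketch (not the AFPropred code): amino acid composition features
# plus a tree-ensemble classifier, evaluated with AUC and MCC.
from collections import Counter

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import matthews_corrcoef, roc_auc_score

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq: str) -> np.ndarray:
    """Fraction of each of the 20 standard amino acids in a sequence."""
    counts = Counter(seq.upper())
    return np.array([counts.get(aa, 0) / max(len(seq), 1) for aa in AMINO_ACIDS])

def evaluate(train_seqs, y_train, test_seqs, y_test):
    """Train on the main dataset, report AUC and MCC on the independent dataset."""
    X_train = np.vstack([aa_composition(s) for s in train_seqs])
    X_test = np.vstack([aa_composition(s) for s in test_seqs])
    clf = ExtraTreesClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)
    prob = clf.predict_proba(X_test)[:, 1]
    pred = (prob >= 0.5).astype(int)
    return roc_auc_score(y_test, prob), matthews_corrcoef(y_test, pred)
```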

Highlights

  • Prediction of antifreeze proteins with high precision

  • Evaluation of prediction models on an independent dataset

  • Machine learning-based models using sequence composition

  • Evolutionary information-based prediction models

  • A web server for predicting, scanning, and designing AFPs

Author’s Biography

  • Nishant Kumar is currently pursuing a Ph.D. in Computational Biology at the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.

  • Shubham Choudhury is currently pursuing a Ph.D. in Computational Biology at the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.

  • Nisha Bajiya is currently pursuing a Ph.D. in Computational Biology at the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.

  • Sumeet Patiyal is currently working as a postdoctoral visiting fellow at the Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA.

  • Gajendra P. S. Raghava is currently working as Professor and Head of the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.

Article activity feed

    1. Once the model was evaluated, we chose our top-performing model for further analysis, in which we integrated the evolutionary features with composition-based features and the ML score with the BLAST score and named the hybrid methods

      It's a bit confusing to me why you would carry out your model selection procedure without using the intended feature set. My worry would be that certain models might perform better or worse with different feature types, and that you might be missing that here. Could you elaborate on this? (A sketch of the kind of score-level ML+BLAST fusion described in the quoted passage appears after this feed.)

    2. Feature selection techniques

      Did you also consider exploring the use of regularization? I would suggest looking into L1 or L2 regularization to reduce the contribution of some of your features and mitigate the potential for overfitting (see the regularization sketch after this feed).

    3. The reliability of a method depends on the quality of the dataset used for training and evaluation.

      Along these lines, it's important to make significant efforts to identify sources of bias in your training dataset and mitigate their potential impact on predictions. This is true for the training set, but it's similarly true for the test (referred to here as the validation) set: if the test set is biased or imbalanced with respect to some relevant biological feature, then the resultant prediction accuracies may not reflect true model performance.

      What I would like to see explored more thoroughly here is whether there are taxonomic biases in the curated set of proteins used to train and test your model. If, for instance, some species or taxonomic groups are disproportionately represented in both your training and validation sets, that could lead to inflated prediction accuracies (a group-aware cross-validation sketch illustrating one way to check this appears after this feed).

    4. two datasets, the main and the validation

      This is a bit confusing; the more common terminology would be to refer to the former as the training dataset (subsequently subdivided into K folds for cross-validation) and the latter as the test set, rather than the validation set as you've done here (see the split and cross-validation sketch after this feed).

    5. The major limitations of the existing methods is their dataset, as these methods have been evaluated on unreviewed data.

      Could you elaborate on this? As written, it's not clear whether all of the studies listed above evaluated their methods on unreviewed data, and, for those that did, what the sources of the unreviewed data were.
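
A minimal sketch of the score-level ML+BLAST fusion referred to in comment 1, assuming the BLAST search against known AFPs has already been reduced to a hit/no-hit signal; the weighting scheme and the hybrid_score helper are hypothetical and are not the AFPropred implementation.

```python
# Hypothetical score-level hybrid (illustration only, not the AFPropred code):
# combine a classifier probability with a BLAST-derived similarity signal.
def hybrid_score(ml_prob: float, blast_hit: bool, blast_weight: float = 0.5) -> float:
    """Add a fixed bonus/penalty from a BLAST hit against known AFPs
    to the machine-learning probability, then clip the result to [0, 1]."""
    blast_term = blast_weight if blast_hit else -blast_weight
    return min(1.0, max(0.0, ml_prob + blast_term))

# A borderline ML score is pushed above a 0.5 decision threshold
# when the query has a significant BLAST hit to a known AFP.
print(hybrid_score(0.45, blast_hit=True))   # 0.95
print(hybrid_score(0.45, blast_hit=False))  # 0.0
```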
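
For the regularization suggestion in comment 2, a minimal sketch using L1- and L2-penalized logistic regression, assuming scikit-learn and standardized input features; the penalty strength C=0.5 is arbitrary and would need tuning.

```python
# Sketch of L1/L2 regularization (assumes scikit-learn; C value is arbitrary).
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# L1 drives some coefficients to exactly zero (implicit feature selection);
# L2 shrinks all coefficients, which typically reduces overfitting.
l1_model = make_pipeline(StandardScaler(),
                         LogisticRegression(penalty="l1", solver="liblinear", C=0.5))
l2_model = make_pipeline(StandardScaler(),
                         LogisticRegression(penalty="l2", solver="lbfgs", C=0.5))
```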
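
For the taxonomic-bias concern in comment 3, one way to probe for leakage is group-aware cross-validation, in which all proteins from a given taxon are kept in the same fold. The sketch below assumes scikit-learn and a per-protein taxon label, which may not be readily available for every entry; it is an illustration, not part of the published method.

```python
# Group-aware cross-validation: no taxon appears in both the training
# and the held-out fold, so inflated scores from taxonomic overlap are avoided.
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

def taxon_aware_auc(X, y, taxa, n_splits=5):
    """Per-fold AUC with proteins grouped by their source taxon."""
    cv = GroupKFold(n_splits=n_splits)
    clf = ExtraTreesClassifier(n_estimators=500, random_state=0)
    return cross_val_score(clf, X, y, groups=taxa, cv=cv, scoring="roc_auc")
```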
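
Finally, the terminology suggested in comment 4 maps onto a standard split: a single held-out test set, with K-fold cross-validation run only on the training portion. A minimal sketch, assuming scikit-learn and using placeholder data in place of the actual feature matrix:

```python
# Training/test split with K-fold cross-validation confined to the training set.
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

rng = np.random.default_rng(0)
X = rng.random((154, 20))           # placeholder feature matrix (e.g. composition)
y = rng.integers(0, 2, size=154)    # placeholder AFP / non-AFP labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (fit_idx, val_idx) in enumerate(cv.split(X_train, y_train)):
    pass  # fit on fit_idx, score on val_idx (the per-fold "validation" data)

# X_test / y_test are held back and used only once, for the final reported numbers.
```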