The influence of reduced amino acid alphabets on prediction orthologous protein thermostability

Abstract

Sequence features play a vital role in determining protein thermostability. Because reduced amino acid alphabets (RAAs) lower data complexity while retaining key sequence information, we evaluate the performance of 672 RAAs in predicting the thermostability of orthologous proteins. By calculating the amino acid composition, dipeptide composition, and tripeptide composition of the reduced sequences and building random forest regression models, we find that 10 RAAs based on the fuzzy clustering algorithm are suitable for predicting the thermostability difference of orthologous protein pairs and significantly improve prediction efficiency. Further, we predict the melting temperature difference (ΔTm) caused by point mutations and find that the RAA EQ-H-K-DN-IL-P-T-FY-M-R-S-W-A-C-G-V can fit the small thermostability changes they cause. Our work shows that reduction methods based on fuzzy clustering can effectively retain the key sequence features that affect protein thermostability, thereby reducing computational complexity and increasing prediction accuracy.
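
To make the feature construction concrete, the following is a minimal sketch of how a sequence might be rewritten in the reduced alphabet quoted above and turned into composition features. It is an illustration under stated assumptions, not the authors' implementation: the function names, the choice of the first residue of each cluster as its representative symbol, and the example peptide are all hypothetical.

```python
from collections import Counter
from itertools import product

# Reduced alphabet quoted in the abstract; each hyphen-separated group is one cluster.
RAA = "EQ-H-K-DN-IL-P-T-FY-M-R-S-W-A-C-G-V"


def build_mapping(raa):
    """Map each residue to the first letter of its cluster (an assumed convention)."""
    mapping = {}
    for group in raa.split("-"):
        for residue in group:
            mapping[residue] = group[0]
    return mapping


def reduce_sequence(seq, mapping):
    """Rewrite a sequence in the reduced alphabet; raises KeyError on non-standard residues."""
    return "".join(mapping[residue] for residue in seq.upper())


def kmer_composition(seq, alphabet, k):
    """Relative frequency of every possible k-mer over the alphabet.

    k=1 gives amino acid composition, k=2 dipeptide, k=3 tripeptide composition.
    """
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(max(total, 0)))
    features = {}
    for letters in product(alphabet, repeat=k):
        kmer = "".join(letters)
        features[kmer] = counts[kmer] / total if total > 0 else 0.0
    return features


MAPPING = build_mapping(RAA)
ALPHABET = sorted(set(MAPPING.values()))  # 16 cluster symbols instead of 20 residues

reduced = reduce_sequence("MKQLNYV", MAPPING)   # hypothetical example peptide
aac = kmer_composition(reduced, ALPHABET, 1)    # 16 features
dpc = kmer_composition(reduced, ALPHABET, 2)    # 16**2 = 256 features
```

With this 16-symbol alphabet the dipeptide vector has 256 entries rather than 400, and the tripeptide vector 4,096 rather than 8,000, which illustrates the kind of complexity reduction the abstract refers to. Feature vectors of this form could then be passed to a random forest regressor (for example scikit-learn's `RandomForestRegressor`); how the two sequences of an orthologous pair are combined into one input is not specified in the abstract and would have to be chosen separately.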
