Navigating the Unstructured by Evaluating AlphaFold’s Efficacy in Predicting Missing Residues and Structural Disorder in Proteins
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
This study explored the difference between predicted structure confidence and disorder detection in protein, focusing on regions with undefined structures detected as missing segments in X-ray crystallography and Cryo-EM data. Recognizing the importance of these ‘unstructured’ regions for protein functionality, we examined the alignment of numerous protein sequences with their resolved or not structures. The research utilized a comprehensive PDB dataset, classifying residues into ‘modeled’, ‘hard missing’ and ‘soft missing’ based on their visibility in structural data. By analysis, key features were firstly determined, including confidence score pLDDT from Al-phaFold2, an advanced AI-based tool, and IUPred, a conventional disorder prediction method. Our analysis reveals that "hard missing" residues often reside in low-confidence regions, but are not exclusively associated with disorder predictions. It was assessed how effectively individual key features can distinguish between structured and unstructured data, as well as the potential benefits of combining these features for advanced machine learning applications. This approach aims to uncover varying correlations across different experimental methodologies in the latest structural data. By analyzing the relationships between predictions and experimental structures, we can more effectively identify structural targets within proteins, guiding experimental designs toward areas of potential functional significance, whether they exhibit high stability or crucial unstructured regions.