InteracTor: A new integrative feature extraction toolkit for improved characterization of protein structural properties

This article has been reviewed by the following groups

Abstract

Understanding the structural and functional diversity of protein families is crucial for elucidating their biological roles. Traditional analyses often focus on primary and secondary structure, that is, amino acid sequences and local folding patterns such as alpha helices and beta sheets. However, primary and secondary structure alone may not fully capture the complex interactions within proteins. To address this limitation, we developed InteracTor, a toolkit that analyzes proteins by extracting features from their three-dimensional (3D) structures. InteracTor extracts interatomic interaction features such as hydrogen bonds, van der Waals interactions, and hydrophobic contacts, which are crucial for understanding protein dynamics, structure, and function. Incorporating 3D structural data and interatomic interaction features provides a more comprehensive view of protein structure and function and may improve downstream predictive modeling. Using the extracted features in Mutual Information (MI) scoring, Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), Uniform Manifold Approximation and Projection (UMAP), and hierarchical clustering as use cases, we identified clear separations among protein structural families, highlighting their distinct functional aspects. Our analysis revealed that interatomic interaction features were more informative than secondary structure features, providing insights into potential structural and functional properties. These findings underscore the importance of considering tertiary structure in protein analysis and offer a robust framework for future studies aimed at improving models for protein function prediction and drug discovery.
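To make "interatomic interaction feature" concrete, here is a minimal sketch of one such feature: a count of candidate hydrogen bonds using a purely geometric rule. It assumes that any nitrogen/oxygen pair within 3.5 Å, from residues at least three positions apart, is a candidate. The cutoff, the sequence-separation filter, the tuple atom format, and the function name `count_hbond_candidates` are all assumptions for illustration. This is not InteracTor's criterion, which the abstract does not specify, and it ignores donor/acceptor roles and bond angles that proper hydrogen-bond detection checks.

```python
import math
from typing import List, Tuple

# Minimal atom record assumed for this sketch: (element, residue_index, (x, y, z)).
Atom = Tuple[str, int, Tuple[float, float, float]]

POLAR_ELEMENTS = {"N", "O"}


def count_hbond_candidates(
    atoms: List[Atom], cutoff: float = 3.5, min_seq_sep: int = 3
) -> int:
    """Count N/O atom pairs within `cutoff` angstroms (a crude geometric proxy).

    Pairs from residues fewer than `min_seq_sep` apart in sequence are skipped,
    so trivially close near-neighbor contacts are not counted.
    """
    polar = [a for a in atoms if a[0] in POLAR_ELEMENTS]
    count = 0
    # Brute-force O(n^2) pair scan; a spatial grid or k-d tree scales better.
    for i in range(len(polar)):
        _, res_i, xyz_i = polar[i]
        for j in range(i + 1, len(polar)):
            _, res_j, xyz_j = polar[j]
            if abs(res_i - res_j) < min_seq_sep:
                continue
            if math.dist(xyz_i, xyz_j) <= cutoff:
                count += 1
    return count


# Toy coordinates in angstroms, made up for illustration.
atoms: List[Atom] = [
    ("N", 1, (0.0, 0.0, 0.0)),
    ("O", 5, (2.9, 0.0, 0.0)),    # 2.9 angstroms from N1, 4 residues apart: counted
    ("O", 2, (0.0, 2.8, 0.0)),    # close to N1 but only 1 residue apart: skipped
    ("C", 8, (1.5, 0.0, 0.0)),    # carbon: ignored
    ("O", 12, (10.0, 0.0, 0.0)),  # too far from every other polar atom
]
print(count_hbond_candidates(atoms))  # 1
```

Other contact-type features (for example hydrophobic contacts between carbon atoms) could be sketched with the same pair scan by changing the element filter and cutoff.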
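Mutual Information scoring ranks a feature by how much knowing its value reduces uncertainty about a protein's family, with $I(X;Y) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$. The abstract does not say which estimator was used, so the sketch below assumes a simple plug-in one: discretize a continuous feature into equal-width bins and compute MI (in nats) from bin/label counts. The function names, the bin count, and the toy data are hypothetical, and values from this estimator are not comparable to the MI scores reported in the paper. Libraries such as scikit-learn provide nearest-neighbor estimators instead (`sklearn.feature_selection.mutual_info_classif`).

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Assign each value to one of `n_bins` equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant feature: a single bin
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value in the last bin instead of bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(
    feature: Sequence[float], labels: Sequence[Hashable], n_bins: int = 4
) -> float:
    """Plug-in estimate of I(feature; label) in nats after equal-width binning."""
    if len(feature) != len(labels):
        raise ValueError("feature and labels must have the same length")
    bins = equal_width_bins(feature, n_bins)
    n = len(labels)
    joint = Counter(zip(bins, labels))
    bin_counts = Counter(bins)
    label_counts = Counter(labels)
    # I = sum p(b, y) * log(p(b, y) / (p(b) * p(y))), written with raw counts.
    return sum(
        (c / n) * math.log(c * n / (bin_counts[b] * label_counts[y]))
        for (b, y), c in joint.items()
    )


families = ["A", "A", "B", "B"]       # hypothetical family labels
separating = [1.0, 1.1, 5.0, 5.2]     # splits the two families cleanly
uninformative = [1.0, 5.0, 1.0, 5.0]  # same distribution in both families
print(round(mutual_information(separating, families), 4))     # 0.6931 (= ln 2)
print(round(mutual_information(uninformative, families), 4))  # 0.0
```

Ranking features then amounts to computing this score for every feature column against the family labels and sorting in descending order. Plug-in estimates are biased upward for small samples and many bins, which is one reason kNN-based estimators are often preferred.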

Article activity feed

  1. Among the 12 most highly ranked features across protein families are hydrogen bonds (MI=0.775), total surface tension (MI=0.763), London dispersion forces (MI=0.758), repulsive interactions (MI=0.722), internal tension (MI=0.708), ASA (MI=0.694), hydrophobic contacts (MI=0.561), TG frequency (MI=0.562), internal hydrophobicity (MI=0.561), VN frequency (MI=0.556), total hydrophobicity (MI=0.539), and GG frequency (MI=0.509).

    This is really interesting! It would also be worth checking whether any of these features (or others) are correlated with one another, or whether some features could be used to predict others.

  2. Here we present InteracTor, a new toolkit for the extraction of three types of protein feature encodings: interaction features, physicochemical features, and compositional features.

    This is super cool! I can't wait to try it out!

  3. Extract atom, residue, and sequence information from the PDB file (Figure 1A): This step parses the Protein Data Bank (PDB) file to obtain the atom types, 3D coordinates, and amino acid sequence of the protein.

    I'm curious whether this can be used with structures predicted by AlphaFold or ESMFold. Relatedly, do the structures need any preprocessing? I'm thinking mainly of AlphaFold and ESMFold models, which are known to not always have optimal side-chain placement.
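Step 1A parses a PDB file for atom types, 3D coordinates, and the amino acid sequence. As an illustration only (this is not InteracTor's parser), the sketch below reads ATOM records by the fixed column positions of the PDB format. All names (`Atom`, `parse_atoms`, `sequence`, `low_confidence_residues`, `_demo_line`) are hypothetical. AlphaFold stores per-residue pLDDT in the B-factor column, so the parser keeps that value, which lets low-confidence regions be flagged before feature extraction. The default cutoff of 70 follows AlphaFold's convention that pLDDT below 70 indicates low confidence; other predictors may store confidence on a different scale, so the threshold is an assumption to check per source.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Standard amino acids only; other residue names map to "X".
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


@dataclass
class Atom:
    name: str
    res_name: str
    chain: str
    res_seq: int
    icode: str
    xyz: Tuple[float, float, float]
    b_factor: float  # per-residue pLDDT in AlphaFold models
    element: str


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Read ATOM records of the first model using fixed PDB column positions."""
    atoms: List[Atom] = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # multi-model files (e.g. NMR ensembles): keep model 1 only
        if not line.startswith("ATOM  ") or line[16] not in (" ", "A"):
            continue  # skip HETATM/other records and secondary altLocs
        name = line[12:16].strip()
        atoms.append(Atom(
            name=name,
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            icode=line[26],
            xyz=(float(line[30:38]), float(line[38:46]), float(line[46:54])),
            b_factor=float(line[60:66]),
            element=line[76:78].strip() or name[0],  # crude fallback
        ))
    return atoms


def sequence(atoms: List[Atom]) -> Dict[str, str]:
    """One-letter sequence per chain, taking each residue once in file order."""
    seqs: Dict[str, List[str]] = {}
    seen = set()
    for a in atoms:
        key = (a.chain, a.res_seq, a.icode)
        if key not in seen:
            seen.add(key)
            seqs.setdefault(a.chain, []).append(THREE_TO_ONE.get(a.res_name, "X"))
    return {chain: "".join(letters) for chain, letters in seqs.items()}


def low_confidence_residues(atoms: List[Atom], threshold: float = 70.0):
    """(chain, residue number) pairs whose B-factor falls below `threshold`."""
    return sorted({(a.chain, a.res_seq) for a in atoms if a.b_factor < threshold})


def _demo_line(serial, name, res, seq, x, b, element):
    """Hypothetical helper: format a minimal chain-A ATOM record for the demo."""
    return (f"ATOM  {serial:5d}  {name:<3s} {res:3s} A{seq:4d}    "
            f"{x:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b:6.2f}          {element:>2s}")


pdb_text = "\n".join([
    _demo_line(1, "N", "GLY", 1, 1.0, 91.2, "N"),
    _demo_line(2, "CA", "GLY", 1, 2.4, 91.2, "C"),
    _demo_line(3, "N", "THR", 2, 3.8, 64.5, "N"),
    _demo_line(4, "CA", "GLY", 3, 5.2, 88.0, "C"),
])
atoms = parse_atoms(pdb_text)
print(sequence(atoms))                 # {'A': 'GTG'}
print(low_confidence_residues(atoms))  # [('A', 2)]
```

For real files, a maintained parser such as Biopython's `Bio.PDB.PDBParser` handles many more edge cases (missing columns, insertion codes, modified residues) than this sketch.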
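Compositional features are one of the three encodings listed in item 2, and three of the top-ranked features in item 1 (TG, VN, and GG frequency) look like dipeptide frequencies. Assuming that reading (the excerpt does not define them) and assuming normalization by the number of overlapping adjacent pairs, a minimal sketch is shown below; `dipeptide_frequencies` and `DIPEPTIDES` are hypothetical names.

```python
from collections import Counter
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, e.g. "TG" = Thr followed by Gly.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_frequencies(seq: str) -> Dict[str, float]:
    """Fraction of overlapping adjacent residue pairs equal to each dipeptide."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    total = len(pairs)  # pairs containing non-standard residues still count here
    return {d: (counts[d] / total if total else 0.0) for d in DIPEPTIDES}


freqs = dipeptide_frequencies("GTGG")  # pairs: GT, TG, GG
print(round(freqs["TG"], 3), round(freqs["GG"], 3), freqs["VN"])  # 0.333 0.333 0.0
```

The result is a fixed-length 400-dimensional vector per protein, which is what makes such features easy to combine with interaction and physicochemical features in a single table.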
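The first comment asks whether features correlate with, or could predict, one another. A simple first check, sketched here under the assumption that each feature is stored as one value per protein, is a pairwise Pearson correlation scan; pairs with |r| near 1 are candidates for redundancy. The function names, the 0.8 cutoff, and the feature values are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(
    features: Dict[str, List[float]], min_abs_r: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Feature pairs with |r| >= min_abs_r, strongest first."""
    hits = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if not math.isnan(r) and abs(r) >= min_abs_r:
            hits.append((a, b, r))
    return sorted(hits, key=lambda t: -abs(t[2]))


# One value per protein for each feature; numbers are made up for illustration.
features = {
    "hbonds": [10, 20, 30, 40],
    "asa": [5.0, 9.0, 16.0, 20.0],
    "gg_freq": [0.1, 0.3, 0.1, 0.3],
}
for a, b, r in correlated_pairs(features):
    print(f"{a} ~ {b}: r = {r:.3f}")  # asa ~ hbonds: r = 0.993
```

Pearson r only captures linear association; testing whether one feature can be predicted from the others would need a regression model fitted on the remaining features.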