Improving Protein Interaction Prediction in GenPPi: A Novel Interaction Sampling Approach Preserving Network Topology

Abstract

Background: Computational prediction of protein-protein interactions (PPIs) is crucial for understanding cell biology and for drug development, and it offers an alternative to costly experimental methods. The original GenPPi software advanced ab initio prediction of PPI networks from bacterial genomes but was limited by its reliance on high sequence similarity. This work introduces GenPPi 1.5, which extends these predictive capabilities.

Results: GenPPi 1.5 incorporates a Random Forest (RF) classifier, trained on 60 biophysical features derived from amino acid propensity indices, to classify protein similarity even in low sequence identity scenarios (targeting >65% identity). The RF model generates more interactions, which raises computational cost, especially in large conserved phylogenetic profiles. To manage this, we developed and integrated the Reduced Interaction Sampling (RIS) algorithm. RIS stochastically samples interactions within these profiles, which keeps complete-genome analysis tractable. Extensive simulations across various configurations validated the methodology. RF integration substantially broadened GenPPi's predictive power: applied to Buchnera aphidicola, it showed up to 62% overlap with interactions in the STRING database. Analysis of RIS showed that, although the sampling introduces some randomness, identification of critical nodes remains robust, particularly for Top N values greater than 100, indicating minimal compromise to network integrity.

Conclusion: The combination of machine learning (RF) and the RIS algorithm in GenPPi 1.5 represents a significant advancement. It overcomes the high-similarity dependency of the previous version while efficiently handling complex genomes. GenPPi 1.5 provides a robust and scalable alignment-free PPI prediction solution and enables users to train custom models tailored to specific genomic contexts.
GenPPi is freely available on our website (https://genppi.facom.ufu.br/), its source code is hosted on GitHub (https://github.com/santosardr/genppi), and it can be easily installed via the Python Package Index using the command pip install genppi-py.
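
The abstract does not specify the exact sampling rule used by RIS, so the sketch below is only an illustration of the general idea under stated assumptions. It assumes that, without sampling, every pair of proteins inside a conserved phylogenetic profile would be linked (k(k-1)/2 interactions for a profile of k proteins), and it swaps in a hypothetical per-protein rule: each protein keeps up to `partners_per_node` randomly chosen partners from its own profile, so no member becomes isolated. It also assumes that "critical nodes" means the Top N proteins ranked by degree, and it compares two networks with a Jaccard index over those Top N sets. All function names (`full_profile_edges`, `sample_profile_edges`, `top_n_overlap`) and the toy profiles are hypothetical, not GenPPi's API or data.

```python
import itertools
import random
from collections import Counter


def full_profile_edges(members):
    """All pairwise interactions within one profile: k*(k-1)/2 edges."""
    return {tuple(sorted(pair)) for pair in itertools.combinations(members, 2)}


def sample_profile_edges(members, partners_per_node, rng):
    """Hypothetical RIS-like sampling (an assumption, not GenPPi's rule):
    each protein keeps up to `partners_per_node` random partners from its
    own profile, so every member stays connected."""
    members = list(members)
    edges = set()
    for protein in members:
        others = [m for m in members if m != protein]
        k = min(partners_per_node, len(others))
        for partner in rng.sample(others, k):
            edges.add(tuple(sorted((protein, partner))))
    return edges


def top_n_by_degree(edges, n):
    """Top N proteins by degree (assumed definition of 'critical nodes')."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Descending degree, then name, so ties are broken deterministically.
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    return set(ranked[:n])


def top_n_overlap(edges_a, edges_b, n):
    """Jaccard index between the Top N sets of two networks (1.0 = identical)."""
    top_a = top_n_by_degree(edges_a, n)
    top_b = top_n_by_degree(edges_b, n)
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 1.0


# Toy profiles (hypothetical data), each a group of proteins sharing one profile.
profiles = {
    "profile_A": [f"a{i}" for i in range(40)],
    "profile_B": [f"b{i}" for i in range(15)],
    "profile_C": [f"c{i}" for i in range(5)],
}
rng = random.Random(42)  # fixed seed so a run can be reproduced
full = set().union(*(full_profile_edges(m) for m in profiles.values()))
sampled = set().union(*(sample_profile_edges(m, 5, rng) for m in profiles.values()))

print(f"full network: {len(full)} edges")  # 780 + 105 + 10 = 895
print(f"sampled network: {len(sampled)} edges")
print(f"Top-40 overlap (Jaccard): {top_n_overlap(full, sampled, 40):.2f}")
```

In this toy setting the full network grows quadratically with profile size, while the hypothetical sampler caps each protein at a fixed number of partners, so the edge count grows roughly linearly. Repeating the comparison with different seeds and larger N is one way to probe the kind of run-to-run robustness of critical nodes that the abstract reports for Top N > 100. The toy network here is far smaller, so its overlap values should not be read as representative.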