Improving Protein Interaction Prediction in GenPPi: A Novel Interaction Sampling Approach Preserving Network Topology

Alisson William
Iury Godoy
Carlos Marquez
Lucas Silva
Matheus Prado
Natanael Avila
Anderson Rodrigues dos Santos

Read the full article

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

Background: Computational prediction of protein-protein interactions (PPIs) is crucial for understanding cell biology and drug development, offering an alternative to costly experimental methods. The original GenPPi software advanced ab initio PPI network prediction from bacterial genomes, but was limited by its reliance on high sequence similarity. This work introduces GenPPi 1.5 to enhance these predictive capabilities. Results: GenPPi 1.5 incorporates a Random Forest (RF) algorithm, trained on 60 biophysical features from amino acid propensity indices, to classify protein similarity even in low sequence identity scenarios (targeting >65% identity). To manage computational complexity from the increased interactions generated by the RF model, especially in extensive conserved phylogenetic profiles, we developed and integrated the Reduced Interaction Sampling (RIS) algorithm. RIS stochastically samples interactions within these profiles, optimizing performance for complete genome analysis. Extensive simulations across various configurations validated the methodology. RF integration significantly broadened GenPPi's predictive power; application to Buchnera aphidicola showed up to 62% overlap with STRING database interactions. Analysis of RIS demonstrated that while introducing some randomness, critical node identification remains robust, particularly for Top N values greater than 100, indicating minimal compromise to network integrity. Conclusion: The combination of Machine Learning (RF) and the RIS algorithm in GenPPi 1.5 represents a significant advancement. It overcomes the high-similarity dependency of the previous version while efficiently handling complex genomes. GenPPi 1.5 provides a robust and scalable alignment-free PPI prediction solution, enabling users to train custom models tailored to specific genomic contexts. GenPPi is freely available on our website (https://genppi.facom.ufu.br/), its source code is hosted on GitHub (https://github.com/santosardr/genppi), and it can be easily installed via the Python Package Index using the command pip install genppi-py.

Version published to 10.1101/2024.08.16.608332 on bioRxiv
Aug 19, 2024

The Evolution of the AlphaFold Architecture

This article has 1 author:
1. Y.C.B.J. Dissanayaka
This article has no evaluationsLatest version Jan 9, 2026
Artificial Intelligence–Driven Structural Mining Enables Functional Inference in the Human Dark Proteome

This article has 7 authors:
1. Valentina Carbonari
2. Annamaria Defilippo
3. Ugo Lomoio
4. Caterina Francesca Perri
5. Barbara Puccio
6. Pierangelo Veltri
7. Pietro Hiram Guzzi
This article has no evaluationsLatest version Dec 23, 2025
The Deep Core: Mapping the 0.91% Regulatory Backbone of the Human Proteome and Its Role in Cancer Drug Resistance

This article has 1 author:
1. Andres Pirolo
This article has no evaluationsLatest version Feb 4, 2026

Discuss this preprint

Listed in

Abstract

Article activity feed

Related articles

The Evolution of the AlphaFold Architecture

Artificial Intelligence–Driven Structural Mining Enables Functional Inference in the Human Dark Proteome

The Deep Core: Mapping the 0.91% Regulatory Backbone of the Human Proteome and Its Role in Cancer Drug Resistance