Prediction and discovery of protein-protein direct interactions and stable complexes based on gene co-expression and co-evolution
Abstract
In this study we employed a data-driven approach to explore the evolutionary and genetic determinants of direct protein interactions and stable complex formation in the human proteome. We found that simple co-evolutionary and co-expression metrics are highly informative of direct interactions and stable complexes. We used this information to train supervised binary classifiers to predict interactions that are either directly involved in the formation of a complex (as annotated in IntAct) or that form stable complexes (from Complex Portal). In the former task, our model discriminated direct interactions with an AUROC of 0.813, while in the latter it discriminated interactions forming stable complexes with an AUROC of 0.964. In both cases, our approach outperformed STRING, which we used as a baseline. Feature importance analysis revealed that different features contribute to the prediction of these two interaction types. Co-evolutionary features, in particular those relating to protein domains involved in interaction interfaces, are more important for discriminating direct interactions. Co-expression features, on the other hand, contributed more to the prediction of stable complexes. From these pairwise predictions we generated a proteome-wide network, which we clustered to assess how well known complexes from Complex Portal are recovered within network communities. We recovered known complexes with higher accuracy than other approaches.
In conclusion, we propose a new method that discriminates direct interactions as well as interactions forming stable complexes. This method can be used to stratify molecular interaction networks and to discover new functional complexes at a proteome-wide scale.
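To make the evaluation idea concrete, the following minimal sketch scores protein pairs with a single co-expression metric (Pearson correlation between expression profiles) and measures how well that score separates interacting from non-interacting pairs using AUROC. It is not the classifier described above: the protein names, expression values, and labels are hypothetical toy data, and the helper functions `pearson` and `auroc` are illustrative names introduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(vx * vy)


def auroc(scores, labels):
    """AUROC as the probability that a random positive pair outscores
    a random negative pair (ties count as 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical expression profiles (one value per sample) for four proteins.
expression = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.1, 5.9, 8.0],
    "C": [4.0, 1.0, 3.0, 2.0],
    "D": [1.0, 3.0, 2.0, 4.0],
}

# Hypothetical labelled pairs: 1 = annotated interaction, 0 = no interaction.
pairs = [("A", "B", 1), ("A", "D", 1), ("A", "C", 0), ("B", "C", 0)]

scores = [pearson(expression[a], expression[b]) for a, b, _ in pairs]
labels = [label for _, _, label in pairs]

print(f"AUROC of the co-expression score: {auroc(scores, labels):.3f}")
```

In a realistic setting the same AUROC computation would be applied to the predicted probabilities of a trained classifier rather than to a single raw feature, and the labels would come from curated resources such as IntAct or Complex Portal.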