Emergence of power-law distributions in protein-protein interaction networks through study bias

Curation statements for this article:
  • Curated by eLife

    eLife Assessment

    This manuscript makes an important contribution to the understanding of protein-protein interaction (PPI) networks by challenging the widely held assumption that their degree distributions uniformly follow a power law. The authors present convincing evidence that biases in study design, such as data aggregation and selective research focus, may contribute to the appearance of power-law-like distributions. While the power law assumption has already been questioned in network biology, the methodological rigor and correction procedures introduced here are valuable for advancing our understanding of PPI network structure.

Abstract

The degree distributions of protein-protein interaction (PPI) networks are widely believed to follow a power law (PL). However, the experimental procedures for detecting PPIs are affected by technical and study bias. For instance, cancer-associated proteins have received disproportionate attention. Moreover, bait proteins in large-scale experiments tend to have many false-positive interaction partners. This raises the question of whether PL distributions in observed PPI networks could be explained by these biases alone. To address this question, we studied the degree distributions of thousands of PPI networks of controlled provenance. Our findings are supported by mathematical models and extensive simulations and indicate that study bias and technical bias suffice to produce the observed PL distribution. It is, hence, problematic to derive hypotheses about the degree distribution and the true biological interactome from the PL distributions in observed PPI networks. Our study casts doubt on the use of the PL property of biological networks as a modeling assumption or quality criterion in network biology.

Article activity feed

  1. Author Response:

    eLife Assessment

    This manuscript makes an important contribution to the understanding of protein-protein interaction (PPI) networks by challenging the widely held assumption that their degree distributions uniformly follow a power law. The authors present convincing evidence that biases in study design, such as data aggregation and selective research focus, may contribute to the appearance of power-law-like distributions. While the power law assumption has already been questioned in network biology, the methodological rigor and correction procedures introduced here are valuable for advancing our understanding of PPI network structure.

    Thank you for this assessment, which accurately reflects our study.

    Reviewer #1 (Public Review):

    This manuscript was previously reviewed and this earlier evaluation resulted in two conflicting assessments. I fully endorse the favourable opinion of former Reviewer 1 and find most negative comments of former Reviewer 2 inappropriate.

    This work is absolutely necessary. Even though the authors find it difficult to be fully assertive in the end, their groundwork in trying to demonstrate the existence of bias in PPI data is undeniably valuable. Other authors have previously tried to show the limitations of unequivocally fitting the degree distribution to a power law, but these doubts have had little impact. This new study is a great opportunity to discuss further the concern about an overly simplistic view of PPI network topology. The recent contribution of Broido & Clauset was definitely one to build on. The approach of this new manuscript is compelling. Dividing the study into several parts, each reflecting an attempt to expose commonly used shortcuts in PPI network analyses, makes sense.

    Surprisingly, the authors do not refer to the endless controversy of labeling hubs as party or date, which is another manifestation of the interpretative bias of PPI data.

    This is a good point. In particular, it may be interesting to examine whether hub nodes that emerge when only prey interactions are considered differ in their classification as party or date hubs. We now refer to this distinction in the Discussion:

    “[...] Further work will be needed to establish if true hub proteins exist in the PPI network and what their role is. For instance, it was previously claimed (Han et al., 2004) – and controversially discussed (Agarwal et al., 2010) – that the correlation of gene expression values between hub nodes and their interaction partners follows a bimodal distribution, leading to the distinction of party (high correlation) and date (low correlation) hubs. In the future, it would be interesting to study if the ratio of party and date hubs changes when considering prey degree only.”
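
    To illustrate the "prey degree" referred to in the passage above, the following minimal Python sketch separates a protein's bait degree from its prey degree. It assumes that prey degree means the number of distinct baits that detected the protein as a prey; the function name `bait_and_prey_degrees`, the record format, and the toy protein labels are hypothetical and not taken from the manuscript.

```python
from collections import defaultdict

def bait_and_prey_degrees(records):
    """Map each protein to a (bait_degree, prey_degree) tuple.

    records: iterable of (bait, prey) pairs, one per reported detection.
    Assumption: prey degree = number of distinct baits that detected
    the protein as a prey.
    """
    as_bait = defaultdict(set)  # bait -> preys it detected
    as_prey = defaultdict(set)  # prey -> baits that detected it
    for bait, prey in records:
        if bait == prey:
            continue  # ignore self-interactions in this toy example
        as_bait[bait].add(prey)
        as_prey[prey].add(bait)
    proteins = set(as_bait) | set(as_prey)
    return {p: (len(as_bait.get(p, ())), len(as_prey.get(p, ())))
            for p in proteins}

# Toy data (hypothetical): P1 is used as a bait far more often than P3.
toy_records = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
               ("P2", "P1"), ("P4", "P2")]
degrees = bait_and_prey_degrees(toy_records)
for protein in sorted(degrees):
    print(protein, degrees[protein])
```

    In this toy example, P1 has the highest bait degree but P2 has the highest prey degree, so which protein looks like a hub depends on which degree is counted.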

    The only worthwhile point raised by former Reviewer 2 is the effect of spoke expansion. In their response, the authors suggest that addressing it would probably raise further questions; even if it is treated as future work, it could be mentioned in the main manuscript.

    Thank you for this comment. We agree that considering different expansion methods is an interesting research question regarding its effect on the PL property. We have added the following sentences to the Discussion to highlight the opportunity for future work:

    “[...] An additional complexity arising in AP-MS studies is that more than two interaction partners can be detected. These n-ary interactions are commonly transformed into binary interactions using either the spoke model, which reports all interactions with the bait protein (as used by IntAct, for example), or the matrix expansion model, which reports all pairwise interactions. Both expansion models can, in principle, introduce false positives and it would be interesting to consider the effect of expansion model choice on the PL property in future work.”
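
    The difference between the two expansion models described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a purification is represented as one bait plus a list of co-purified preys, and the function names and protein labels are hypothetical rather than taken from the manuscript or from IntAct.

```python
from itertools import combinations

def spoke_expansion(bait, preys):
    """Spoke model: report only bait-prey pairs."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}

def matrix_expansion(bait, preys):
    """Matrix model: report every pair among bait and preys."""
    members = {bait, *preys}
    return {frozenset(pair) for pair in combinations(sorted(members), 2)}

# One hypothetical purification: bait "A" co-purifies with "B", "C", "D".
spoke = spoke_expansion("A", ["B", "C", "D"])
matrix = matrix_expansion("A", ["B", "C", "D"])

print(len(spoke))   # 3 binary interactions, all involving the bait
print(len(matrix))  # 6 binary interactions, including prey-prey pairs
```

    Under the spoke model only the bait gains degree 3 while each prey gains degree 1; under the matrix model every member of the complex gains degree 3. The choice of model therefore changes the degree sequence derived from the same experiment.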

    In the end, this submission is an invitation to constructively rethink the analysis of PPI networks, and it feeds the discussion on modelling degree distributions, which should not be considered a solved issue.

    Reviewer #2 (Public Review):

    Many naturally occurring networks are assumed to have a power-law (PL) degree distribution. This assumption has certainly been widely held in the field of protein interactomes (PPIs), although important studies around 2010 have conclusively shown that many of these PL distributions are either the result of data mis-handling or of sloppy statistical procedures (see e.g. Porter and Stumpf in Science around 2014, which I would advise the authors to cite). The value of the present study is to introduce a new mechanism, experiment bias, to explain the appearance of such distributions in the PPI case, and in particular to show how correcting empirically for this mechanism can lead to a reappraisal of which proteins are genuine hubs in these networks. The claims are well supported by empirical evidence and some theoretical analysis. Overall, this is a worthwhile contribution although its significance is somewhat dented by the fact that the PL enthusiasm of many had already been tempered by the studies mentioned above.

    Thanks a lot for your constructive feedback. We now cite the work by Porter and Stumpf and have addressed your specific recommendations as detailed below.

    Reviewer #3 (Public Review):

    I would like to congratulate the authors on an impressive piece of work highlighting important real and potential biases, which may lead to power-law distributed node degrees in protein-protein interaction networks. This manuscript is easy to follow and very well written. I truly enjoyed the concise and convincing scientific presentation. Even if some of the concerns have already been discussed or raised in the past, the manuscript assesses potential biases in PPIs in a rigorous manner.

    I deem the following observations highly relevant to be communicated to the community again:

    (1) PL-like distributions emerge by aggregation of data sets alone.

    (2) Research interest in itself is PL-distributed and drives PL-like properties in PPI networks.

    (3) Bait usage is a major driver of PL-like behaviour.

    (4) Accounting for biases changes the biological interpretation of the networks.

    (5) Simulation studies further corroborate these findings.

    Thank you for this positive assessment of our work.

  2. eLife Assessment

    This manuscript makes an important contribution to the understanding of protein-protein interaction (PPI) networks by challenging the widely held assumption that their degree distributions uniformly follow a power law. The authors present convincing evidence that biases in study design, such as data aggregation and selective research focus, may contribute to the appearance of power-law-like distributions. While the power law assumption has already been questioned in network biology, the methodological rigor and correction procedures introduced here are valuable for advancing our understanding of PPI network structure.

  3. Reviewer #1 (Public Review):

    This manuscript was previously reviewed and this earlier evaluation resulted in two conflicting assessments. I fully endorse the favourable opinion of former Reviewer 1 and find most negative comments of former Reviewer 2 inappropriate.

    This work is absolutely necessary. Even though the authors find it difficult to be fully assertive in the end, their groundwork in trying to demonstrate the existence of bias in PPI data is undeniably valuable. Other authors have previously tried to show the limitations of unequivocally fitting the degree distribution to a power law, but these doubts have had little impact. This new study is a great opportunity to discuss further the concern about an overly simplistic view of PPI network topology. The recent contribution of Broido & Clauset was definitely one to build on. The approach of this new manuscript is compelling. Dividing the study into several parts, each reflecting an attempt to expose commonly used shortcuts in PPI network analyses, makes sense.

    Surprisingly, the authors do not refer to the endless controversy of labeling hubs as party or date, which is another manifestation of the interpretative bias of PPI data.

    The only worthwhile point raised by former Reviewer 2 is the effect of spoke expansion. In their response, the authors suggest that addressing it would probably raise further questions; even if it is treated as future work, it could be mentioned in the main manuscript.

    In the end, this submission is an invitation to constructively rethink the analysis of PPI networks, and it feeds the discussion on modelling degree distributions, which should not be considered a solved issue.

  4. Reviewer #2 (Public Review):

    Many naturally occurring networks are assumed to have a power-law (PL) degree distribution. This assumption has certainly been widely held in the field of protein interactomes (PPIs), although important studies around 2010 have conclusively shown that many of these PL distributions are either the result of data mis-handling or of sloppy statistical procedures (see e.g. Porter and Stumpf in Science around 2014, which I would advise the authors to cite). The value of the present study is to introduce a new mechanism, experiment bias, to explain the appearance of such distributions in the PPI case, and in particular to show how correcting empirically for this mechanism can lead to a reappraisal of which proteins are genuine hubs in these networks. The claims are well supported by empirical evidence and some theoretical analysis. Overall, this is a worthwhile contribution although its significance is somewhat dented by the fact that the PL enthusiasm of many had already been tempered by the studies mentioned above.

  5. Reviewer #3 (Public Review):

    I would like to congratulate the authors on an impressive piece of work highlighting important real and potential biases, which may lead to power-law distributed node degrees in protein-protein interaction networks.
    This manuscript is easy to follow and very well written.
    I truly enjoyed the concise and convincing scientific presentation.
    Even if some of the concerns have already been discussed or raised in the past, the manuscript assesses potential biases in PPIs in a rigorous manner.

    I deem the following observations highly relevant to be communicated to the community again:
    (1) PL-like distributions emerge by aggregation of data sets alone.
    (2) Research interest in itself is PL-distributed and drives PL-like properties in PPI networks.
    (3) Bait usage is a major driver of PL-like behaviour.
    (4) Accounting for biases changes the biological interpretation of the networks.
    (5) Simulation studies further corroborate these findings.