The probability of edge existence due to node degree: a baseline for network-based predictions

Michael Zietz
Daniel S Himmelstein
Kyle Kloster
Christopher Williams
Michael W Nagle
Casey S Greene

Abstract

Important tasks in biomedical discovery such as predicting gene functions, gene–disease associations, and drug repurposing opportunities are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Our framework decomposes performance into the proportions attributable to degree and the network’s specific connections using network permutation to generate features that depend only on degree. We discover that performance attributable to factors other than degree is often only a small portion of overall performance. Researchers seeking to predict new or missing edges in biological networks should use our permutation approach to obtain a baseline for performance that may be nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).

GigaScience
Mar 4, 2024

Abstract: Important tasks in biomedical discovery such as predicting gene functions, gene-disease associations, and drug repurposing opportunities are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Our framework decomposes performance into the proportions attributable to degree and the network’s specific connections. We discover that performance attributable to factors other than degree is often only a small portion of overall performance. Degree’s predictive performance diminishes when the networks used for training and testing, despite measuring the same biological relationships, were generated using distinct techniques and hence have large differences in degree distribution. We introduce the permutation-derived edge prior as the probability that an edge exists based only on degree. The edge prior shows excellent discrimination and calibration for 20 biomedical networks (16 bipartite, 3 undirected, 1 directed), with AUROCs frequently exceeding 0.85. Researchers seeking to predict new or missing edges in biological networks should use the edge prior as a baseline to identify the fraction of performance that is nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).
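To make the idea of a permutation-derived edge prior concrete, the following is a minimal sketch in plain Python. It is not the API of the xswap package and not necessarily the authors' exact procedure: the function names (`xswap_permute`, `edge_prior`), the toy gene-disease edge list, and the parameters `n_perms` and `swaps_per_edge` are all assumptions made for illustration. The sketch repeatedly swaps the targets of two randomly chosen edges, which leaves every node's degree unchanged, and then estimates the prior for each source-target pair as the fraction of permuted networks that contain that pair.

```python
import random
from collections import Counter


def xswap_permute(edges, n_swaps, rng):
    """Permute a bipartite edge list by edge swaps that preserve every node's degree."""
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        # Pick two distinct edges (a, b) and (c, d) and propose (a, d) and (c, b).
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        # Reject the swap if it would duplicate an existing edge.
        if new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


def edge_prior(edges, n_perms=200, swaps_per_edge=10, seed=0):
    """Estimate, for each source-target pair, the fraction of permutations containing it."""
    rng = random.Random(seed)
    counts = Counter()
    current = list(edges)
    for _ in range(n_perms):
        current = xswap_permute(current, swaps_per_edge * len(current), rng)
        counts.update(current)
    sources = sorted({s for s, _ in edges})
    targets = sorted({t for _, t in edges})
    return {(s, t): counts[(s, t)] / n_perms for s in sources for t in targets}


# Hypothetical toy network: genes g1..g4 (sources) and diseases d1..d4 (targets).
edges = [
    ("g1", "d1"), ("g1", "d2"), ("g1", "d3"),
    ("g2", "d1"), ("g2", "d2"),
    ("g3", "d1"),
    ("g4", "d4"),
]
prior = edge_prior(edges)
for pair in [("g1", "d1"), ("g4", "d4")]:
    print(pair, round(prior[pair], 2))
```

Because every permutation preserves degree, the priors of all pairs involving a given node sum to that node's degree. In this toy network the pair of two high-degree nodes receives a much higher prior than the pair of two degree-1 nodes, even though both pairs are true edges, which is the kind of degree-driven signal the abstract describes as a baseline.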

This work has been peer reviewed in GigaScience (https://doi.org/10.1093/gigascience/giae001), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

Reviewer 2: Linlin Zhuo

In this manuscript, the authors introduce a network permutation framework to quantify the effects of node degree on edge prediction. The importance of degree in the edge detection task is self-evident, and the quantification of this effect is undoubtedly groundbreaking. The experimental results on a variety of datasets demonstrate the advanced nature of the method proposed by the authors. However, some parts require further explanation from the authors before the manuscript can be considered for acceptance at a later stage.

1. The imbalance of the degree distribution has a significant impact on the results of the edge detection task. In this manuscript, the authors propose a framework to quantify this impact. However, the manuscript does not state explicitly what form the quantification takes, for example whether it is reported as an indicator or in some other form. The authors should clarify this aspect.

2. The authors propose that researchers employ marginal priors as a reference point to distinguish the performance attributable to node degree from specific performance. It would be helpful if the authors could elaborate on the methodology or provide a sample demonstration to clarify how this approach is implemented.

3. For the XSwap algorithm, I wonder whether the authors could provide a more detailed explanation of how it works, including a step-by-step implementation of the improved XSwap. It would also be beneficial if the authors could highlight the significance of the improved XSwap algorithm in biomedical tasks.

4. The authors present the pseudocode of the XSwap algorithm in Figure 2, along with the improved pseudocode after their enhancements. Both pseudocodes are accompanied by explanatory text. However, I believe that expressing them in the form of a figure would make them more visually appealing and intuitive.

5. The authors introduce the edge prior to quantify the probability of two nodes being connected based solely on their degree. I request that the authors provide a detailed explanation of the specific implementation of the edge prior.

6. In the "Prediction tasks" section, the authors use three prediction tasks to assess the performance of the edge prior. I recommend dividing this section into clearly separated parts so that the content is easier to follow.

7. The focus of the article might not be prominent enough. It is advisable for the authors to elaborate further on the advanced nature of the proposed framework and its significance in practical tasks. This would help emphasize the main contributions of the research and its relevance in real-world applications.

Reviewer 1: Babita Pandey

The manuscript "The probability of edge existence due to node degree: a baseline for network-based predictions" presents novel work. However, some of the sections are written so briefly that they are difficult to understand. The sections that need revision are: "Degree-grouping", "The edge prior encapsulates degree", "Degree can underly a large fraction of performance", and "Analytical approximation of the edge prior". The Results section also needs revision.

Some other concerns: Adamic/Adar, the Jaccard coefficient, preferential attachment, etc. are link prediction methods. Why have the authors termed them edge prediction features?
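For readers unfamiliar with the measures named in this comment, the sketch below computes textbook versions of the Jaccard coefficient, preferential attachment, and the Adamic/Adar index for a node pair in a small undirected graph. It is an illustration only: the helper names and the toy edge list are assumptions, and the manuscript's own feature definitions may differ in detail. Each function returns one score per node pair, so the same quantity can be read either as a stand-alone link prediction method or as a feature fed to a classifier. Preferential attachment is simply the product of the two degrees, so it depends on degree alone.

```python
import math


def build_adjacency(edges):
    """Map each node to the set of its neighbors in an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Shared neighbors divided by the size of the union of the two neighborhoods."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def preferential_attachment(adj, u, v):
    """Product of the two node degrees; uses no information beyond degree."""
    return len(adj[u]) * len(adj[v])


def adamic_adar(adj, u, v):
    """Sum over shared neighbors of 1 / log(degree of the shared neighbor)."""
    # A shared neighbor of two distinct nodes has degree >= 2, so log(degree) > 0.
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


# Hypothetical toy graph used only for illustration.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adj = build_adjacency(toy_edges)
for name, score in [
    ("jaccard", jaccard(adj, "a", "d")),
    ("preferential attachment", preferential_attachment(adj, "a", "d")),
    ("adamic-adar", adamic_adar(adj, "a", "d")),
]:
    print(name, score)
```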

Version published to 10.1093/gigascience/giae001
Jan 1, 2024
Version published to 10.1101/2023.01.05.522939 on bioRxiv
Jan 6, 2023
