Persistent homology centrality improves link prediction performance in Pubmed co-occurrence networks

Abstract

This paper proposes a novel approach to understanding the nature of innovation and scientific progress by analyzing large-scale datasets of scientific literature. We introduce a new measure of the novelty potential, or disruptiveness, of a set of scientific entities, grounded in the mathematical formalism of algebraic topology through a method called persistent homology. In this framework, identifying where academic ideas depart from the existing body of knowledge to fill knowledge gaps is key to scoring a set of entities by their potential to fill future knowledge gaps. The framework rests on the assumption that scientific discovery follows underlying regularities that can be modeled and predicted.
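For readers new to persistent homology, the sketch below shows its simplest case: 0-dimensional persistence (connected components) of a graph whose nodes and edges appear at increasing filtration values. It is background only, not the authors' measure. The function name `h0_persistence` and its input format are hypothetical. Each component is "born" when its oldest node appears and "dies" when it merges into an older component (the elder rule), computed here with a union-find structure.

```python
import math


def h0_persistence(node_births, edges):
    """0-dimensional persistence pairs (birth, death) of a filtered graph.

    node_births: dict mapping node -> filtration value at which it appears.
    edges: iterable of (u, v, value); each edge appears at `value`, assumed
           to be >= the birth values of both endpoints.
    """
    parent = {n: n for n in node_births}
    birth = dict(node_births)  # birth value of the component rooted at each root

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs = []
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge closes a cycle: a 1-dimensional event, not an H0 one
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru  # make ru the older component
        pairs.append((birth[rv], t))  # elder rule: the younger component dies at t
        parent[rv] = ru
    for r in {find(n) for n in node_births}:
        pairs.append((birth[r], math.inf))  # components that never merge
    return pairs
```

Long-lived pairs (large death minus birth) mark components that stay separate across much of the filtration; short-lived pairs are usually treated as noise.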

Our method uses a filtration, that is, an ordering of hypergraph components along a chosen parameter that yields a nested sequence of sub-hypergraphs. In this work, two axes are used, which together construct a growing grid of sub-hypergraphs. The axes of time (the evolution of scientific knowledge) and normalized pointwise mutual information (network structure) make it possible to represent the entire dynamic structure of the scientific literature network succinctly. We then find that very simple and interpretable centrality measures derived from this crude bifiltration, or vineyard, can predict links within the dynamic scientific network.
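A minimal sketch of the two-axis grid, under several assumptions not stated in the abstract: documents are given as hypothetical `(year, entities)` records, the hypergraph is reduced to its pairwise co-occurrence edges, and NPMI is computed once over the whole corpus rather than per time slice. For a pair of entities, NPMI is $\ln\frac{p(x,y)}{p(x)\,p(y)} \big/ \left(-\ln p(x,y)\right)$, which lies in $[-1, 1]$. The function names `npmi_edges` and `bifiltration_grid` are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_edges(docs):
    """Pairwise NPMI and first co-occurrence year from (year, entities) documents.

    Returns {(a, b): (first_year, npmi)} with a < b. NPMI is computed over the
    whole corpus here, which is a simplifying assumption.
    """
    n = len(docs)
    single, pair, first_year = Counter(), Counter(), {}
    for year, ents in docs:
        ents = sorted(set(ents))
        single.update(ents)
        for a, b in combinations(ents, 2):
            pair[(a, b)] += 1
            first_year[(a, b)] = min(year, first_year.get((a, b), year))
    out = {}
    for (a, b), c in pair.items():
        p_ab = c / n
        p_a, p_b = single[a] / n, single[b] / n
        if p_ab == 1.0:
            npmi = 1.0  # pair occurs in every document: -ln p_ab = 0, NPMI is 1 by convention
        else:
            npmi = math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)
        out[(a, b)] = (first_year[(a, b)], npmi)
    return out


def bifiltration_grid(edges, years, thresholds):
    """Sub-graph at each (year, threshold): edges seen by `year` with NPMI >= threshold.

    Later years and lower thresholds only add edges, so the grid is nested
    along both axes, as a bifiltration requires.
    """
    return {
        (y, s): {e for e, (fy, v) in edges.items() if fy <= y and v >= s}
        for y in years
        for s in thresholds
    }
```

Each grid cell is a plain edge set, so a simple centrality such as node degree can be computed per cell and tracked across the grid; which centrality the authors actually derive is not specified in this abstract.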

While many link prediction methods have been proposed, the method presented here extends link prediction to higher dimensions, since the boundary of a knowledge gap may consist of more than 0-dimensional nodes.
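One hypothetical way to make a higher-dimensional "gap" concrete is a hollow triangle: three entities that co-occur pairwise, each pair in some document (hyperedge), but never all three in a single document. Its boundary consists of edges rather than isolated nodes, and filling it would be a higher-order link. This is an illustrative proxy chosen for this sketch, not the authors' procedure; the function name `hollow_triangles` is hypothetical, and a hollow triangle is only a candidate 1-cycle, since it may still bound other higher-order structure.

```python
from itertools import combinations


def hollow_triangles(hyperedges):
    """Triples whose three pairs each co-occur in some hyperedge,
    but which no single hyperedge contains (candidate higher-order links)."""
    hyperedges = [frozenset(h) for h in hyperedges]
    # Build the pairwise co-occurrence graph (1-skeleton) of the hypergraph.
    adj = {}
    for h in hyperedges:
        for a, b in combinations(h, 2):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    out = []
    for a in sorted(adj):
        for b in sorted(x for x in adj[a] if x > a):
            # Common neighbours of a and b close a triangle a-b-c.
            for c in sorted(x for x in adj[a] & adj[b] if x > b):
                triple = {a, b, c}
                if not any(triple <= h for h in hyperedges):
                    out.append((a, b, c))  # triangle with no filling hyperedge
    return out
```

For example, with documents `{a, b}`, `{b, c}`, `{a, c}` and `{a, b, d}`, the triple `(a, b, c)` is hollow, while `(a, b, d)` is filled by its document.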

The system presented here not only offers a mathematical basis that is consistent with observations in the cognitive neurosciences regarding early childhood language acquisition, but also provides useful applications for the scientific community in predicting and ranking hypotheses for scientific discovery.