sabinaHSBM: An R package for link prediction and network reconstruction using Hierarchical Stochastic Block Models

Abstract

  • Network analysis is a powerful framework for investigating complex systems across disciplines, including ecology, where it helps uncover patterns in predator–prey, host–parasite, or plant–pollinator interactions. However, ecological network data are often incomplete or error-prone due to sampling limitations, detection failures, and taxonomic uncertainty. The result is missing (false negative) and spurious (false positive) links that obscure structure and hinder inference. The hierarchical stochastic block model (HSBM), particularly in its degree-corrected form, is among the most effective tools for reconstructing networks under such uncertainty. Despite the model's robustness, its primary implementation, in the Python-based graph-tool library, has remained largely inaccessible to ecologists.

  • Here, we introduce sabinaHSBM, the first R package that makes degree-corrected HSBM broadly available through a user-friendly, flexible workflow. By bridging a gap between advanced network modeling and widely used ecological analysis platforms, sabinaHSBM facilitates network reconstruction and link prediction from binary bipartite data. The workflow involves three main steps: (1) preparing input data, (2) estimating posterior link probabilities, and (3) reconstructing the network. The package supports detection of undocumented and spurious links, exploration of hierarchical structure, and propagation of uncertainty throughout. Key features include cross-validation, flexible thresholding, probabilistic evaluation metrics, and two link prediction modes: estimating all link probabilities or identifying only undocumented links.

  • We illustrate the package’s functionality through a case study using a published global dataset of carnivore–parasite associations, showing that inferred groupings are phylogenetically clustered. To assess predictive accuracy, we examined the 10 highest-probability links identified by the model and found published evidence for 8 of them, despite their absence from the original dataset. This highlights the model’s ability to recover biologically meaningful but underreported interactions.

  • By integrating all components of HSBM-based reconstruction into an accessible R package, sabinaHSBM empowers researchers to improve relational data quality and uncover overlooked patterns in complex ecological networks and beyond.
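
The reconstruction step of the workflow described above, step (3), can be illustrated with a minimal sketch. It is not the sabinaHSBM or graph-tool API: the function name `reconstruct`, the toy host-by-parasite matrix, the probability values, and the fixed 0.5 threshold are all assumptions made for illustration. The sketch assumes that posterior link probabilities from step (2) are already available, turns them into a binary network by thresholding, and then flags undocumented links (absent from the data but predicted) and spurious links (present in the data but not supported).

```python
# Toy illustration only: the matrices below are invented, and none of these
# names come from sabinaHSBM or graph-tool.

def reconstruct(observed, probs, threshold=0.5):
    """Threshold posterior link probabilities into a binary network.

    Returns the reconstructed matrix plus two lists of (row, col) pairs:
    undocumented links (observed 0, predicted 1) and
    spurious links (observed 1, predicted 0).
    """
    reconstructed, undocumented, spurious = [], [], []
    for i, (obs_row, prob_row) in enumerate(zip(observed, probs)):
        new_row = []
        for j, (obs, p) in enumerate(zip(obs_row, prob_row)):
            pred = 1 if p >= threshold else 0
            new_row.append(pred)
            if obs == 0 and pred == 1:
                undocumented.append((i, j))
            elif obs == 1 and pred == 0:
                spurious.append((i, j))
        reconstructed.append(new_row)
    return reconstructed, undocumented, spurious


# Rows: hosts, columns: parasites (hypothetical 3 x 4 binary bipartite matrix).
observed = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

# Hypothetical posterior link probabilities, standing in for the output of step (2).
probs = [
    [0.95, 0.70, 0.90, 0.05],
    [0.85, 0.80, 0.10, 0.02],
    [0.03, 0.04, 0.88, 0.20],
]

reconstructed, undocumented, spurious = reconstruct(observed, probs, threshold=0.5)
print("undocumented:", undocumented)
print("spurious:", spurious)
```

A single fixed threshold is the simplest possible choice here. The abstract mentions flexible thresholding, so in practice the cutoff would more plausibly be tuned, for example against held-out links during cross-validation.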
