Scalable Microbiome Network Inference: Mitigating Sparsity and Computational Bottlenecks in Random Effects Models

Debarshi Roy
Tarini Shankar Ghosh


Abstract

The application of Large Language Models (LLMs) and Transformers to biological and healthcare datasets requires the extraction of highly accurate, noise-filtered ecological networks. The Random Effects Model (REM) is a powerful statistical method for inferring microbial interaction networks and identifying keystone species across heterogeneous studies. However, existing R implementations that rely on single-threaded Iteratively Reweighted Least Squares (IRLS) fitting are computationally prohibitive for high-dimensional metagenomic data, creating a significant bottleneck for downstream machine learning pipelines. In this paper, we present Parallel-REM, a scalable, Python-based parallel pipeline for large-scale network inference. By integrating robust variance filtering, sparsity checks, and a batched master-worker parallelisation strategy built on joblib and statsmodels, we resolve the native convergence failures associated with sparse biological matrices. Benchmarking on a large clinical dataset of 70,185 samples and 466 optimal species demonstrates a 26.1× speedup over a sequential baseline on a 64-core machine, reducing computation time from days to minutes. Statistical validation further shows >99.9% directional concordance with the original R implementation. Parallel-REM democratises large-scale network extraction, providing the high-throughput infrastructure needed to feed clean topological and biological features into modern deep learning and Transformer-based diagnostic architectures.
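The abstract names the ingredients of the pipeline (variance and sparsity filtering, per-study IRLS fits, random-effects pooling, and batched master-worker parallelism) without showing how they fit together. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each species pair is fitted within each study by a Huber-weighted IRLS regression and that the per-study slopes are pooled with a DerSimonian-Laird random-effects estimator; the abstract confirms neither choice. To stay dependency-free, it swaps the joblib and statsmodels calls named above for the standard library's `concurrent.futures` and a hand-written IRLS loop. All function names (`passes_filters`, `huber_irls`, `dersimonian_laird`, `fit_batch`, `infer_network`, `directional_concordance`) and all thresholds are hypothetical.

```python
# Sketch under stated assumptions: per-study Huber IRLS slopes pooled by DerSimonian-Laird,
# with stdlib concurrent.futures standing in for joblib. All names and thresholds are hypothetical.
import math
import statistics
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations


def passes_filters(x, min_var=1e-8, min_prevalence=0.1):
    """Keep a species only if it varies and is non-zero in enough samples."""
    prevalence = sum(v != 0 for v in x) / len(x)
    return statistics.variance(x) > min_var and prevalence >= min_prevalence


def _wls(x, y, w):
    """Weighted least squares for y = a + b*x; returns (b, weighted Sxx, residuals)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    b = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y)) / sxx
    return b, sxx, [yi - ym - b * (xi - xm) for xi, yi in zip(x, y)]


def huber_irls(x, y, k=1.345, max_iter=50, tol=1e-8):
    """Robust slope of y on x via IRLS with Huber weights; returns (slope, approx. SE)."""
    w = [1.0] * len(x)
    b, sxx, r = _wls(x, y, w)
    for _ in range(max_iter):
        s = statistics.median(abs(ri) for ri in r) / 0.6745  # MAD-style residual scale
        if s == 0:  # exact fit: nothing left to reweight
            break
        w = [1.0 if abs(ri) <= k * s else k * s / abs(ri) for ri in r]
        b_new, sxx, r = _wls(x, y, w)
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break
    # Rough WLS-style standard error, not the sandwich SE that statsmodels' RLM reports.
    sigma2 = sum(wi * ri ** 2 for wi, ri in zip(w, r)) / (sum(w) - 2)
    return b, math.sqrt(sigma2 / sxx)


def dersimonian_laird(estimates, variances):
    """Pool per-study estimates with a DerSimonian-Laird random-effects model."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * bi for wi, bi in zip(w, estimates)) / sw
    q = sum(wi * (bi - fixed) ** 2 for wi, bi in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(w) - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * bi for wi, bi in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re))


def fit_batch(studies, pairs):
    """Worker task: fit one batch of species pairs in every study, then pool them."""
    out = {}
    for i, j in pairs:
        fits = [huber_irls(st[i], st[j]) for st in studies]
        variances = [max(se ** 2, 1e-12) for _, se in fits]  # floor avoids division by zero
        out[(i, j)] = dersimonian_laird([b for b, _ in fits], variances)
    return out


def infer_network(studies, batch_size=64, max_workers=None, executor_cls=ProcessPoolExecutor):
    """Master: filter species, split all pairs into batches, farm batches out to workers.

    studies[k][s] is the (already transformed) abundance vector of species s in study k.
    """
    n_species = len(studies[0])
    keep = [s for s in range(n_species) if all(passes_filters(st[s]) for st in studies)]
    pairs = list(combinations(keep, 2))
    batches = [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]
    network = {}
    with executor_cls(max_workers=max_workers) as pool:
        for part in pool.map(fit_batch, [studies] * len(batches), batches):
            network.update(part)
    return network


def directional_concordance(net_a, net_b):
    """Share of common edges whose pooled coefficients have the same sign (0 counts as negative)."""
    shared = net_a.keys() & net_b.keys()
    agree = sum((net_a[e][0] > 0) == (net_b[e][0] > 0) for e in shared)
    return agree / len(shared) if shared else float("nan")
```

Batching amortises dispatch overhead, because each worker receives a slice of species pairs rather than one pair per task. With the default `ProcessPoolExecutor`, call `infer_network` under an `if __name__ == "__main__":` guard so that spawned workers can import the module safely. The `directional_concordance` helper mirrors the kind of sign-agreement check the abstract reports against the R implementation, but its exact definition here is an assumption.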

Version published on bioRxiv (DOI 10.64898/2026.03.27.714858), Mar 31, 2026

Related articles

Robust causal gene network estimation for large-scale single-cell perturbation screens using reduced control function

This article has 2 authors:
1. Changhao Ge
2. Hongzhe Li
This article has no evaluations. Latest version: Apr 21, 2026

LOCOM2: Robust Differential Abundance Analysis for Microbiome Data

This article has 3 authors:
1. Mengyu He
2. Glen A. Satten
3. Yi-Juan Hu
This article has no evaluations. Latest version: Apr 9, 2026

The Second Brain: Diffusion Models for Realistic Human Microbiome Generation

This article has 2 authors:
1. Brandon Yee
2. Jiayi Fu
This article has no evaluations. Latest version: May 11, 2026
