PLIERv2: bigger, better and faster

Marc Subirana-Granés
Sutanu Nandi
Haoyu Zhang
Maria Chikina
Milton Pividori

Abstract

Gene expression analysis has long been fundamental for elucidating molecular pathways and gene–disease relationships, but traditional single-gene approaches cannot capture the coordinated regulatory networks underlying complex phenotypes. Although unsupervised matrix factorization methods (e.g., PCA, NMF) reveal coexpression patterns, they cannot incorporate prior biological knowledge and often struggle with interpretability and technical noise correction. Semi-supervised strategies such as PLIER have improved interpretability by integrating pathway annotations during latent variable extraction, yet the original PLIER implementation is prohibitively slow and memory-intensive, making it impractical for modern large-scale resources like ARCHS4 or recount3. Here, we introduce PLIERv2, which overcomes these constraints through a two-phase algorithmic design (an unsupervised “PLIERbase” initialization followed by a “PLIERfull” regression that incorporates priors via glmnet), rigorous internal cross-validation to tune regularization parameters for each latent variable, and efficient on-disk data handling using memory-mapped matrices from the bigstatsr package. Benchmarking on GTEx, recount2, and ARCHS4 demonstrates that PLIERv2 achieves 7×–41× speedups over PLIERv1, succeeds in modeling hundreds of thousands of samples that PLIERv1 cannot handle, and maintains or improves biological specificity of latent variables as shown by tissue-alignment and pathway enrichment analyses. By filling the gap in scalable, biologically informed latent variable extraction, PLIERv2 enables comprehensive analysis of modern transcriptomic compendia and paves the way for deeper insights into gene regulatory networks and downstream applications in translational genomics.
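The two-phase design described in the abstract can be illustrated with a minimal sketch. This is not the PLIERv2 implementation or its API: the function names below are hypothetical, a truncated SVD stands in for the unsupervised "PLIERbase" step, and a closed-form ridge regression with K-fold cross-validation stands in for the glmnet-based "PLIERfull" regression. The data are synthetic, with `Y` as a genes × samples expression matrix and `C` as a genes × pathways binary prior matrix (both assumed shapes).

```python
import numpy as np

def base_factorization(Y, k):
    """Phase 1 (unsupervised stand-in): rank-k truncated SVD, Y ~ Z @ B."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    Z = U[:, :k] * s[:k]   # gene loadings, shape (genes, k)
    B = Vt[:k, :]          # latent variables, shape (k, samples)
    return Z, B

def ridge_fit(C, z, lam):
    """Closed-form ridge: argmin_u ||z - C u||^2 + lam * ||u||^2."""
    p = C.shape[1]
    return np.linalg.solve(C.T @ C + lam * np.eye(p), C.T @ z)

def cv_ridge(C, z, lambdas, n_folds=5, seed=0):
    """Choose lambda by K-fold cross-validation over genes, then refit on all genes."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    folds = np.array_split(rng.permutation(n), n_folds)
    errors = []
    for lam in lambdas:
        err = 0.0
        for held_out in folds:
            train = np.setdiff1d(np.arange(n), held_out)
            u = ridge_fit(C[train], z[train], lam)
            err += np.sum((z[held_out] - C[held_out] @ u) ** 2)
        errors.append(err)
    best = lambdas[int(np.argmin(errors))]
    return best, ridge_fit(C, z, best)

def two_phase(Y, C, k, lambdas=(0.01, 0.1, 1.0, 10.0)):
    """Phase 1 factorization, then one cross-validated regression per latent variable."""
    Z, B = base_factorization(Y, k)
    U = np.zeros((C.shape[1], k))   # pathway coefficients per latent variable
    chosen = []
    for j in range(k):
        lam, U[:, j] = cv_ridge(C, Z[:, j], lambdas)
        chosen.append(lam)
    return Z, B, U, chosen

# Synthetic example: 200 genes, 30 samples, 6 pathways (all made up).
rng = np.random.default_rng(1)
C = (rng.random((200, 6)) < 0.2).astype(float)
Y = rng.standard_normal((200, 30))
Z, B, U, chosen = two_phase(Y, C, k=3)
print(Z.shape, B.shape, U.shape, chosen)
```

The point of the sketch is the structure: the regularization strength is selected separately for each latent variable, which mirrors the per-latent-variable tuning the abstract describes. Sparsity from an elastic-net penalty and on-disk memory-mapped storage are omitted here.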

Version published to 10.1101/2025.06.05.658122 on bioRxiv
Jun 9, 2025

