reComBat-seq: Regularized negative binomial regression for batch-effect correction in underdetermined transcriptomics datasets
Abstract
Motivation
Batch effect correction is essential for integrating large-scale transcriptomics datasets, such as single-cell RNA-seq or multi-study bulk RNA-seq datasets, because it reduces technical noise that may mask biological signal. Existing correction methods either do not produce count data as output, which is crucial for state-of-the-art downstream analyses such as differential expression analysis, or fail to converge in underdetermined study designs.
Results
We present reComBat-seq, a method that extends the Negative Binomial regression framework of ComBat-seq by incorporating Elastic Net regularization. This approach resolves problems with rank-deficient design matrices while preserving the integer nature of count data. Benchmarking on simulated and real datasets, including single-cell RNA-seq data, demonstrates that reComBat-seq successfully removes batch effects in complex study designs while maintaining compatibility with downstream differential expression tools.
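To illustrate why a penalty helps with a rank-deficient design matrix, the following is a minimal Python sketch and not the reComBat-seq implementation. It assumes a fixed dispersion `phi`, toy counts, and a design in which the intercept column is collinear with two batch indicators. The names `fit_nb_elastic_net` and `fitted_means` are hypothetical, and leaving the intercept unpenalized is a choice made for this sketch only.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t * |z| (the L1 part of the Elastic Net)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def fit_nb_elastic_net(X, y, phi=0.1, lam=0.2, l1_ratio=0.5,
                       beta0=None, step=0.05, n_iter=10000):
    """Proximal gradient descent for a log-link Negative Binomial regression.

    Assumptions of this sketch: the dispersion phi is fixed and known, and the
    Elastic Net penalty applies to every coefficient except column 0 (intercept).
    """
    n, p = len(X), len(X[0])
    beta = list(beta0) if beta0 is not None else [0.0] * p
    for _ in range(n_iter):
        mu = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
        # Derivative of the NB log-likelihood w.r.t. the linear predictor:
        # (y_i - mu_i) / (1 + phi * mu_i)
        resid = [(yi - mi) / (1.0 + phi * mi) for yi, mi in zip(y, mu)]
        for j in range(p):
            grad = sum(row[j] * r for row, r in zip(X, resid)) / n
            if j == 0:
                beta[j] += step * grad  # unpenalized intercept
            else:
                # Gradient step on likelihood plus ridge term, then L1 shrinkage.
                z = beta[j] + step * (grad - lam * (1.0 - l1_ratio) * beta[j])
                beta[j] = soft_threshold(z, step * lam * l1_ratio)
    return beta


def fitted_means(X, beta):
    """Expected counts implied by the coefficients."""
    return [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]


# Toy data: two batches of four samples each.
counts = [10, 12, 9, 11, 20, 22, 19, 21]
# Columns: intercept, batch A indicator, batch B indicator.
# The intercept equals the sum of the two indicators, so X has rank 2, not 3.
X = [[1, 1, 0]] * 4 + [[1, 0, 1]] * 4

other_start = [2.0, 0.0, 0.0]

# No penalty: coefficients depend on the starting point.
free_a = fit_nb_elastic_net(X, counts, lam=0.0)
free_b = fit_nb_elastic_net(X, counts, lam=0.0, beta0=other_start)

# Elastic Net penalty: both starting points are driven to the same solution.
pen_a = fit_nb_elastic_net(X, counts)
pen_b = fit_nb_elastic_net(X, counts, beta0=other_start)

print("unpenalized:", free_a, free_b)
print("penalized:  ", pen_a, pen_b)
```

In this toy setting, the unpenalized runs reach the same fitted means but different coefficient vectors, because the rank-deficient design leaves the coefficients unidentified. With the penalty, both runs should agree on a single coefficient vector, at the cost of some shrinkage of the batch coefficients.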
Availability and Implementation
The reComBat-seq source code is available at https://github.com/menchelab/reComBat-seq . All code to reproduce the presented analyses is available at https://github.com/menchelab/reComBatseq_Studies . Data produced in this study are available at https://doi.org/10.5281/zenodo.19736515 . The single-cell RNA-seq data used are available at https://doi.org/10.5281/zenodo.14234956 .
Supplementary Information
Proofs and volcano plots of differential expression analysis