BioBatchNet: A Dual-Encoder Framework for Robust Batch Effect Correction in Imaging Mass Cytometry
Abstract
Motivation
Imaging Mass Cytometry (IMC) is a cutting-edge technology for analysing spatially resolved protein expression at the single-cell level. However, downstream analyses are often hindered by batch effects, which introduce systematic biases and obscure true biological variation. Existing correction methods, largely developed for scRNA-seq data, struggle to control the degree of correction precisely, leading either to over-correction that removes critical biological information or to under-correction that leaves residual batch effects. These methods also adapt poorly to IMC data because of differences in data characteristics. In addition, IMC data often feature imbalanced and overlapping cell populations, which complicate clustering and downstream analysis. These challenges underscore the need for a robust and controllable batch effect correction approach tailored to IMC data.
Results
We present BioBatchNet, a dual-encoder framework that uses adversarial training to explicitly disentangle batch-specific and biological signals. BioBatchNet enables controllable batch effect correction, balancing correction against the preservation of biological variation. We evaluated BioBatchNet on three IMC datasets, where it outperformed seven benchmark methods and performed robustly in both batch effect correction and conservation of biological signal. Additionally, we developed a Constrained Pairwise Clustering (CPC) method, which employs constrained pairs to improve clustering performance, even in datasets with imbalanced and overlapping cell populations. To validate its generalisability, BioBatchNet was also applied to four scRNA-seq datasets, where it delivered competitive performance compared to eight typical methods. These results demonstrate BioBatchNet’s generalisability and robustness in correcting batch effects across diverse single-cell datasets and underscore its potential for large-scale biological analyses.