IntegrateRigor: annotation-free integration optimization for cell identity recovery reveals cancer–immune interface niches

Zhiqian Zhai
Changhu Wang
Chengfeng Jiang
Ziqi Rong
Jingyi Jessica Li

Abstract

Integrating single-cell and spatial transcriptomics data across batches is essential for recovering comparable cell identities—including cell types, subtypes, and states—as a prerequisite for downstream analyses in multi-condition and large-scale studies. This task remains challenging because between-batch variation removal often conflicts with cell identity preservation, and current methods typically rely on generic highly variable gene selection and lack principled metrics for hyperparameter tuning when cell identity annotations are unavailable. Together, these limitations often lead to over-integration, which merges biologically distinct cell identities, or under-integration, which leaves cells separated by batch rather than identity. Here we introduce IntegrateRigor, a data-driven, annotation-free, method-agnostic framework that optimizes integration specifically for reliable cell identity recovery across batches. IntegrateRigor first selects genes whose expression patterns are stable across batches using a gene-wise likelihood-based batch stability score, excluding batch-sensitive genes that can bias cell identity alignment during integration. It then identifies the optimal integration configuration across methods and hyperparameters by defining a dataset-level integration score that explicitly balances between-batch variation removal against cell identity preservation, without requiring prior annotations. In a colorectal cancer single-cell and spatial transcriptomics dataset, IntegrateRigor revealed previously uncharacterized cancer–immune interface niches in the tumor microenvironment that were masked by under-integration under default settings and by over-integration in previous literature. Across diverse datasets spanning multiple sources of between-batch variation, IntegrateRigor consistently improved cell identity recovery by mitigating both over-integration and under-integration across five state-of-the-art methods. 
By transforming integration from a heuristic preprocessing step into a statistically principled, dataset-adaptive procedure for cell identity recovery, IntegrateRigor improves the reproducibility and biological discovery power of large-scale single-cell and spatial transcriptomics analyses.
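
The abstract does not specify how the gene-wise likelihood-based batch stability score is computed. As a rough illustration of the general idea only, and not IntegrateRigor's actual method, the sketch below scores each gene with a Gaussian likelihood-ratio statistic that compares a shared mean against batch-specific means, then keeps the genes with the smallest statistic. The function names `batch_lr_statistic` and `select_stable_genes`, the Gaussian model, and the toy data are all assumptions made for this example; a real score would also need to account for differences in cell identity composition between batches.

```python
import numpy as np

def batch_lr_statistic(expr, batches):
    """Per-gene Gaussian likelihood-ratio statistic for a batch effect on the mean.

    Hypothetical helper, not part of IntegrateRigor.
    expr: (cells, genes) array of log-normalized expression.
    batches: (cells,) array of batch labels.
    Larger values mean the gene's mean depends more strongly on batch.
    """
    expr = np.asarray(expr, dtype=float)
    batches = np.asarray(batches)
    n_cells = expr.shape[0]
    # Null model: one shared mean per gene.
    rss_null = ((expr - expr.mean(axis=0)) ** 2).sum(axis=0)
    # Alternative model: one mean per gene per batch.
    rss_alt = np.zeros(expr.shape[1])
    for b in np.unique(batches):
        block = expr[batches == b]
        rss_alt += ((block - block.mean(axis=0)) ** 2).sum(axis=0)
    eps = 1e-12  # guards against log(0) for constant genes
    return n_cells * np.log((rss_null + eps) / (rss_alt + eps))

def select_stable_genes(expr, batches, n_keep):
    """Return indices of the n_keep genes with the smallest batch statistic."""
    stat = batch_lr_statistic(expr, batches)
    return np.argsort(stat)[:n_keep]

# Toy data (assumed): 200 cells, 5 genes, two batches; gene 0 is shifted in batch 1.
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 5))
batches = np.repeat([0, 1], 100)
expr[batches == 1, 0] += 3.0

stat = batch_lr_statistic(expr, batches)
stable = select_stable_genes(expr, batches, n_keep=4)
```

In this toy setting the shifted gene receives by far the largest statistic and is the one left out of `stable`; the choice of how many genes to keep is itself an assumption here and not something the abstract states.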

Version published to bioRxiv on May 17, 2026 (DOI: 10.64898/2026.05.14.725078)