Harmonizing Heterogeneous Single-cell Gene Expression Data with Individual-level Covariate Information
Abstract
Motivation
The growing availability of single-cell RNA sequencing (scRNA-seq) data highlights the necessity for robust integration methods to uncover both shared and unique cellular features across samples. These datasets often exhibit technical variations and biological differences, complicating integrative analyses. While numerous integration methods have been proposed, many fail to account for individual-level covariates or are limited to discrete variables.
Results
To address these limitations, we propose scINSIGHT2, a generalized linear latent variable model that accommodates both continuous covariates, such as age, and discrete factors, such as disease conditions. Through both simulation studies and real-data applications, we demonstrate that scINSIGHT2 accurately harmonizes scRNA-seq datasets, whether they come from a single source or from multiple sources. These results highlight scINSIGHT2’s utility in capturing meaningful biological insights from scRNA-seq data while accounting for individual-level variation.
Availability and implementation
The scINSIGHT2 method has been implemented as an R package, which is available at https://github.com/yudimu/scINSIGHT2/