Harmonizing Heterogeneous Single-cell Gene Expression Data with Individual-level Covariate Information

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Motivation

The growing availability of single-cell RNA sequencing (scRNA-seq) data highlights the necessity for robust integration methods to uncover both shared and unique cellular features across samples. These datasets often exhibit technical variations and biological differences, complicating integrative analyses. While numerous integration methods have been proposed, many fail to account for individual-level covariates or are limited to discrete variables.

Results

To address these limitations, we propose scINSIGHT2, a generalized linear latent variable model that accommodates both continuous covariates, such as age, and discrete factors, such as disease conditions. Through both simulation studies and real-data applications, we demonstrate that scINSIGHT2 accurately harmonizes scRNA-seq datasets, whether from single or multiple sources. These results highlight scINSIGHT2’s utility in capturing meaningful biological insights from scRNA-seq data while accounting for individual-level variation.

Availability and implementation

The scINSIGHT2 method has been implemented as an R package, which is available at https://github.com/yudimu/scINSIGHT2/ .

Article activity feed