Novel Parameter-Free and Interpretable Integration of CITE-seq RNA and ADT Profiles via Tensor Decomposition-Based Unsupervised Feature Extraction

Abstract

CITE-seq jointly profiles cellular transcripts and surface proteins, but integrating RNA and antibody-derived tags (ADTs) remains challenging because the two modalities differ markedly in dimensionality, sparsity, and noise characteristics. We present a tensor decomposition-based unsupervised feature extraction framework for the parameter-free integration of CITE-seq data. By constructing a gene × cell × protein tensor and applying higher-order singular value decomposition (HOSVD), the method derives shared latent representations of genes, cells, and proteins without prior gene filtering or modality-weight tuning. Across five ImmGen T-cell CITE-seq datasets, the resulting cell embeddings were generally more consistent with annotated cell types than RNA-only, protein-only, or totalVI-based embeddings, whereas organ-level consistency did not improve. The latent factors also enabled post hoc unsupervised gene selection, and the selected genes showed biologically meaningful enrichment for T-cell-related terms. In addition, the method's failure on a poor-quality dataset served as a useful quality-control signal. Together with a blocked sparse-matrix implementation for large tensors, these results indicate that tensor decomposition-based unsupervised feature extraction provides an interpretable, scalable, and competitive approach for integrating RNA and ADT measurements in CITE-seq experiments.
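To make the tensor construction and HOSVD step concrete, the following is a minimal dense sketch on synthetic data, not the authors' implementation. The abstract does not say how the gene × cell × protein tensor is built; the per-cell product `rna[i, j] * adt[k, j]` used here is an assumption, as are the helper names `unfold`, `hosvd`, and `reconstruct`, the toy dimensions, and the Poisson counts. The blocked sparse-matrix implementation mentioned in the abstract is not reproduced.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize a tensor so that rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along `mode` by `matrix` (shape: new_dim x old_dim)."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)


def hosvd(tensor):
    """Return (core, factors): one orthonormal factor matrix per mode."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of each unfolding give that mode's factors.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Rebuild the tensor from the core and the factor matrices."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Synthetic stand-ins for RNA (genes x cells) and ADT (proteins x cells) counts.
rng = np.random.default_rng(0)
n_genes, n_cells, n_proteins = 30, 12, 5
rna = rng.poisson(1.0, size=(n_genes, n_cells)).astype(float)
adt = rng.poisson(5.0, size=(n_proteins, n_cells)).astype(float)

# ASSUMPTION: tensor entry (gene i, cell j, protein k) = rna[i, j] * adt[k, j].
tensor = np.einsum("ij,kj->ijk", rna, adt)

core, (u_gene, u_cell, u_protein) = hosvd(tensor)

# Columns of u_cell are candidate cell embeddings; u_gene and u_protein play
# the same role for genes and proteins.
cell_embedding = u_cell[:, :3]
```

In this sketch no weights or filters are tuned: the factor matrices follow directly from the SVD of each unfolding, which is the sense in which such a decomposition can be called parameter-free. Which singular vectors to keep as embeddings (three here) is an illustrative choice.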