ARCADIA Reveals Spatially Dependent Transcriptional Programs through Integration of scRNA-seq and Spatial Proteomics
Abstract
Cellular states are strongly influenced by spatial context, but single-cell RNA sequencing (scRNA-seq) loses information about local tissue organization, while spatial proteomic assays capture limited marker panels that constrain transcriptomic inference. Integrating these modalities can elucidate how spatial niches shape transcriptional programs, yet existing approaches depend on either feature-level correspondence such as gene–protein linkage or cell-level barcode pairing, which is often unavailable. We present ARCADIA (ARchetype-based Clustering and Alignment with Dual Integrative Autoencoders), a generative framework for cross-modal integration that operates without cell barcode pairing and does not assume direct feature-to-feature correspondence. ARCADIA identifies modality-specific archetypes, i.e., convex combinations of cells representing extreme phenotypic states, and aligns these archetype anchors across modalities by minimizing the discrepancy between their cell-type composition profiles. The aligned archetypes define a shared coordinate system that anchors a pair of dual variational autoencoders (VAEs) trained with cross-modal geometric regularization, preserving archetype structure and spatial neighborhood information while enabling bidirectional translation between modalities. On a semi-synthetic benchmark derived from paired CITE-seq and synthetic spatial grids, ARCADIA accurately recapitulates cell-type correspondences and spatially dependent subpopulation structures, outperforming existing weak-linkage methods. Applied to independent human tonsil scRNA-seq and CODEX data, ARCADIA reconstructs known tissue architecture and reveals spatially dependent transcriptional programs linking B-cell maturation and T-cell activation or exhaustion to their microenvironmental niches.
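The archetype alignment step described above can be illustrated with a minimal sketch. It assumes that each archetype is given as a row of convex-combination weights over cells, that every cell carries an integer cell-type label, and that "discrepancy" is measured as Euclidean distance between composition profiles, with a one-to-one matching solved by the Hungarian algorithm. These choices, and the helper names `composition_profiles`, `align_archetypes`, and `toy_weights`, are illustrative assumptions and are not taken from the ARCADIA code base.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def composition_profiles(weights, labels, n_types):
    """Cell-type composition of each archetype.

    weights: (k, n_cells) convex-combination weights, each row sums to 1.
    labels:  (n_cells,) integer cell-type label per cell.
    Returns a (k, n_types) matrix whose rows sum to 1.
    """
    one_hot = np.eye(n_types)[labels]  # (n_cells, n_types)
    return weights @ one_hot


def align_archetypes(profiles_a, profiles_b):
    """One-to-one matching that minimizes total profile discrepancy.

    Assumption: Euclidean distance and Hungarian matching stand in for
    whatever discrepancy and solver the actual method uses.
    """
    cost = cdist(profiles_a, profiles_b, metric="euclidean")
    rows, cols = linear_sum_assignment(cost)
    return cols, cost[rows, cols].sum()


def toy_weights(labels, dominant_types, purity=0.8):
    """Hypothetical archetypes: each puts `purity` mass on one cell type."""
    rows = []
    for t in dominant_types:
        in_type = labels == t
        w = np.where(in_type,
                     purity / in_type.sum(),
                     (1.0 - purity) / (~in_type).sum())
        rows.append(w)
    return np.vstack(rows)


# Two unpaired modalities with different cell counts but shared cell types.
labels_rna = np.repeat(np.arange(3), 20)      # 60 scRNA-seq cells
labels_prot = np.repeat(np.arange(3), 15)     # 45 spatial proteomics cells

w_rna = toy_weights(labels_rna, dominant_types=[0, 1, 2])
w_prot = toy_weights(labels_prot, dominant_types=[2, 0, 1])  # shuffled order

p_rna = composition_profiles(w_rna, labels_rna, n_types=3)
p_prot = composition_profiles(w_prot, labels_prot, n_types=3)

match, total = align_archetypes(p_rna, p_prot)
print(match, total)
```

In this toy setting, `match[i]` gives the index of the protein archetype paired with RNA archetype `i`, so the shuffled archetype order is recovered from composition profiles alone, without cell pairing or feature correspondence.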
Availability
All data analyzed in this work have been previously published and are available in the original studies. ARCADIA is publicly accessible at https://github.com/azizilab/ARCADIA_public. The notebooks to reproduce figures and a preprocessed version of the semi-synthetic dataset are available at https://github.com/azizilab/arcadia reproducibility.