A multitask single-cell analysis framework with knowledge graph as a prior
Abstract
Single-cell RNA sequencing has reshaped cancer biology by revealing the cellular and molecular diversity of the tumor microenvironment. Yet most analysis pipelines still treat clustering, annotation, pathway scoring, metabolic inference, and cell–cell communication as separate tasks. Here, we present scKNIFE (Knowledge graph-based NMF for Inference of Functional Entities), a unified framework based on graph-regularized non-negative matrix factorization (NMF). It jointly reconstructs single-cell expression profiles and infers the activities of diverse biological entities, including pathways, cell-type markers, metabolic reactions and tasks, ligand–receptor interactions, and cell-state programs. By embedding a heterogeneous knowledge graph directly into the factorization objective, scKNIFE propagates information between biologically related entities through Laplacian regularization, while sparsity constraints preserve interpretability. The integrated prior graph spans 47,274 nodes and 506,620 edges, assembled from complementary resources including Gene Ontology, Reactome, Hallmark gene sets, Human-GEM, CellChat, PanglaoDB, Cytopus, and Tabula Muris-derived annotations. Across multiple cancer-focused single-cell datasets, scKNIFE's clustering and cell-type annotation performance ranges from competitive to leading. It also recovers biologically coherent metabolic programs that agree with dedicated metabolism-focused methods and capture therapy-response-associated states. In addition, the framework supports downstream inference of cell-type-specific biological activities from a single latent representation. Together, these results establish scKNIFE as a modular and extensible framework for end-to-end biological interpretation of single-cell cancer transcriptomics.
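The abstract does not spell out scKNIFE's objective. The following is only a generic illustration of how a knowledge graph can enter an NMF objective through a Laplacian penalty plus a sparsity term. It is not the authors' implementation. It is a minimal NumPy sketch that assumes an expression matrix X (genes x cells) is factored as X ≈ WH. Each of the k rows of H is taken to be one knowledge-graph entity's activity across cells. A symmetric, nonnegative adjacency matrix `adj` (k x k) links those entities, L = D - A is the unnormalized Laplacian, and an L1 penalty on H supplies sparsity. The names `kg_nmf`, `graph_laplacian`, and `objective`, the weights `lam` and `mu`, and the multiplicative update rule (borrowed from graph-regularized NMF, GNMF) are all hypothetical choices.

```python
import numpy as np


def graph_laplacian(adj):
    """Return (L, D): unnormalized Laplacian L = D - A and degree matrix D.

    Hypothetical helper; assumes `adj` is symmetric and nonnegative.
    """
    degree = np.diag(adj.sum(axis=1))
    return degree - adj, degree


def objective(X, W, H, adj, lam, mu):
    """||X - WH||_F^2 + lam * tr(H^T L H) + mu * sum(H)  (assumed form)."""
    L, _ = graph_laplacian(adj)
    recon = np.linalg.norm(X - W @ H, "fro") ** 2
    smooth = np.trace(H.T @ L @ H)  # penalizes differences between linked entity rows
    return recon + lam * smooth + mu * H.sum()  # H >= 0, so sum(H) is its L1 norm


def kg_nmf(X, adj, lam=0.1, mu=0.01, n_iter=200, seed=0, eps=1e-10):
    """Hypothetical knowledge-graph-regularized NMF: X (genes x cells) ~ W @ H.

    W is genes x k (entity loadings) and H is k x cells (entity activities).
    Row i of H is tied to entity i in `adj`.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    k = adj.shape[0]
    W = rng.random((n_genes, k))
    H = rng.random((k, n_cells))
    _, D = graph_laplacian(adj)
    history = [objective(X, W, H, adj, lam, mu)]
    for _ in range(n_iter):
        # H step: the gradient splits into a positive part (denominator) and a
        # negative part (numerator). The graph term adds lam*A@H on top and
        # lam*D@H below, and the L1 term adds mu/2 below.
        numer = W.T @ X + lam * adj @ H
        denom = W.T @ W @ H + lam * D @ H + mu / 2 + eps
        H *= numer / denom
        # W step: plain Lee-Seung multiplicative rule for the reconstruction term.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        history.append(objective(X, W, H, adj, lam, mu))
    return W, H, history


# Toy usage: random low-rank nonnegative data and a hypothetical chain-shaped
# graph over 5 entities (0-1-2-3-4). Not real single-cell data.
rng = np.random.default_rng(1)
X = rng.random((50, 5)) @ rng.random((5, 40))
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
W, H, history = kg_nmf(X, adj)
print(f"objective: {history[0]:.2f} -> {history[-1]:.2f}")
```

The Laplacian term equals one half of the sum over all ordered pairs (i, j) of A_ij ||h_i - h_j||^2, where h_i is row i of H. Each edge therefore pulls the activity profiles of the two entities it connects toward each other, which is one concrete way to read "propagates information between biologically related entities". Meanwhile, `mu` pushes weak activities toward zero. Multiplicative updates keep W and H nonnegative. This particular variant is not claimed to decrease the objective monotonically, so in practice one would monitor `history`.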