CellCover Defines Marker Gene Panels Capturing Developmental Progression in Neocortical Neural Stem Cell Identity

Curation statements for this article:
  • Curated by eLife

    eLife Assessment

    This study offers a valuable methodological advance by introducing a gene panel selection approach that captures combinatorial specificity to define cell identity. The findings address key limitations of current single-gene marker methods. The evidence is compelling, but would be strengthened by further validation of rare cell states and unexpected marker categories.

This article has been Reviewed by the following groups

Abstract

Definition of cell classes across the tissues of living organisms is central to the analysis of growing atlases of single-cell RNA sequencing (scRNA-seq) data across biomedicine. Marker genes for cell classes are most often defined by differential expression (DE) methods that serially assess individual genes across landscapes of diverse cells. This serial approach has been extremely useful, but is limited because it ignores possible redundancy or complementarity across genes that can only be captured by analyzing multiple genes simultaneously. Interrogating binarized expression data, we aim to identify discriminating panels of genes that are specific to, not only enriched in, individual cell types. To efficiently explore the vast space of possible marker panels, leverage the large number of cells often sequenced, and overcome zero-inflation in scRNA-seq data, we propose viewing marker gene panel selection as a variation of the “minimal set-covering problem” in combinatorial optimization. Using scRNA-seq data from blood and brain tissue, we show that this new method, CellCover, performs as well as or better than DE and other methods in defining cell-type discriminating gene panels, while reducing gene redundancy and capturing cell-class-specific signals that are distinct from those defined by DE methods. Transfer learning experiments across mouse, primate, and human data demonstrate that CellCover identifies markers of conserved cell classes in neocortical neurogenesis, as well as developmental progression in both progenitors and neurons. Exploring markers of human outer radial glia (oRG, or basal RG) across mammals, we show that transcriptomic elements of this key cell type in the expansion of the human cortex likely appeared in gliogenic precursors of the rodent before the full program emerged in neurogenic cells of the primate lineage.
We have assembled the public datasets we use in this report within the NeMO Analytics multi-omic data exploration environment [1], where the expression of individual genes (NeMO: Individual genes in cortex and NeMO: Individual genes in blood) and marker gene panels (NeMO: Telley 3 CellCover Panels, NeMO: Telley 12 CellCover Panels, NeMO: Sorted Brain Cell CellCover Panels, and NeMO: Blood 34 CellCover Panels) can be freely explored without coding expertise. CellCover is available in CellCover R and CellCover Python.

Article activity feed

  1. Joint Public Review:

    In this study, the authors introduce CellCover, a gene panel selection algorithm that leverages a minimal covering approach to identify compact sets of genes with high combinatorial specificity for defining cell identities and states. This framework addresses a key limitation in existing marker selection strategies, which often emphasize individually strong markers while neglecting the informative power of gene combinations. The authors demonstrate the utility of CellCover through benchmarking analyses and biological applications, particularly in uncovering previously unresolved cell states and lineage transitions during neocorticogenesis.
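
    The covering idea described above can be illustrated with a minimal sketch. This is a greedy heuristic written for illustration only, not the optimizer CellCover itself uses; the function name `greedy_marker_panel`, the `expr` dictionary layout, and the `max_false_pos` threshold are all assumptions introduced here. Genes detected too often outside the target class are discarded, and the remaining genes are added one at a time according to how many still-uncovered target cells each one covers.

```python
def greedy_marker_panel(expr, in_class, max_genes=5, max_false_pos=0.2):
    """Greedy sketch of covering-style marker selection on binarized data.

    expr: dict mapping gene name -> list of 0/1 detection calls, one per cell.
    in_class: list of booleans, True for cells in the target class.
    Returns (panel, fraction of target cells covered by at least one panel gene).
    All names and thresholds are hypothetical, chosen for this illustration.
    """
    n_cells = len(in_class)
    targets = {i for i in range(n_cells) if in_class[i]}
    others = [i for i in range(n_cells) if not in_class[i]]

    # Specificity filter: drop genes detected too often outside the class.
    candidates = {}
    for gene, calls in expr.items():
        false_pos = sum(calls[i] for i in others) / max(len(others), 1)
        if false_pos <= max_false_pos:
            candidates[gene] = {i for i in targets if calls[i]}

    panel, covered = [], set()
    while len(panel) < max_genes and covered != targets:
        # Choose the gene covering the most still-uncovered target cells
        # (ties broken alphabetically because candidates are sorted).
        best = max(sorted(candidates),
                   key=lambda g: len(candidates[g] - covered),
                   default=None)
        if best is None or not candidates[best] - covered:
            break  # no remaining gene adds coverage
        panel.append(best)
        covered |= candidates.pop(best)
    return panel, len(covered) / max(len(targets), 1)


# Toy data: cells 0-5 belong to the target class, cells 6-9 do not.
in_class = [True] * 6 + [False] * 4
expr = {
    "geneA": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # specific, covers half the class
    "geneB": [0, 0, 0, 1, 1, 1, 0, 0, 0, 0],  # specific, covers the other half
    "geneC": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],  # enriched but not specific
    "geneD": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # redundant with geneA
}
panel, coverage = greedy_marker_panel(expr, in_class)
print(panel, coverage)
```

    In this toy case the broadly expressed `geneC` is rejected despite being detected in every target cell, and the redundant `geneD` is never selected, which mirrors the contrast with single-gene ranking that the review highlights.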

    The major strengths of the work include the conceptual shift toward combinatorial marker selection, a clear mathematical formulation of the minimal covering strategy, and biologically relevant applications that underscore the method's power to resolve subtle cell-type differences. The authors' analysis of the Telley et al. dataset highlights intriguing cases of ribosomal, mitochondrial, and tRNA gene usage in specific cortical cell types, suggesting previously underappreciated molecular signatures in neurodevelopment. Additionally, the observation that outer radial glia markers emerge prior to gliogenic progenitors in primates offers novel insights into the temporal dynamics of cortical lineage specification.

    However, several aspects of the study would benefit from further elaboration. First, the interpretability of gene panels whose genes are individually lowly expressed but have high combinatorial specificity could be improved by providing clearer guidelines or illustrative examples. Second, the utility of CellCover in identifying rare or transient cell states should be more thoroughly quantified, especially under noisy conditions typical of single-cell datasets. Third, while the findings on unexpected gene categories are provocative, they require further validation, either through independent transcriptomic datasets or through orthogonal methods such as immunostaining or single-molecule FISH, to confirm their cell-type-specific expression patterns.

    Specifically, the manuscript would benefit from further clarification and additional validation in the following areas:

    • A more in-depth explanation of marker panel applications is needed. Specifically, how should users interpret gene panels where individual genes show only moderate or low expression levels, but the combination provides high specificity? Providing a concrete example, along with guidelines for interpreting such combinatorial signatures, would enhance the practical utility of the method.

    • Further quantification of CellCover's sensitivity in detecting rare cell subtypes or states would strengthen the evaluation of its performance. Additionally, it would be helpful to assess how CellCover performs under noisy conditions, such as low cell numbers or read depths, which are common challenges in scRNA-seq datasets.

    • It is intriguing and novel that CellCover analysis of the dataset from Telley et al. suggests cell-type-specific expression of ribosomal, mitochondrial, or tRNA genes. These findings would be significantly strengthened by additional validation. For example, the reported radial glia-specific expression of Rps18-ps3 and Rps10-ps1, as well as the postmitotic neuron-specific expression of mt-Tv and mt-Nd4l, should be corroborated using independent scRNA-seq or spatial transcriptomic datasets of the developing neocortex. Alternatively, these expression patterns could be directly examined through immunostaining or single-molecule FISH analysis.

    • The observation that outer radial glia (oRG) markers are expressed in neural progenitors before the emergence of gliogenic progenitors in primates and humans is compelling. This could be further supported by examining the temporal and spatial expression patterns of early oRG-specific markers versus gliogenic progenitor markers in recent human spatial transcriptomic datasets, such as those published by Xuyu et al. (PMID: 40369074) or Wang et al. (PMID: 39779846).
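
    As a hedged illustration of the first two points above, the sketch below shows how a panel of individually sparse genes can be summarized at the panel level, and how simulated dropout could be used to probe robustness. It is not taken from the manuscript or from the CellCover packages; `panel_summary`, `add_dropout`, the `min_hits` rule, and the toy data are assumptions made for this example.

```python
import random


def panel_summary(expr, in_class, panel, min_hits=1):
    """Summarize a marker panel on binarized (0/1) detection calls.

    Returns (per-gene detection rate within the class,
             fraction of class cells with >= min_hits panel genes detected,
             fraction of non-class cells meeting the same rule).
    """
    n = len(in_class)
    inside = [i for i in range(n) if in_class[i]]
    outside = [i for i in range(n) if not in_class[i]]
    hits = [sum(expr[g][i] for g in panel) for i in range(n)]
    per_gene = {g: sum(expr[g][i] for i in inside) / max(len(inside), 1)
                for g in panel}
    sensitivity = sum(hits[i] >= min_hits for i in inside) / max(len(inside), 1)
    false_pos = sum(hits[i] >= min_hits for i in outside) / max(len(outside), 1)
    return per_gene, sensitivity, false_pos


def add_dropout(expr, rate, seed=0):
    """Randomly set detections to 0 with probability `rate` (extra dropout)."""
    rng = random.Random(seed)
    return {g: [c if rng.random() >= rate else 0 for c in calls]
            for g, calls in expr.items()}


# Toy data: cells 0-5 are in the class, cells 6-9 are not.
in_class = [True] * 6 + [False] * 4
expr = {
    "g1": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "g2": [0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    "g3": [0, 0, 0, 0, 1, 1, 0, 0, 0, 1],
}
panel = ["g1", "g2", "g3"]

per_gene, sensitivity, false_pos = panel_summary(expr, in_class, panel)
print(per_gene, sensitivity, false_pos)

# Re-evaluate the same panel after removing roughly 30% of detections.
noisy = add_dropout(expr, rate=0.3, seed=1)
_, noisy_sensitivity, _ = panel_summary(noisy, in_class, panel)
print(noisy_sensitivity)
```

    Here each gene is detected in only a third of the class, yet every class cell expresses at least one panel gene, so the panel is read as a unit rather than gene by gene. Repeating the summary across dropout rates or subsampled cell counts would be one simple way to quantify the sensitivity to noise that the review asks about.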

    Summary:

    Overall, this work provides a conceptually innovative and practically useful method for cell type classification that will be valuable to the single-cell and developmental biology communities. Its impact will likely grow as more researchers seek scalable, interpretable, and biologically informed gene panels for multimodal assays, diagnostics, and perturbation studies.