Reconstructing cell type evolution across species through cell phylogenies of single-cell RNAseq data

Abstract

The origin and evolution of cell types have emerged as a key topic in evolutionary biology. Driven by rapidly accumulating single-cell datasets, recent attempts to infer cell type evolution have largely been limited to pairwise comparisons because we lack model-based methods for building cell phylogenies. Here we address the challenges of applying explicit phylogenetic methods to single-cell data by using principal components as phylogenetic characters. We infer a cell phylogeny from a large, comparative single-cell dataset of eye cells from five distantly related mammals. Robust cell type clades enable us to provide a phylogenetic, rather than phenetic, definition of cell type, allowing us to forgo marker genes and phylogenetically classify cells by topology. We further observe evolutionary relationships between diverse vessel endothelia and identify the myelinating and non-myelinating Schwann cells as sister cell types. Finally, we examine principal component loadings and describe the gene expression dynamics underlying the function and identity of cell type clades that have been conserved across the five species. A cell phylogeny provides a rigorous framework for investigating the evolutionary history of cells and will be critical for interpreting comparative single-cell datasets that aim to ask fundamental evolutionary questions.

Article activity feed

  1. Our study is unique in that instead of using gene expression values directly, we use principal components calculated from gene expression values as our phylogenetic characters. In addition, we remove later principal components that may represent highly heterogeneous cell-specific signal.

    It seems worth including a direct comparison of Brownian motion with other evolutionary models. The computational overhead shouldn't be very high and, if the comparison supports the use of Brownian motion, it would be a more compelling argument than the one given here.
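For readers following the quoted passage in item 1, the idea of using principal components as characters and discarding later components can be sketched roughly as below. This is a minimal illustration, not the authors' pipeline: the function name `pc_characters`, the cutoff `n_keep`, the log-normalisation step, and the toy Poisson counts are all assumptions made for the example.

```python
import numpy as np

def pc_characters(counts, n_keep):
    """Return the first n_keep principal component scores for each cell.

    counts: (cells x genes) array of UMI counts.
    n_keep: number of leading components to retain (a hypothetical cutoff).
    """
    counts = np.asarray(counts, dtype=float)
    # Library-size normalise and log-transform (an assumed preprocessing step).
    size = counts.sum(axis=1, keepdims=True)
    size[size == 0] = 1.0
    logx = np.log1p(counts / size * 1e4)
    # Centre each gene, then take the SVD; PC scores are U scaled by S.
    centred = logx - logx.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u * s
    # Keep only the leading components; later ones are dropped.
    return scores[:, :n_keep]

# Toy data standing in for a real expression matrix (30 cells, 50 genes).
rng = np.random.default_rng(0)
toy_counts = rng.poisson(2.0, size=(30, 50))
chars = pc_characters(toy_counts, n_keep=5)
```

Each row of `chars` would then serve as the continuous character vector for one cell in a downstream phylogenetic analysis; how many components to keep is a choice this sketch does not address.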

  2. This dataset was chosen for the uniformity of sampling, consistency of lab and sequencing protocols, the high quality of its cell type annotations, and the abundance of genomic resources available for the five model species. UMI counts were downloaded as CSV files from the NCBI GEO database (GSE146188). A file containing metadata, including cluster assignment and cell type labels, was obtained from the Broad Institute Single Cell Portal.

    I wonder about the effect of scRNA-seq methodology on downstream results here. How do droplet-based approaches (like that used for van Zyl et al.) compare to others (e.g. Smart-seq2) when generating cell type trees? There can be substantial differences in the number of genes detected by these methods, with droplet-based approaches often generating datasets with fewer genes. Does this affect the estimation of rank and/or the outputs of the PCA you use for evolutionary modeling? It seems like this would be an important issue to solve, since droplet-based methods are essentially downsampling informative data in a nonrandom way that may bias evolutionary inference.

    TL;DR: are cell tree topologies consistent across sequencing methodologies?
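One rough way to probe the question in item 2 is to thin UMI counts in silico and check how much the leading principal component subspace moves. The sketch below assumes uniform binomial thinning, which is a simplification: real platform differences are gene-dependent and nonrandom, as the comment notes. All function names, the `keep_prob` value, and the toy data are hypothetical, and the sketch compares PCA subspaces only, not tree topologies.

```python
import numpy as np

def thin_counts(counts, keep_prob, rng):
    """Binomially thin UMI counts: each UMI is kept with probability keep_prob."""
    return rng.binomial(np.asarray(counts, dtype=np.int64), keep_prob)

def genes_detected(counts):
    """Number of genes with at least one UMI in each cell."""
    return (np.asarray(counts) > 0).sum(axis=1)

def leading_subspace(counts, k):
    """Orthonormal basis (cells x k) spanning the top-k PC scores of log1p counts."""
    logx = np.log1p(np.asarray(counts, dtype=float))
    centred = logx - logx.mean(axis=0)
    u, _, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :k]

def subspace_similarity(a, b):
    """Mean squared cosine of the principal angles between two subspaces (1 = identical)."""
    cosines = np.linalg.svd(a.T @ b, compute_uv=False)
    return float(np.mean(cosines ** 2))

rng = np.random.default_rng(1)
# Toy data: two hypothetical cell groups that differ in the first 10 genes.
means = np.full((40, 40), 3.0)
means[20:, :10] = 12.0
full = rng.poisson(means)
thinned = thin_counts(full, keep_prob=0.3, rng=rng)

sim = subspace_similarity(leading_subspace(full, 3), leading_subspace(thinned, 3))
mean_genes_full = genes_detected(full).mean()
mean_genes_thinned = genes_detected(thinned).mean()
```

A value of `sim` near 1 would suggest the retained components are stable under this kind of thinning; a full answer to the question would also require rebuilding trees from each version and comparing their topologies.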