Bonsai: Tree representations for distortion-free visualization and exploratory analysis of single-cell omics data


Abstract

Single-cell omics methods promise to revolutionize our understanding of gene regulatory processes during cell differentiation, but analysis of such data continues to pose a major challenge. Apart from technical challenges such as the sparsity and heterogeneous noise properties of these data, the crucial problem is that we know little about the potentially very complex high-dimensional structures that the data represent. Consequently, there is an urgent need for exploratory analysis methods that allow rigorous representation and visualization of the structure in the data. However, currently popular methods such as UMAP and t-SNE are unsatisfactory because they are ad hoc, stochastic, uninterpretable, and known to severely distort the structure in the data.

Here we show that these challenges can be overcome by representing the data on tree structures, and we present Bonsai, a novel method that reconstructs the most likely tree relating any set of high-dimensional objects while rigorously accounting for heterogeneous measurement noise.

We show that, in contrast to other visualization methods, distances along the Bonsai trees accurately represent true distances between the objects in high-dimensional space across many types of datasets. Moreover, Bonsai automatically regularizes measurement noise, outperforming even methods specifically designed for that purpose on tasks such as nearest-neighbor identification.
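The claim that tree distances track true distances can be checked on any tree with known coordinates. The following is a minimal sketch of one such check, not Bonsai's own evaluation code: it sums branch lengths along the unique path between each pair of leaves and correlates those path lengths with Euclidean distances between the original points. The toy tree (`EDGES`), the coordinates (`POINTS`), and all function names are hypothetical and chosen only for illustration. The use of plain Euclidean distance and Pearson correlation is likewise an assumption, and the paper's exact distance measure may differ.

```python
import math
from itertools import combinations

# Hypothetical toy tree: leaves a, b, c, d joined through internal nodes u, v.
# Each edge is (node, node, branch_length).
EDGES = [("a", "u", 1.0), ("b", "u", 1.0), ("u", "v", 2.0),
         ("c", "v", 1.0), ("d", "v", 1.5)]

# Hypothetical "high-dimensional" coordinates for the leaves (4-D for brevity).
POINTS = {"a": (0.0, 1.0, 0.0, 0.0), "b": (0.0, -1.0, 0.0, 0.0),
          "c": (2.0, 0.0, 1.0, 0.0), "d": (2.0, 0.0, 0.0, 1.5)}


def build_adjacency(edges):
    """Turn an edge list into an undirected adjacency map."""
    adj = {}
    for x, y, w in edges:
        adj.setdefault(x, []).append((y, w))
        adj.setdefault(y, []).append((x, w))
    return adj


def tree_distances_from(adj, source):
    """Path length from `source` to every node (paths in a tree are unique)."""
    dist = {source: 0.0}
    stack = [source]
    while stack:
        node = stack.pop()
        for neighbor, w in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + w
                stack.append(neighbor)
    return dist


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def compare(edges, points):
    """Return (tree distances, Euclidean distances, correlation) over leaf pairs."""
    adj = build_adjacency(edges)
    tree, euclid = [], []
    for p, q in combinations(sorted(points), 2):
        tree.append(tree_distances_from(adj, p)[q])
        euclid.append(math.dist(points[p], points[q]))
    return tree, euclid, pearson(tree, euclid)


tree_d, euclid_d, r = compare(EDGES, POINTS)
```

A correlation near 1 would indicate that the tree preserves the relative ordering and scale of pairwise distances. The same comparison could be run with distances measured in a 2-D embedding in place of `tree_d` to contrast the two representations.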

By analyzing a blood cell dataset, we show that Bonsai trees not only capture known lineage relationships but also provide novel biological insights. For example, Bonsai uncovers that different subsets of NK cells derive from the myeloid and lymphoid lineages, and pinpoints genes that distinguish myeloid-NK from lymphoid-NK cells.

Bonsai is free from tunable parameters and scales to datasets of hundreds of thousands of cells. The accompanying tool, Bonsai-scout, provides visualizations of the Bonsai trees and allows for interactive data exploration such as identifying subclades and their markers, visualizing features along the tree, changing the tree layout, and zooming in on substructures. Finally, application to a dataset of football statistics shows the generality of Bonsai in successfully capturing complex structures in high-dimensional data.
