Quantifying distribution shifts in single-cell data with scXMatch

Abstract

A basic task that frequently arises when analyzing single-cell data is to assess whether there is a global distribution shift between the data profiles of cells from two different conditions. Widely used approaches to this task, such as visual inspection of two-dimensional representations or clustering-based workflows, lack a solid statistical underpinning, are notoriously unstable, and are prone to confirmation bias. To promote more rigorous analysis, we present the scverse-compatible Python tool scXMatch. scXMatch is based on the cross-matching test, a non-parametric test introduced more than 20 years ago that quantifies distribution shifts in arbitrary data spaces for which a suitable distance measure is available. Since the original version of the test was designed for small sample sizes, we developed a resource-efficient variant based on k-nearest-neighbor graphs that scales to realistically sized single-cell datasets. We evaluated scXMatch on single-cell gene expression, chromatin accessibility, and imaging-derived cell morphology data, showing that it can robustly detect distribution shifts across different types of single-cell data. scXMatch thus aims to set a new standard in the single-cell biology field, replacing easy-to-manipulate semi-manual distribution shift quantification workflows with principled statistical testing.
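
To make the idea behind the cross-matching test concrete, the following is a minimal sketch of the classical small-sample version: pool the cells from both conditions, pair them up so that the total within-pair distance is minimal, and count the pairs that mix the two conditions. Few mixed pairs indicate a distribution shift. This is an illustration only and not the scXMatch API or its k-nearest-neighbor variant. The function names are hypothetical, Euclidean distance is an assumption, and the brute-force matching is only feasible for a handful of points.

```python
from functools import lru_cache
from math import comb, dist, factorial


def min_weight_perfect_matching(points):
    """Exact minimum-distance perfect matching via bitmask DP (toy sizes only)."""
    n = len(points)
    if n % 2:
        raise ValueError("need an even number of points")

    @lru_cache(maxsize=None)
    def best(mask):
        # Bit i of mask is set if point i is still unmatched.
        if mask == 0:
            return 0.0, ()
        i = (mask & -mask).bit_length() - 1  # lowest unmatched index
        rest = mask & ~(1 << i)
        result = None
        for j in range(i + 1, n):
            if (rest >> j) & 1:
                cost, pairs = best(rest & ~(1 << j))
                cost += dist(points[i], points[j])  # Euclidean (assumption)
                if result is None or cost < result[0]:
                    result = (cost, ((i, j),) + pairs)
        return result

    return list(best((1 << n) - 1)[1])


def cross_match_pmf(a1, n, m):
    """Null probability of observing a1 mixed pairs for group sizes n and m."""
    if a1 < 0 or a1 > min(n, m) or (n - a1) % 2 or (m - a1) % 2:
        return 0.0
    a0, a2 = (n - a1) // 2, (m - a1) // 2  # pairs within each group
    pairs = (n + m) // 2
    numerator = 2 ** a1 * factorial(pairs)
    denominator = comb(n + m, n) * factorial(a0) * factorial(a1) * factorial(a2)
    return numerator / denominator


def cross_match_test(group_a, group_b):
    """Return (number of mixed pairs, one-sided p-value for 'few mixed pairs')."""
    points = list(group_a) + list(group_b)
    n, m = len(group_a), len(group_b)
    matching = min_weight_perfect_matching(points)
    # A pair is mixed if exactly one of its members comes from group_a.
    a1 = sum((i < n) != (j < n) for i, j in matching)
    p_value = sum(cross_match_pmf(k, n, m) for k in range(a1 + 1))
    return a1, p_value


# Two clearly separated toy "conditions" in a 2D feature space.
group_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
group_b = [(10, 10), (10, 11), (11, 10), (11, 11)]
a1, p_value = cross_match_test(group_a, group_b)
print(a1, p_value)
```

With these two toy groups no pair mixes the conditions, so the statistic is 0 and the p-value is 3/35 (about 0.086), the smallest value attainable with four cells per group. The exact matching shown here grows exponentially in the number of points, which illustrates why a more resource-efficient variant is needed for realistically sized datasets.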
