Batch correction methods used in single cell RNA-sequencing analyses are often poorly calibrated

Sindri Emmanúel Antonsson
Páll Melsted

Read the full article

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.

Abstract

As the number of experiments that employ single-cell RNA-sequencing (scRNA-seq) grows it opens up the possibility of combining results across experiments or processing cells from the same experiment assayed in separate sequencing runs. The gain in the number of cells that can be compared comes at the cost of batch effects that may be present. Several methods have been proposed to combat this for scRNA-seq datasets.

We compared seven widely used method used for batch correction of scRNA-seq datasets. We present a novel approach to measure the degree to which the methods alter the data in the process of batch correction, both at the fine scale comparing distances between cells as well as measuring effects observed across clusters of cells. We demonstrate that many of the published method are poorly calibrated in the sense that the process of correction creates measurable artifacts in the data.

In particular, MNN, SCVI and LIGER performed poorly in our tests, often altering the data considerably. Batch correction with Combat, BBKNN and Seurat introduced artifacts that could be detected in our setup. However, we found that Harmony was the only method that consistently performed well, in all the testing methodology we present. Due to these result Harmony is the only method we can safely recommend using when performing batch correction of scRNA-seq data.

Version published to 10.1101/2024.03.19.585562 on bioRxiv
Mar 21, 2024

Discuss this preprint

Listed in

Abstract

Article activity feed