A benchmarking study of copy number variation inference methods using single-cell RNA-sequencing data

Xin Chen
Li Tai Fang
Zhong Chen
Wanqiu Chen
Hongjin Wu
Bin Zhu
Malcolm Moos
Andrew Farmer
Xiaowen Zhang
Wei Xiong
Shusheng Gong
Wendell Jones
Christopher E Mason
Shixiu Wu
Chunlin Xiao
Charles Wang

Abstract

Background

Single-cell RNA-sequencing (scRNA-seq) has emerged as a powerful tool for cancer research, enabling in-depth characterization of tumor heterogeneity at the single-cell level. Recently, several scRNA-seq copy number variation (scCNV) inference methods have been developed, expanding the application of scRNA-seq to study genetic heterogeneity in cancer using transcriptomic data. However, the fidelity of these methods has not been investigated systematically.

Methods

We benchmarked five commonly used scCNV inference methods: HoneyBADGER, CopyKAT, CaSpER, inferCNV, and sciCNV. We evaluated their performance across four different scRNA-seq platforms using data from our previous multicenter study. We further evaluated the methods on scRNA-seq datasets derived from mixed samples of five human lung adenocarcinoma cell lines. In addition, we sequenced tissues from a small cell lung cancer patient and used the resulting clinical scRNA-seq dataset to validate our findings.

Results

We found that the sensitivity and specificity of the five scCNV inference methods varied with the choice of reference data, sequencing depth, and read length. CopyKAT and CaSpER outperformed the other methods overall, while inferCNV, sciCNV, and CopyKAT performed better than the other methods in subclone identification. Batch effects significantly affected subclone identification in mixed datasets for most of the methods we tested.
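
For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how sensitivity and specificity could be computed for one method's CNV calls against a ground-truth profile. It is not the study's evaluation pipeline: the function name, the per-region boolean encoding, and the example values are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    truth: Sequence[bool], called: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for per-region CNV calls.

    truth[i]  is True if region i carries a CNV in the reference profile.
    called[i] is True if the inference method reported a CNV in region i.
    Both encodings are assumptions made for this illustration.
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")

    pairs = list(zip(truth, called))
    tp = sum(1 for t, c in pairs if t and c)          # true positives
    fn = sum(1 for t, c in pairs if t and not c)      # false negatives
    tn = sum(1 for t, c in pairs if not t and not c)  # true negatives
    fp = sum(1 for t, c in pairs if not t and c)      # false positives

    # Guard against empty classes so the ratios stay well defined.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical example: eight genomic regions, three with a true CNV.
truth = [True, True, True, False, False, False, False, False]
called = [True, True, False, False, False, False, True, False]
sens, spec = sensitivity_specificity(truth, called)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this toy case the method recovers two of the three true CNV regions and wrongly flags one of the five CNV-free regions. A real benchmark would also need to define what counts as a region and how gains and losses are scored, which this sketch leaves out.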

Conclusion

Our benchmarking study revealed the strengths and weaknesses of each of these scCNV inference methods and provided guidance for selecting the optimal CNV inference method using scRNA-seq data.

Version published to 10.1093/pcmedi/pbaf011
Apr 9, 2025
Version published to 10.1101/2024.09.09.612120 on bioRxiv
Sep 14, 2024