CanID: a robust and accurate RNAseq Expression-based diagnostic classification scheme for pediatric malignancies

Daniel K. Putnam
Alexander M. Gout
Delaram Rahbarinia
Meiling Jin
David Finkelstein
Xiaotu Ma
Jinghui Zhang
David A. Wheeler
Larissa V. Furtado
Xiang Chen

Abstract

Cancer subtype classification is critical for precision therapy, and histopathology testing is increasingly augmented with omics-based machine learning classifiers. However, for pediatric cancer, analytical challenges remain in the scope and precision of current classifiers and in the evolving standardization of subtypes. To address these challenges, we built Cancer Identification (CanID), a stacked ensemble machine learning classification scheme that uses transcriptomic features derived from gene-level RNA sequencing count data as its sole input. CanID was developed primarily from 3203 pediatric cancer samples spanning 13 solid tumor subtypes and 38 hematologic malignancy subtypes, with subtype labels curated without the use of RNA-seq data. In testing on three independent or external data sets, accuracy was 99% for Solid Tumor and 92–93% for Hematologic Malignancy. Notably, CanID was able to classify subtypes that are challenging for clinical histology evaluation and was robust to both biological and technical challenges, including differences in data collection protocols, class imbalance, potentially mislabeled training samples, and classes unobserved in training. The high accuracy, robustness, and biological interpretability of this transcriptome-based classification scheme make it a valuable approach for advancing tumor diagnosis and clinically meaningful stratification of tumor types. CanID can be accessed on GitHub at https://github.com/chenlab-sj/CanID.
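
For readers unfamiliar with the term, the sketch below illustrates the general idea of a stacked ensemble operating on gene-level counts. It is not CanID's implementation (the authors' code is in the GitHub repository above). Everything in it is an assumption made for illustration: the log2(count + 1) transform, the two nearest-centroid base learners (Euclidean and Pearson), the nearest-centroid meta-learner, the three-fold split, and the six-gene toy data with hypothetical classes "A" and "B".

```python
import math
import random

def log_transform(counts):
    """log2(count + 1) per gene; an assumed preprocessing step for count data."""
    return [math.log2(c + 1.0) for c in counts]

def centroids(X, y):
    """Mean feature vector per class label."""
    sums, n = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        n[label] = n.get(label, 0) + 1
    return {k: [v / n[k] for v in acc] for k, acc in sums.items()}

def euclid_score(a, b):
    """Negative Euclidean distance, so that larger means more similar."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_score(a, b):
    """Pearson correlation between two vectors (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

BASE_SCORERS = [euclid_score, pearson_score]  # two toy base learners

def base_features(cents, labels, row):
    """Concatenate every base learner's per-class score for one sample."""
    return [score(row, cents[k]) for score in BASE_SCORERS for k in labels]

def fit_stack(X, y, n_folds=3):
    labels = sorted(set(y))
    # Out-of-fold scores: each sample is scored by base learners that were
    # fitted without it, so the meta-learner is not trained on leaked scores.
    meta_X = [None] * len(X)
    for f in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != f]
        cents = centroids([X[i] for i in train], [y[i] for i in train])
        for i in range(f, len(X), n_folds):
            meta_X[i] = base_features(cents, labels, X[i])
    meta_cents = centroids(meta_X, y)  # meta-learner: nearest centroid in score space
    full_cents = centroids(X, y)       # base learners refitted on all training data
    return labels, full_cents, meta_cents

def predict(model, row):
    labels, full_cents, meta_cents = model
    z = base_features(full_cents, labels, row)
    return max(labels, key=lambda k: euclid_score(z, meta_cents[k]))

# Toy data: class "A" expresses genes 0-2 highly, class "B" genes 3-5.
rng = random.Random(0)
X, y = [], []
for i in range(24):
    label = "A" if i % 2 == 0 else "B"
    means = [200, 200, 200, 20, 20, 20] if label == "A" else [20, 20, 20, 200, 200, 200]
    X.append(log_transform([int(m * rng.uniform(0.5, 1.5)) for m in means]))
    y.append(label)

model = fit_stack(X, y)
print(predict(model, log_transform([210, 190, 220, 15, 25, 18])))
```

The key design point is that the meta-learner is trained on out-of-fold base-learner scores rather than on scores from models that have already seen the sample. A real classifier would use far more genes, stronger base models, and stratified folds.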

Version published to 10.1101/2025.08.20.671349 on bioRxiv
Aug 24, 2025

Related articles

Development of a Metabolic Subtype Classifier for Low-Grade Glioma to Guide Precision Therapy

This article has 8 authors:
1. Yiqi Tan
2. Le Zeng
3. Ganghua Zhang
4. Jianing Fang
5. Zhijing Yin
6. Wenzhi Deng
7. Ke Cao
8. Jiaode Jiang
This article has no evaluations. Latest version: Dec 16, 2025

Multi-Omic Integration and Machine Learning Reveal Regulatory Networks Driving Breast Cancer Progression

This article has 2 authors:
1. Unmilita Das Moon
2. Kushal Raj Roy
This article has no evaluations. Latest version: Dec 11, 2025

Inferring Clinically Relevant Molecular Subtypes of Pancreatic Cancer from Routine Histopathology Using Deep Learning

This article has 13 authors:
1. Abdul Rehman Akbar
2. Alejandro Levya
3. Ashwini Esnakula
4. Elshad Hasanov
5. Anne Noonan
6. Upender Manne
7. Vaibhav Sahai
8. Lingbin Meng
9. Susan Tsai
10. Anil Parwani
11. Wei Chen
12. Ashish Manne
13. Muhammad Khalid Khan Niazi
This article has no evaluations. Latest version: Jan 16, 2026
