Single-Cell Transcriptomics and Machine Learning Algorithms Unveil Metastasis-Associated Cellular Subtypes and Prognostic Signatures in Colorectal Cancer
Abstract
Background: Colorectal cancer (CRC) is a prevalent digestive tract malignancy, with liver metastasis occurring in up to 50% of cases. Identifying reliable early markers of metastasis is crucial for improving CRC prognosis.

Methods: We analyzed single-cell RNA sequencing data from CRC patients, including primary tumors, adjacent normal tissues, and liver metastases. Copy number variation (CNV) analysis with the CopyKAT algorithm distinguished tumor cells from non-tumor cells. We identified key tumor subtypes influencing metastasis through differential gene expression and pathway analyses. Leveraging 103 machine learning algorithms, we developed a metastasis-associated risk model based on the identified biomarkers. The model was validated across multiple external datasets.

Results: We delineated five tumor cell subtypes, with EMP1+ cells emerging as a key subtype in CRC metastasis. The machine learning approach identified a five-gene signature (SPINK1, PLAC8, LAMB3, CEACAM5, CDA) for predicting metastasis risk. The risk model significantly stratified patients into high- and low-risk groups across six independent cohorts, with higher risk scores correlating with poorer survival. Gene set enrichment analysis revealed enrichment of epithelial-mesenchymal transition (EMT) pathways in the high-risk group. Mutation analysis showed higher overall mutation frequencies in the high-risk group, particularly in genes such as APC, TP53, and KRAS.

Conclusion: Our combined single-cell transcriptomics and machine learning approach uncovered novel cellular subtypes and a gene signature associated with CRC metastasis, providing new insights for early diagnosis and potential therapeutic targets.
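The abstract does not report the final model's form or its gene weights. To make the idea of "stratifying patients into high- and low-risk groups" concrete, here is a minimal sketch that assumes a linear signature: each patient's risk score is a weighted sum of the five genes' expression, and the cohort is split at the median score. The weights, the median cutoff, and the function names `risk_score` and `stratify` are all hypothetical illustrations, not the authors' method.

```python
from statistics import median

# HYPOTHETICAL weights for illustration only. These are NOT the published
# model's coefficients, which the abstract does not report.
HYPOTHETICAL_WEIGHTS = {
    "SPINK1": 0.5,
    "PLAC8": 0.3,
    "LAMB3": 0.4,
    "CEACAM5": -0.2,
    "CDA": 0.1,
}


def risk_score(expression: dict[str, float],
               weights: dict[str, float] = HYPOTHETICAL_WEIGHTS) -> float:
    """Weighted sum of a patient's (e.g. log-normalized) signature-gene expression."""
    return sum(w * expression[gene] for gene, w in weights.items())


def stratify(cohort: dict[str, dict[str, float]]) -> dict[str, str]:
    """Label each patient 'high' or 'low' risk by splitting at the cohort median.

    Assumed convention: patients scoring exactly at the median are labeled 'low'.
    """
    scores = {pid: risk_score(expr) for pid, expr in cohort.items()}
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Toy cohort (made-up values): every gene set to the same level per patient.
toy_cohort = {
    pid: {gene: level for gene in HYPOTHETICAL_WEIGHTS}
    for pid, level in [("P1", 1.0), ("P2", 2.0), ("P3", 3.0), ("P4", 4.0)]
}

print(stratify(toy_cohort))  # {'P1': 'low', 'P2': 'low', 'P3': 'high', 'P4': 'high'}
```

A median split is only one common way to dichotomize a continuous score; a study may instead use an optimized or fixed cutoff, and survival differences between the resulting groups would then be tested separately (for example with a log-rank test).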