A Transformer-Based Approach to Survival Outcome Prediction


Abstract

Accurate prediction of patient survival outcomes is a critical challenge in cancer research, with the potential to inform personalized treatment strategies and improve patient care. We leveraged Geneformer, a state-of-the-art transformer model pre-trained on a large single-cell RNA-seq dataset, to develop a model for predicting overall survival (OS). We adapted Geneformer for bulk tumor data by appending a task-specific transformer layer and fine-tuning the model on RNA-seq data from The Cancer Genome Atlas (TCGA). Additionally, we employed a rank-value encoding scheme to prioritize informative genes and reduce noise. Predicted and true OS were strongly correlated, with a Pearson correlation coefficient of 0.72 (p < 0.00001). Survival analysis revealed significant differences in survival between patient subgroups stratified by the model’s predictions. The Geneformer-based model outperformed traditional machine learning approaches (Random Forest and Neural Network) in patient stratification tasks. Further analysis showed that the model’s performance was consistent across tumor stages and patient subgroups. Our study highlights the potential of pre-trained transformer models, originally developed for single-cell data analysis, to predict clinically relevant outcomes from bulk tumor gene expression data. The performance of our Geneformer-based model relative to these baselines underscores its potential to enhance prognostication and treatment decision-making in cancer research. Future work will focus on refining the model architecture, incorporating multi-omics data, and validating performance on external datasets to further advance clinical utility.
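
The rank-value encoding mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly described scheme in which each gene's expression is divided by a corpus-wide reference value for that gene (here a per-gene median), unexpressed genes are dropped, and the remaining genes are ordered from highest to lowest scaled value. The function name `rank_value_encode`, the `max_len` cutoff, and the toy values are illustrative assumptions and are not taken from the article or from the Geneformer codebase.

```python
def rank_value_encode(expression, gene_medians, max_len=2048):
    """Order genes by median-scaled expression, highest first.

    expression:   dict of gene name -> expression value in one sample
    gene_medians: dict of gene name -> reference median for that gene
                  (assumed to be precomputed over a large corpus)
    max_len:      hypothetical cap on the number of genes kept
    """
    # Scale each expressed gene by its reference median; skip zeros and
    # genes without a usable median.
    scaled = {
        gene: value / gene_medians[gene]
        for gene, value in expression.items()
        if value > 0 and gene_medians.get(gene, 0) > 0
    }
    # Sort by scaled value (descending); break ties by gene name so the
    # output is deterministic.
    ranked = sorted(scaled, key=lambda g: (-scaled[g], g))
    return ranked[:max_len]


# Toy sample: ACTB has the highest raw value but is below its own median,
# so it ranks last among the expressed genes.
sample = {"TP53": 50.0, "ACTB": 900.0, "MYC": 30.0, "GAPDH": 0.0}
medians = {"TP53": 10.0, "ACTB": 1000.0, "MYC": 5.0, "GAPDH": 800.0}
tokens = rank_value_encode(sample, medians)
print(tokens)
```

Scaling by a per-gene reference value is what pushes ubiquitously high genes down the ranking and lets genes that are unusually high for a given sample move up, which is one way to read "prioritize informative genes and reduce noise".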

Short Abstract

Accurate prediction of patient survival has important implications for cancer research: it enables the development of personalized treatment plans, guides clinical decision-making, and can be leveraged for clinical trial optimization. We used Geneformer, a transformer model pre-trained on single-cell RNA-seq data, to predict overall survival (OS) from bulk tumor gene expression. By adapting Geneformer for bulk tumor analysis and using rank-value encoding, we achieved a strong correlation between predicted and true OS (r = 0.72, p < 0.00001). Our model outperformed traditional machine learning approaches in patient stratification and performed consistently across tumor stages and subgroups. This study highlights the potential of pre-trained transformer models for prognostication in cancer, paving the way for refined, personalized treatment strategies.
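
The two evaluation steps named here, correlating predicted with true OS and stratifying patients by prediction, can be sketched as follows. This is a minimal illustration on made-up numbers, not the authors' pipeline: the median split is an assumed stratification rule, the helper names are hypothetical, and a real survival analysis would also need to account for censored patients (for example with Kaplan-Meier curves and a log-rank test).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def median_split(predicted):
    """Label each patient 'long' or 'short' relative to the median prediction.

    Assumption: a median cutoff; the article does not state its cutoff here.
    """
    ordered = sorted(predicted)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return ["long" if p > median else "short" for p in predicted]


# Made-up predicted and observed OS values (months) for four patients.
predicted_os = [10.0, 20.0, 30.0, 40.0]
true_os = [12.0, 18.0, 33.0, 41.0]

r = pearson_r(predicted_os, true_os)
groups = median_split(predicted_os)
print(round(r, 3), groups)
```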
