Tabular Foundation Model for Breast Cancer Prognosis using Gene Expression Data

Abstract

Survival analysis is essential in oncology for modeling time-to-event outcomes such as overall survival and disease recurrence. Traditional approaches, such as the Cox Proportional Hazards (CPH) model, have been widely used due to their interpretability but rely on restrictive assumptions of linearity and proportional hazards, which limit their ability to capture the nonlinear relationships present in high-dimensional genomic data. Recent machine learning methods, including Random Survival Forests (RSF) and deep learning models such as DeepSurv, have improved flexibility and predictive performance but require extensive training data, hyperparameter tuning, and computationally expensive optimization, which hinder their practical use. We propose TabSurv, a novel survival prediction framework that leverages a foundation model for tabular data tasks using in-context learning. TabSurv predicts survival times in a regression setting, allowing rapid adaptation to new datasets with minimal computational cost. The model is trained using only uncensored samples and evaluated with the concordance index (C-index) and stability metrics to assess both accuracy and robustness. We benchmark TabSurv against seven state-of-the-art survival models across 12 breast cancer datasets. The results demonstrate that TabSurv achieves competitive or superior performance, obtaining the best C-index on six datasets and the highest overall stability score. These findings highlight TabSurv as a powerful and efficient tool for breast cancer prognosis using high-dimensional molecular data.
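
To make the evaluation metric concrete, the following is a minimal sketch of a concordance index for a model that, like TabSurv, predicts survival times directly. The abstract does not say which C-index variant the authors use; this sketch assumes Harrell's formulation, and the function name `concordance_index` and the toy data are hypothetical. A pair of patients is comparable when the one with the shorter observed time experienced the event, and it is concordant when the model also predicts a shorter survival time for that patient.

```python
from itertools import combinations


def concordance_index(times, events, predicted_times):
    """Harrell's C-index for predicted survival times (assumed variant).

    times: observed follow-up times
    events: 1 if the event was observed, 0 if the sample is censored
    predicted_times: model outputs, where larger means longer survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied observed times are skipped in this simplified sketch
        # 'a' is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # the earlier time is censored, so the true order is unknown
        comparable += 1
        if predicted_times[a] < predicted_times[b]:
            concordant += 1.0
        elif predicted_times[a] == predicted_times[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): four patients, one censored.
observed = [5, 10, 12, 20]
event = [1, 0, 1, 1]
predicted = [6, 9, 15, 14]
print(concordance_index(observed, event, predicted))
```

Censored samples still contribute to this metric as the later member of a pair, even though the abstract states that training uses only uncensored samples. A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking.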
