The Limitations of TabPFN for High-Dimensional RNA-seq Analysis
This article has been reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Abstract
Tabular Prior-Data Fitted Networks (TabPFN) demonstrate remarkable performance on small-to-medium tabular datasets through in-context learning, but struggle with high-dimensional genomic data such as RNA-seq with tens of thousands of features. We investigate multiple approaches to adapt TabPFN for transcriptomic analysis using two benchmark datasets: Age-ARCHS4, a regression dataset derived from the ARCHS4 dataset (57,873 samples, 10,000 genes), and an Inflammatory Bowel Disease (IBD) classification dataset encompassing Crohn’s Disease and Ulcerative Colitis samples (2,490 samples, 10,000 genes). Our experimental design proceeds in two phases: first evaluating existing optimization methods, then testing novel adaptations including (1) self-supervised embedding learning and (2) Bulk-Former integration. We demonstrate that when constrained to equal training conditions (500 features, 10,000 samples), TabPFN outperforms classical baselines like random forest and XGBoost. However, when classical methods utilize full feature sets while TabPFN adaptations attempt to handle higher-dimensional data, all TabPFN variants consistently underperform the naive baseline. Our findings reveal fundamental limitations in current approaches to adapting TabPFN for genomic applications, showing that architectural modifications paradoxically degrade performance, while intelligent metadata-based subgrouping emerges as the most effective strategy for deploying TabPFN on biological data.
Article activity feed
-
4.3.3 Key Findings

The metadata-based subgrouping approach yields several important insights:

- Organ-specific models outperform global models: Colon-specific TabPFN (R² = 0.7217) substantially outperforms naive TabPFN on the full dataset (R² = 0.6074)
- Sex stratification provides additional gains: Sex-aware blood models achieve R² = 0.705 compared to R² = 0.657 for blood-only models
- Combined strategy achieves best performance: Organ + Sex Aware TabPFN reaches R² = 0.75, representing a 23
- Biological relevance matters: Performance gains align with known biological differences between tissues and sexes in gene expression patterns

This approach demonstrates that TabPFN can be successfully applied to genomic data when its architectural constraints are respected through intelligent data partitioning rather than circumvented through complex adaptations.
Thank you for this thorough project! It would be interesting to see comparisons with the other model architectures for these experiments, as shown in previous figures. An equally valid interpretation of Section 4.3 is that any of the models would improve on the partitions, perhaps by a greater margin than TabPFN. Could random partitions also show a similar improvement?
As a further control, it would be of interest to provide the partition label as an input when inferring from the full dataset, to see whether the mutual information between the partition label and the target is the main source of improvement. These labels could also be shuffled to determine whether that disrupts the performance change.
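To make the suggested controls concrete, here is a minimal sketch of the partition and shuffled-label comparison. It rests on several assumptions: the data are synthetic, a one-feature least-squares fit stands in for TabPFN or any other model, and all names (`fit_ols`, `held_out_r2`, `rows`) are hypothetical and not taken from the preprint. It covers only the partition and shuffle controls; the label-as-feature control would need a multi-feature model.

```python
import random

def fit_ols(xs, ys):
    """Stand-in model (assumption): one-feature least squares, returns a predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def r2_score(y_true, y_pred):
    """Coefficient of determination on held-out targets."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def held_out_r2(rows, n_train, partition, fit=fit_ols):
    """rows: list of (x, y, label). With partition=True, fit one model per label."""
    train, test = rows[:n_train], rows[n_train:]
    key = (lambda label: label) if partition else (lambda label: "all")
    models = {}
    for k in {key(label) for _, _, label in train}:
        sub = [(x, y) for x, y, label in train if key(label) == k]
        models[k] = fit([x for x, _ in sub], [y for _, y in sub])
    preds = [models[key(label)](x) for x, _, label in test]
    return r2_score([y for _, y, _ in test], preds)

# Synthetic data (assumption): the label changes the x-to-y relationship,
# loosely mimicking a tissue or sex label that alters expression patterns.
rng = random.Random(0)
rows = []
for _ in range(600):
    label = rng.choice(["A", "B"])
    x = rng.random()
    y = (2.0 * x if label == "A" else 3.0 - 2.0 * x) + rng.gauss(0.0, 0.1)
    rows.append((x, y, label))

# Shuffled-label control: same group sizes, but labels carry no information.
shuffled_labels = [label for _, _, label in rows]
rng.shuffle(shuffled_labels)
shuffled_rows = [(x, y, s) for (x, y, _), s in zip(rows, shuffled_labels)]

results = {
    "global": held_out_r2(rows, 400, partition=False),
    "true_partition": held_out_r2(rows, 400, partition=True),
    "shuffled_partition": held_out_r2(shuffled_rows, 400, partition=True),
}
for name, score in results.items():
    print(f"{name}: R2 = {score:.3f}")
```

If the gain came only from fitting on smaller subsets, the shuffled partition would be expected to match the true partition; if it comes from information in the label, the shuffled partition should fall back toward the global score. Passing a different `fit` function with the same signature would allow the same comparison across model architectures.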
Thank you for a thoughtful set of experiments!
-