TabularGRPO: Modern Mixture-of-Experts Transformer with Group Relative Policy Optimization (GRPO) for Tabular Data Learning

Abstract

Tabular data remains the cornerstone of decision-making in healthcare, finance, and industrial analytics. We propose TabularGRPO, a novel reinforcement learning framework that synergizes Mixture-of-Experts (MoE) architectures with variance-reduced policy gradients. TabularGRPO addresses three fundamental challenges in tabular learning: 1) feature-type heterogeneity, through dynamic expert routing; 2) class imbalance, via group-wise advantage normalization; and 3) sample inefficiency, with KL-regularized policy updates. Evaluations on challenging datasets demonstrate TabularGRPO's superiority over currently dominant models such as XGBoost and CatBoost, with 6.0% higher precision and 13.0% higher F1 score, establishing new state-of-the-art performance. The code and benchmarks we used to train and evaluate our models are publicly available at https://github.com/enkhtogtokh/tabulargrpo
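To make the GRPO components named above concrete, the following is a minimal NumPy sketch of the two generic ingredients the abstract mentions: group-wise advantage normalization and a KL-regularized policy update. It reflects the standard GRPO formulation, not the authors' released implementation; the function names, the clipping range, and the `beta` coefficient are illustrative assumptions.

```python
import numpy as np

def group_relative_advantages(rewards, group_size):
    # Group-wise advantage normalization: within each sampled group,
    # A_i = (r_i - mean(group)) / std(group).
    groups = np.asarray(rewards, dtype=float).reshape(-1, group_size)
    mean = groups.mean(axis=1, keepdims=True)
    std = groups.std(axis=1, keepdims=True) + 1e-8  # guard against zero variance
    return ((groups - mean) / std).reshape(-1)

def grpo_loss(log_probs, old_log_probs, advantages, ref_log_probs, beta=0.04):
    # Clipped policy-gradient surrogate plus a KL penalty toward a
    # reference policy (the "KL-regularized policy update").
    ratio = np.exp(log_probs - old_log_probs)
    clipped = np.clip(ratio, 0.8, 1.2)  # illustrative clip range
    surrogate = np.minimum(ratio * advantages, clipped * advantages)
    # Unbiased per-sample KL estimator: exp(dr) - dr - 1, dr = ref - current.
    delta = ref_log_probs - log_probs
    kl = np.exp(delta) - delta - 1.0
    return -(surrogate - beta * kl).mean()
```

Normalizing advantages within each group rather than across the whole batch is what makes the scheme robust to class imbalance: minority-class groups are scored relative to their own baseline instead of being swamped by majority-class rewards.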
