Leakage-Aware LLM Augmentation for Attrition Prediction: A Decision-Centric Evaluation
Abstract
Employee attrition imposes high financial and organizational costs, and a preventable departure is typically far more expensive than a false alarm. This study frames attrition prediction as a decision-support problem and introduces a leakage-aware framework that uses LLM-based augmentation to generate realistic minority-class samples. Using the IBM HR dataset, we benchmark classical, tree-based, transformer, and AutoML models. Results show that LLM-based augmentation consistently improves recall of potential leavers, even when AUC or Average Precision remains statistically unchanged. From a managerial perspective, higher recall lets organizations prevent more costly departures at the expense of only modest increases in false positives, producing a favorable cost–benefit balance. SHAP analyses confirm that key drivers such as overtime, mobility, and job satisfaction remain interpretable and actionable, while fairness analysis shows only small subgroup disparities, supporting equitable deployment. Overall, the proposed framework demonstrates how leakage-aware, recall-oriented augmentation can translate generative AI advances into transparent, fair, and decision-relevant tools for HR retention, with potential applicability to other rare-event domains such as churn, fraud, and risk prediction.
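The cost–benefit argument in the abstract can be made concrete with a small worked example. The sketch below is illustrative only: the function names, the two cost figures, and the confusion-matrix counts are all hypothetical assumptions chosen for the arithmetic, not values or methods reported in the study. It shows how a model with higher recall can lower total expected cost even when it raises more false alarms, provided a missed leaver costs far more than a false alarm.

```python
def expected_cost(false_negatives, false_positives, cost_fn, cost_fp):
    """Total misclassification cost: missed leavers plus false alarms."""
    return false_negatives * cost_fn + false_positives * cost_fp


def recall(true_positives, false_negatives):
    """Share of actual leavers that the model flags."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical unit costs (assumptions, not figures from the study).
COST_MISSED_LEAVER = 50_000  # cost of an unflagged, preventable departure
COST_FALSE_ALARM = 2_000     # cost of a retention effort spent on a stayer

# Hypothetical confusion-matrix counts on the same 50 actual leavers.
scenarios = {
    "baseline": {"tp": 20, "fn": 30, "fp": 15},
    "augmented": {"tp": 32, "fn": 18, "fp": 40},
}

summary = {}
for name, m in scenarios.items():
    summary[name] = {
        "recall": recall(m["tp"], m["fn"]),
        "cost": expected_cost(
            m["fn"], m["fp"], COST_MISSED_LEAVER, COST_FALSE_ALARM
        ),
    }
    print(f"{name}: recall={summary[name]['recall']:.2f}, "
          f"cost={summary[name]['cost']:,}")
```

Under these assumed numbers, the augmented scenario raises 25 more false alarms but misses 12 fewer leavers, so its total cost is lower. The conclusion depends entirely on the cost ratio: if a false alarm were nearly as expensive as a missed leaver, the same shift in recall could increase total cost.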