From Correlation to Causation: Evaluating Fairness Metrics at the Preprocessing Stage of ML Pipelines
Abstract
Fairness is a foundational concern in the development of trustworthy AI, yet most research concentrates on model-level bias, overlooking how unfairness can originate and be amplified during data preprocessing. This study presents a comprehensive, component-level comparison of fairness metrics, spanning the statistical, causal, and counterfactual paradigms, to evaluate bias at the preprocessing stage of machine learning (ML) pipelines. By isolating and analyzing the fairness impact of individual preprocessing stages, we demonstrate that early-stage interventions can substantially reduce the need for downstream mitigation. To support this analysis, we develop novel fairness metrics across all three paradigms by applying causal reasoning methodologies, including Propensity Score Matching (PSM) and structural interventions. These newly proposed metrics extend classical measures such as Statistical Parity Difference (SPD), Equal Opportunity Difference (EOD), Average Odds Difference (AOD), and Error Rate Difference (ERD) into their causal and counterfactual counterparts, enabling a more nuanced and interpretable fairness evaluation. The analysis is grounded in five widely studied, real-world datasets (Adult Census, Bank Marketing, German Credit, Titanic, and COMPAS), each posing distinct challenges due to differences in the number of instances, domain context, and sensitive attributes (e.g., race, gender, age, marital status). Through these diverse pipelines, we address three core questions: the conceptual and practical distinctions between fairness metrics, the capacity of causal techniques to uncover structural bias, and the challenges of integrating fairness evaluations into a unified, context-aware methodology. The findings reveal that statistical metrics often mask deeper, pathway-dependent or individual-level inequities that only causal and counterfactual perspectives expose. This work supports a shift toward proactive, fine-grained fairness auditing, offering practitioners robust tools for ethically sound and technically rigorous ML deployment.
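To make the contrast between a classical statistical metric and a PSM-based causal counterpart concrete, the sketch below illustrates the general idea; it is a minimal example under assumed column names and covariates, not the implementation used in the study. It computes a plain Statistical Parity Difference (SPD) and then recomputes the same group gap after balancing the two groups on observed covariates via 1-to-1 propensity score matching.

```python
# Minimal sketch (illustrative only): raw SPD vs. an SPD recomputed on a
# covariate-balanced sample obtained through propensity score matching (PSM).
# Column names ("sex", "income") and the covariate list are assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors


def statistical_parity_difference(df, group_col, outcome_col):
    """Classical SPD: P(Y=1 | unprivileged) - P(Y=1 | privileged)."""
    rates = df.groupby(group_col)[outcome_col].mean()
    return rates.loc[0] - rates.loc[1]  # assumes 1 encodes the privileged group


def psm_matched_spd(df, group_col, outcome_col, covariates):
    """SPD recomputed after 1-to-1 propensity score matching on covariates."""
    # 1. Estimate each instance's propensity of belonging to the privileged group.
    model = LogisticRegression(max_iter=1000)
    model.fit(df[covariates], df[group_col])
    df = df.assign(pscore=model.predict_proba(df[covariates])[:, 1])

    privileged = df[df[group_col] == 1]
    unprivileged = df[df[group_col] == 0]

    # 2. Match each privileged instance to its nearest unprivileged neighbour
    #    in propensity-score space (1-NN matching with replacement).
    nn = NearestNeighbors(n_neighbors=1).fit(unprivileged[["pscore"]])
    _, idx = nn.kneighbors(privileged[["pscore"]])
    matched_unprivileged = unprivileged.iloc[idx.ravel()]

    # 3. Compare outcome rates on the matched, covariate-balanced sample only.
    return matched_unprivileged[outcome_col].mean() - privileged[outcome_col].mean()


# Example usage on a hypothetical Adult-Census-style frame:
# raw_spd = statistical_parity_difference(df, "sex", "income")
# causal_spd = psm_matched_spd(df, "sex", "income", ["age", "education_num", "hours_per_week"])
```

When the privileged group also differs systematically in its covariates, the raw and matched SPD values can diverge; that divergence is the kind of gap the causal and counterfactual extensions described in the abstract are intended to surface.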