Refocusing Algorithmic Fairness on Feature-Level Bias: A Diagnostic Approach Using Dutch EHR Data
Abstract
While algorithmic fairness research in healthcare has predominantly focused on disparities in model performance, less attention has been given to the underlying data structures that may drive such disparities. High-level fairness metrics often obscure the deeper feature-level dynamics necessary for a critical and context-aware assessment of fairness. To address this gap, we propose and apply a diagnostic framework termed Feature-Level Bias Identification And Sensemaking (FL-BIAS). As a case study, we conducted a secondary analysis of a retrospective cross-sectional cohort study using electronic health record (EHR) data from Dutch general practitioners, linked with sociodemographic data from Statistics Netherlands. The dataset included 112,872 patients, of whom 16.2% had a non-Western migration background. Hospitalization in the following year was modeled using Johns Hopkins Aggregated Diagnosis Groups (ADGs). We trained logistic regression and XGBoost models on different subgroup datasets to evaluate performance disparities using fairness metrics. To analyze feature-level contributions to predictions and errors, we applied Shapley value methods, including Kernel SHAP and Cohort Shapley. Exploratory analysis revealed significant differences in socioeconomic status (SES) and ADG distributions between Dutch and non-Western groups, though multiple correspondence analysis showed minimal structural variation. Mediation analysis indicated that the effect of migration background on hospitalization was largely mediated by SES, with potential unobserved confounding. While standard fairness metrics indicated modest bias in favor of non-Western patients, deeper feature-level analyses revealed subgroup-specific patterns of variable importance that suggest potentially less favorable underlying conditions. For instance, malignancy (ADG 32) had a stronger predictive impact among non-Western patients but contributed less to false negatives than among Dutch patients, possibly reflecting structural disparities in cancer diagnosis and care. These findings highlight the need for contextual, multi-level evaluations of algorithmic bias. Fairness in healthcare AI must be approached as a socio-technical challenge, requiring multidisciplinary collaboration to uncover root causes and guide effective mitigation strategies.
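As a minimal illustration of the kind of analysis described above, the sketch below trains a classifier on synthetic, EHR-like features, compares false negative rates between two hypothetical subgroups as a group-level fairness check, and summarizes Kernel SHAP attributions per subgroup as a feature-level check. The synthetic data, feature names, and choice of logistic regression are assumptions made for illustration only; the study additionally used XGBoost, Cohort Shapley, and error-specific attribution analyses that are not reproduced here.

```python
# Illustrative sketch only: synthetic data and simplified checks, not the study's code.
import numpy as np
import pandas as pd
import shap
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Hypothetical features: two binary ADG-style indicators and an SES score.
X = pd.DataFrame({
    "adg_32_malignancy": rng.binomial(1, 0.05, n),
    "adg_chronic": rng.binomial(1, 0.20, n),
    "ses_score": rng.normal(0.0, 1.0, n),
})
# Group label (0 = Dutch background, 1 = non-Western background), used only for auditing.
group = rng.binomial(1, 0.16, n)

# Synthetic hospitalization outcome generated from the features.
logit = -2.0 + 1.5 * X["adg_32_malignancy"] + 0.8 * X["adg_chronic"] - 0.4 * X["ses_score"]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit.to_numpy())))

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Group-level fairness check: false negative rate per subgroup.
for g in (0, 1):
    positives = (g_te == g) & (y_te == 1)
    fnr = np.mean(pred[positives] == 0) if positives.sum() else float("nan")
    print(f"group {g}: FNR = {fnr:.3f}")

# Feature-level check: Kernel SHAP attributions summarized per subgroup.
background = shap.sample(X_tr, 100, random_state=0)
explainer = shap.KernelExplainer(lambda d: model.predict_proba(d)[:, 1], background)
shap_values = explainer.shap_values(X_te.iloc[:200], nsamples=100)

for g in (0, 1):
    mask = g_te[:200] == g
    mean_abs = np.abs(shap_values[mask]).mean(axis=0)
    print(f"group {g}:", dict(zip(X.columns, np.round(mean_abs, 3))))
```

Comparing the mean absolute SHAP values per subgroup, as in the last loop, is one simple way to surface features whose predictive role differs between groups even when aggregate fairness metrics look similar.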
Author Summary
Unfair algorithms often originate in the data itself. To help detect such hidden biases, we combined established data science methods into a diagnostic approach named Feature-Level Bias Identification And Sensemaking (FL-BIAS). Using Dutch general practitioner records linked with national demographic information, we explored how patient migration background, adjusted for socioeconomic status, affects predictions of hospital admissions. We discovered that certain medical conditions, such as cancer and chronic illness, contributed differently to predictions for Dutch compared to non-Western patients, revealing subtle but important patterns in how data reflects social inequalities. Our goal was to move beyond simple fairness scores and better understand how inequalities can be hidden in health data, supporting the design of fairer and more transparent healthcare AI tools that truly serve all patients.