Refocusing Algorithmic Fairness on Feature-Level Bias: A Diagnostic Approach Using Dutch EHR Data
Abstract
While algorithmic fairness research in healthcare has predominantly focused on disparities in model performance, less attention has been given to the underlying data structures that may drive such disparities. High-level fairness metrics often obscure the deeper feature-level dynamics necessary for a critical and context-aware assessment of fairness. To address this gap, we propose and apply a diagnostic framework termed Feature-Level Bias Identification And Sensemaking (FL-BIAS). As a case study, we conducted a secondary analysis of a retrospective cross-sectional cohort study using electronic health record (EHR) data from Dutch general practitioners, linked with sociodemographic data from Statistics Netherlands. The dataset included 112,872 patients, of whom 16.2% had a non-Western migration background. Hospitalization in the following year was modeled using Johns Hopkins Aggregated Diagnosis Groups (ADGs). We trained logistic regression and XGBoost models on different subgroup datasets to evaluate performance disparities using fairness metrics. To analyze feature-level contributions to predictions and errors, we applied Shapley value methods, including Kernel SHAP and Cohort Shapley. Exploratory analysis revealed significant differences in socioeconomic status (SES) and ADG distributions between Dutch and non-Western groups, though multiple correspondence analysis showed minimal structural variation. Mediation analysis indicated that the effect of migration background on hospitalization was largely mediated by SES, with potential unobserved confounding. While standard fairness metrics indicated modest bias in favor of non-Western patients, deeper feature-level analyses revealed subgroup-specific patterns of variable importance that suggest potentially less favorable underlying conditions. For instance, malignancy (ADG 32) had a stronger predictive impact among non-Western patients but contributed less to false negatives than among Dutch patients, possibly reflecting structural disparities in cancer diagnosis and care. These findings highlight the need for contextual, multi-level evaluations of algorithmic bias. Fairness in healthcare AI must be approached as a socio-technical challenge, requiring multidisciplinary collaboration to uncover root causes and guide effective mitigation strategies.
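As a minimal illustration of the kind of analysis described above, the sketch below trains a classifier on synthetic, EHR-like features, compares false negative rates between two hypothetical subgroups as a group-level fairness check, and summarizes Kernel SHAP attributions per subgroup as a feature-level check. The synthetic data, feature names, and choice of logistic regression are assumptions made for illustration only; the study additionally used XGBoost, Cohort Shapley, and error-specific attribution analyses that are not reproduced here.

```python
# Illustrative sketch only: synthetic data and simplified checks, not the study's code.
import numpy as np
import pandas as pd
import shap
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Hypothetical features: two binary ADG-style indicators and an SES score.
X = pd.DataFrame({
    "adg_32_malignancy": rng.binomial(1, 0.05, n),
    "adg_chronic": rng.binomial(1, 0.20, n),
    "ses_score": rng.normal(0.0, 1.0, n),
})
# Group label (0 = Dutch background, 1 = non-Western background), used only for auditing.
group = rng.binomial(1, 0.16, n)

# Synthetic hospitalization outcome generated from the features.
logit = -2.0 + 1.5 * X["adg_32_malignancy"] + 0.8 * X["adg_chronic"] - 0.4 * X["ses_score"]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit.to_numpy())))

X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Group-level fairness check: false negative rate per subgroup.
for g in (0, 1):
    positives = (g_te == g) & (y_te == 1)
    fnr = np.mean(pred[positives] == 0) if positives.sum() else float("nan")
    print(f"group {g}: FNR = {fnr:.3f}")

# Feature-level check: Kernel SHAP attributions summarized per subgroup.
background = shap.sample(X_tr, 100, random_state=0)
explainer = shap.KernelExplainer(lambda d: model.predict_proba(d)[:, 1], background)
shap_values = explainer.shap_values(X_te.iloc[:200], nsamples=100)

for g in (0, 1):
    mask = g_te[:200] == g
    mean_abs = np.abs(shap_values[mask]).mean(axis=0)
    print(f"group {g}:", dict(zip(X.columns, np.round(mean_abs, 3))))
```

Comparing the mean absolute SHAP values per subgroup, as in the last loop, is one simple way to surface features whose predictive role differs between groups even when aggregate fairness metrics look similar.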
Author Summary
Unfair algorithms often originate in the data itself. To help detect such hidden biases, we combined established data science methods into a diagnostic approach named Feature-Level Bias Identification And Sensemaking (FL-BIAS). Using Dutch general practitioner records linked with national demographic information, we explored how patient migration background, adjusted for socioeconomic status, affects predictions of hospital admissions. We discovered that certain medical conditions, such as cancer and chronic illness, contributed differently to predictions for Dutch compared to non-Western patients, revealing subtle but important patterns in how data reflects social inequalities. Our goal was to move beyond simple fairness scores and better understand how inequalities can be hidden in health data, supporting the design of fairer and more transparent healthcare AI tools that truly serve all patients.