Explainable and fair anti money laundering models using a reproducible SHAP framework for financial institutions
Abstract
Financial institutions are under growing regulatory pressure to detect and report money laundering in a way that is accurate, auditable, and fair. This study introduces a reproducible machine learning pipeline for Anti-Money Laundering (AML) detection that integrates statistically validated synthetic data generation, class-imbalance handling, and post-hoc explainability. Using a 10,000-record synthetic AML dataset generated with the Synthetic Data Vault (SDV) and Faker, we train Random Forest and Multilayer Perceptron classifiers with class weighting and F₂-optimized threshold tuning to maximize minority-class recall. Model performance is evaluated using PR-AUC, precision and recall for the suspicious class, F₁ score, MCC, balanced accuracy, and probability calibration. Global and local interpretability are provided by TreeSHAP and KernelSHAP, enabling analysts to understand feature contributions and diagnose false positives and false negatives. Fairness audits across age and regional proxies reveal Equal Opportunity gaps, which are mitigated via post-processing threshold adjustments. Results show substantially improved AML recall at operating points consistent with regulatory requirements and provide transparent, auditable outputs aligned with Bank Secrecy Act (BSA) and FATF guidance. This work offers U.S. financial institutions a deployable framework that enhances compliance efficiency, reduces false positives, and supports supervisory review, replication, and industry benchmarking.
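To make two of the steps named in the abstract concrete, the following is a minimal sketch of F₂-optimized threshold selection and an Equal Opportunity gap (the spread in true positive rate across groups), followed by a simple per-group threshold adjustment. It is not the authors' implementation: the toy labels, scores, group names "A" and "B", and all function names are hypothetical, and the per-group thresholds are picked by hand purely for illustration.

```python
# Illustrative sketch only: toy data and helper names are assumptions,
# not taken from the preprint.

def fbeta(y_true, y_pred, beta=2.0):
    """F-beta score for the positive (suspicious) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def best_f2_threshold(y_true, scores):
    """Scan every observed score as a cut-off and keep the one with the highest F2."""
    best_t, best_f = None, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f = fbeta(y_true, preds, beta=2.0)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f

def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest difference in true positive rate between any two groups."""
    rates = {}
    for g in sorted(set(groups)):
        pos = [p for t, p, gg in zip(y_true, y_pred, groups) if gg == g and t == 1]
        rates[g] = sum(pos) / len(pos)  # assumes each group has a positive case
    return max(rates.values()) - min(rates.values()), rates

# Hypothetical toy data: 8 records per group, 1 = suspicious.
y_true = [1, 1, 0, 0, 0, 0, 0, 0,   1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.20, 0.70, 0.25, 0.30, 0.35, 0.45, 0.50, 0.10,
          0.65, 0.80, 0.90, 0.40, 0.55, 0.15, 0.05, 0.12]
groups = ["A"] * 8 + ["B"] * 8

# Step 1: one shared threshold chosen to maximize F2.
threshold, f2 = best_f2_threshold(y_true, scores)
shared_preds = [1 if s >= threshold else 0 for s in scores]
gap_before, rates_before = equal_opportunity_gap(y_true, shared_preds, groups)

# Step 2: post-processing with per-group thresholds (hand-picked here).
group_thresholds = {"A": 0.20, "B": threshold}
adjusted_preds = [1 if s >= group_thresholds[g] else 0
                  for s, g in zip(scores, groups)]
gap_after, rates_after = equal_opportunity_gap(y_true, adjusted_preds, groups)

print(f"shared threshold={threshold}, F2={f2:.3f}")
print(f"Equal Opportunity gap before={gap_before:.2f}, after={gap_after:.2f}")
```

In this toy example the shared F₂-optimal threshold misses one suspicious record in group A, which produces the gap; lowering group A's threshold closes it at the cost of additional false positives in that group. In practice the per-group thresholds would be chosen on held-out data rather than by hand.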