A Framework for Generating Counterfactual Explanations to Explain Black-Box Models
Abstract
Automated decision-making systems are becoming more prevalent across a range of fields. For example, they are used in the finance industry to decide whether customers are entitled to loans, and in medicine to support clinical decisions. Because these decisions can substantially affect individual lives, users often want explanations of why they were made. Indeed, current and in-progress legal frameworks have begun to require such explanations. For example, the GDPR entitles users to explanations of decisions made by these automated systems. Similarly, the Algorithmic Accountability Act would require companies to conduct impact assessments for bias, effectiveness and other factors. One way of producing such explanations for the decisions of machine learning models is to ask the question “What must I change in the input to change the output of the model?”. This is known as a counterfactual explanation. Our methodology begins by fitting simple models to the variation in the output caused by changing a single input feature. We then score each of these simple models using user-tuneable scoring functions. By selecting the best possible change and then repeating, we chain these simple models together into sequential counterfactuals. Our method is modular and therefore inherently extensible: each component of our methodology can be easily modified to make it more useful for specific situations. We examine how well our methodology performs on two data formats: images and tabular data. We compare our framework to other methodologies and demonstrate that our method can, in some cases, produce more interpretable counterfactuals that change fewer features than some existing methodologies. Finally, we make a Python implementation of our code available so that it can be used by the machine learning community.
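The abstract outlines a greedy loop: fit a simple model to how the output varies when one feature is changed, score the candidate changes with a user-tuneable scoring function, apply the best single-feature change, and repeat to build a sequential counterfactual. The snippet below is a minimal sketch of that style of loop, not the released implementation: the function name `greedy_counterfactual`, the quadratic per-feature fit, the default score (probability gain minus a penalty on relative feature change), and the toy logistic black box are all illustrative assumptions.

```python
import numpy as np

def greedy_counterfactual(predict, x, feature_ranges,
                          score_fn=None, max_steps=5, grid_size=21):
    """Greedy sequential counterfactual search (illustrative sketch).

    predict        : black-box function mapping a 1-D array to the probability
                     of the desired class
    x              : 1-D array-like, the instance to explain
    feature_ranges : list of (low, high) bounds, one per feature
    score_fn       : user-tuneable score(gain, relative_change) -> float
    """
    if score_fn is None:
        # Default: reward probability gain, penalise large feature changes.
        score_fn = lambda gain, change: gain - 0.1 * abs(change)

    x = np.asarray(x, dtype=float).copy()
    path = []                                   # sequence of (feature index, new value)
    for _ in range(max_steps):
        base_prob = predict(x)
        if base_prob > 0.5:                     # prediction has flipped; stop
            break
        best = None
        for j, (low, high) in enumerate(feature_ranges):
            # Probe the black box while varying only feature j.
            grid = np.linspace(low, high, grid_size)
            probs = []
            for v in grid:
                x_try = x.copy()
                x_try[j] = v
                probs.append(predict(x_try))
            # Fit a simple (quadratic) model of the output against this one feature.
            coeffs = np.polyfit(grid, probs, deg=2)
            fitted = np.polyval(coeffs, grid)
            # Candidate: the grid value where the fitted model is largest.
            v_star = grid[int(np.argmax(fitted))]
            gain = float(np.max(fitted)) - base_prob
            rel_change = (v_star - x[j]) / (high - low + 1e-12)
            score = score_fn(gain, rel_change)
            if best is None or score > best[0]:
                best = (score, j, v_star)
        _, j, v_star = best
        x[j] = v_star                           # apply the best single-feature change
        path.append((j, float(v_star)))
    return x, path

# Toy black box: a logistic model over two features in [0, 1].
w = np.array([2.0, -1.5])
predict = lambda z: 1.0 / (1.0 + np.exp(-(z @ w - 1.0)))

x0 = np.array([0.2, 0.8])                       # initially classified negative (~0.14)
cf, path = greedy_counterfactual(predict, x0, [(0.0, 1.0), (0.0, 1.0)])
print(path, predict(cf))
```

In this toy run the loop first raises feature 0 and then lowers feature 1, stopping once the predicted probability exceeds 0.5; supplying a different `score_fn` changes which single-feature edits are preferred, which is the sense in which the scoring is user-tuneable.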