Fault-Injection Probing: A Causal Interpretability Framework for Quantum Machine Learning Models

Abstract

Quantum machine learning (QML) models achieve competitive performance on real-world tasks, yet interpreting what these models learn remains an open challenge. Classical interpretability techniques depend on access to intermediate representations, which quantum systems preclude due to measurement collapse, the no-cloning theorem, and exponential state-space dimensionality. We introduce Fault-Injection Probing (FIP), a framework that repurposes controlled quantum errors (bit flips, phase flips, depolarising channels, and erasure) as interpretability probes. FIP injects a known fault at a specific qubit and circuit layer, then measures the resulting shift in the model's output. Comparing these shifts across inputs with and without a target feature yields causal attribution scores that link individual qubits to learned representations. On variational quantum classifiers trained for sentiment analysis, FIP identifies sentiment-encoding qubits whose targeted perturbation flips 72% of relevant predictions. On a synthetic benchmark with known ground-truth mappings, FIP achieves 100% identification accuracy with zero false positives across all eight qubits. The framework is model-agnostic, extending to quantum kernels, reservoir models, and QAOA, and supports practical applications including model debugging and adversarial robustness assessment.
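
As a rough illustration of the probing loop the abstract describes, the sketch below uses PennyLane's mixed-state simulator to inject a bit-flip fault at a chosen qubit and layer of a small variational classifier, measure the resulting output shift, and compare mean shifts between inputs that do and do not contain a target feature. All names (`classifier`, `output_shift`, `fip_attribution`), the circuit ansatz, the choice of a bit-flip probe, and the attribution formula (difference of mean shifts) are illustrative assumptions, not the paper's implementation.

```python
# Minimal Fault-Injection Probing sketch (illustrative, not the authors' code).
import numpy as np
import pennylane as qml

n_qubits, n_layers = 4, 3
# Mixed-state simulator so noise channels (BitFlip, PhaseFlip, ...) are allowed.
dev = qml.device("default.mixed", wires=n_qubits)

@qml.qnode(dev)
def classifier(x, weights, fault_qubit=None, fault_layer=None, fault_p=1.0):
    """Toy variational classifier; optionally injects one fault after a given layer."""
    qml.AngleEmbedding(x, wires=range(n_qubits))
    for layer in range(n_layers):
        qml.StronglyEntanglingLayers(weights[layer:layer + 1], wires=range(n_qubits))
        if fault_qubit is not None and fault_layer == layer:
            # Controlled error used as a probe; phase flip, depolarising, or
            # erasure channels would slot in here analogously.
            qml.BitFlip(fault_p, wires=fault_qubit)
    return qml.expval(qml.PauliZ(0))  # decision observable

def output_shift(x, weights, qubit, layer, p=1.0):
    """Magnitude of the output change caused by a known fault at (qubit, layer)."""
    clean = classifier(x, weights)
    faulty = classifier(x, weights, fault_qubit=qubit, fault_layer=layer, fault_p=p)
    return float(abs(faulty - clean))

def fip_attribution(X_with, X_without, weights, qubit, layer):
    """Causal attribution score: mean shift on inputs containing the target
    feature minus mean shift on inputs lacking it."""
    s_with = np.mean([output_shift(x, weights, qubit, layer) for x in X_with])
    s_without = np.mean([output_shift(x, weights, qubit, layer) for x in X_without])
    return s_with - s_without

# Example usage with random weights and data (purely illustrative).
rng = np.random.default_rng(0)
weights = rng.normal(size=(n_layers, n_qubits, 3))
X_with = rng.uniform(0, np.pi, size=(8, n_qubits))      # inputs with the feature
X_without = rng.uniform(0, np.pi, size=(8, n_qubits))   # inputs without it
scores = [[fip_attribution(X_with, X_without, weights, q, l)
           for l in range(n_layers)] for q in range(n_qubits)]
```

With `fault_p=1.0` the bit flip acts as a deterministic X probe; smaller probabilities give softer perturbations, and scanning `scores` over all qubit-layer pairs identifies the locations whose perturbation most strongly tracks the target feature.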
