Mapping the Mind of an Instruction-based Image Editing Model using SMILE
Abstract
Despite recent advances in editing high-quality images from text, instruction-based image editing models remain black boxes. Their lack of transparency makes it difficult for users to fully trust these systems. To tackle this issue, we introduce SMILE (Statistical Model-agnostic Interpretability with Local Explanations). This novel approach is model-agnostic and provides transparent, localized explanations and visual heatmaps, helping users understand how specific textual inputs influence image generation. Our extensive testing across different metrics—stability, accuracy, fidelity, and consistency—shows that SMILE significantly improves interpretability and reliability when applied to popular models such as Instruct-pix2pix, Img2Img-Turbo, and Diffusers-Inpaint. These results highlight the importance of interpretability in making AI more transparent and trustworthy, especially in critical areas like autonomous driving and healthcare.
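The abstract does not specify how the local explanations are computed. As a rough illustration of the general idea behind model-agnostic local explanation methods of this kind, the sketch below perturbs an editing instruction by masking tokens, measures how much the model's output shifts, and fits a weighted linear surrogate whose coefficients rank token importance (which could then be rendered as a heatmap). Everything here is hypothetical: `mock_edit_model` is a toy stand-in for a real editor, and the kernel width and sampling scheme are illustrative choices, not the paper's actual parameters.

```python
import numpy as np

def mock_edit_model(instruction_tokens):
    # Hypothetical stand-in for an instruction-based image editor:
    # returns a flat "image" vector whose content depends on which
    # tokens are present (a toy deterministic mapping, not a real model).
    rng = np.random.default_rng(0)
    basis = rng.normal(size=(8, 16))  # one output direction per vocab slot
    active = np.zeros(8)
    for t in instruction_tokens:
        active[sum(ord(c) for c in t) % 8] = 1.0
    return active @ basis

def local_token_explanation(tokens, model, n_samples=200, seed=1):
    """LIME-style local surrogate (an illustrative guess at the approach):
    mask subsets of instruction tokens, score how far the edited output
    drifts from the unperturbed output, and fit a weighted linear model
    whose coefficients estimate each token's importance."""
    rng = np.random.default_rng(seed)
    base_out = model(tokens)
    d = len(tokens)
    masks = rng.integers(0, 2, size=(n_samples, d))
    masks[0] = 1  # always include the unperturbed instruction
    dists = np.empty(n_samples)
    for i, m in enumerate(masks):
        kept = [t for t, keep in zip(tokens, m) if keep]
        dists[i] = np.linalg.norm(model(kept) - base_out)
    # Exponential kernel: perturbations closer to the full instruction
    # get more weight in the surrogate fit.
    sim = masks.mean(axis=1)
    weights = np.exp(-((1.0 - sim) ** 2) / 0.25)
    # Weighted least squares on similarity (-distance): positive
    # coefficients mark tokens whose removal changes the edit most.
    w = np.sqrt(weights)
    coefs, *_ = np.linalg.lstsq(w[:, None] * masks, w * (-dists), rcond=None)
    return dict(zip(tokens, coefs))

scores = local_token_explanation(["make", "the", "sky", "purple"], mock_edit_model)
```

In a real pipeline, `dists` would come from comparing edited images (for example, in pixel or embedding space), and the per-token scores would be projected back onto the instruction or image as a heatmap.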