Interpretability requires interaction and integration of complexity-theoretic and experimental efforts
Abstract
System opacity underlies many of the risks we currently worry about in AI and undermines many of its intended scientific applications. A central concern is therefore to understand the conditions under which we can provably, or otherwise reasonably, guarantee that interpretability methods meet scientific and societal needs. We argue that the interpretability field is at a critical juncture: a minimal foundation of theoretical and empirical results exists, but the field is not well positioned to seize the opportunities or meet the challenges ahead. Doing so will require an approach so far unexplored: interaction between complexity-theoretic and experimental efforts, and continual integration of their formal and empirical results. We present a comprehensive, actionable research strategy for this purpose, based on computational modeling and parameterized complexity analysis combined with algorithmic design and parametric experimentation. We illustrate its potential and feasibility with case studies of input and component attribution.
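As a concrete illustration of what "input attribution" involves (this sketch is not part of the paper and does not reproduce its case studies), the code below scores each input feature by occlusion: replace one feature with a baseline value and record how the model's output changes. The toy model, feature values, and baseline are hypothetical choices made only to keep the example self-contained and runnable.

```python
# Minimal occlusion-based input attribution (illustrative sketch, assumed setup).
import numpy as np


def toy_model(x: np.ndarray) -> float:
    """Stand-in for an opaque model: a fixed nonlinear function of the input."""
    w = np.array([0.5, -1.2, 2.0, 0.1])
    return float(np.tanh(w @ x))


def occlusion_attribution(model, x: np.ndarray, baseline: float = 0.0) -> np.ndarray:
    """Attribute the model output to each input feature by single-feature occlusion."""
    reference = model(x)
    scores = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        occluded = x.copy()
        occluded[i] = baseline                     # replace one feature with the baseline
        scores[i] = reference - model(occluded)    # output change serves as the attribution score
    return scores


if __name__ == "__main__":
    x = np.array([1.0, 0.5, -0.3, 2.0])
    print(occlusion_attribution(toy_model, x))
```

Even this simple scheme raises the kinds of questions the paper targets: whether such scores are faithful, and at what computational cost guarantees about them can be obtained.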