What Does It Mean to Explain? A Functional Taxonomy Across Human and Machine Learning

Abstract

Forms of explanation are typically studied as cognitive events: researchers ask what explaining does to the explainer or the learner. This paper instead analyzes explanation as a family of instructional operations and asks what functionally distinct behaviors are grouped under the everyday verb "explain." Drawing on behavior-analytic concepts, the paper proposes a two-dimensional taxonomy organized by (a) the learner's existing repertoire and (b) the type of concept being taught. Explanation types include structured examples, demonstrations, rule statements, metaphors and analogies, causal chains, and four subtypes of corrective feedback (simple, feature-directed, rule-referencing, and error-magnitude signaling). Each is defined by its repertoire requirements and the concept types it can and cannot handle. The paper then maps these behavioral categories onto parallel constructs in cognitivism and machine learning, showing that many apparent theoretical disagreements reduce to terminological translation problems, while also identifying genuine gaps, most notably the absence of widely adopted, training-time counterparts to feature-directed and rule-referencing feedback in current neural network practice. These gaps are independently corroborated by a recent comprehensive survey documenting systematic breakdowns in LLM reasoning that align with the failure modes the present taxonomy predicts. The taxonomy generates testable predictions about when particular forms of explanation should be necessary, sufficient, or impossible, and it suggests new directions for research on explanatory operations in both human and machine learners.
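To make the claimed gap concrete, consider what a training-time counterpart to feature-directed feedback might look like. The sketch below is not from the paper; it is a minimal, hypothetical illustration in the spirit of input-gradient regularization (Ross et al., 2017, "Right for the Right Reasons"), where a teacher signal marks certain input features as irrelevant and the loss penalizes gradient mass on them. The toy dataset, the `irrelevant` mask, and all names are illustrative assumptions.

```python
# Hypothetical sketch: "feature-directed feedback" as an auxiliary training loss.
# Standard cross-entropy tells the model only that it is wrong ("simple" feedback);
# the added term also says *which features* it should not be relying on.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data: dims 0-1 are genuinely informative; dim 2 is spurious but
# predictive in this sample; dim 3 is noise.
n, d = 256, 4
x = torch.randn(n, d)
y = (x[:, 0] + x[:, 1] > 0).long()
x[:, 2] = y.float() + 0.1 * torch.randn(n)          # spurious correlate
irrelevant = torch.tensor([0.0, 0.0, 1.0, 1.0])      # teacher flags dims 2-3

model = nn.Sequential(nn.Linear(d, 16), nn.ReLU(), nn.Linear(16, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
ce = nn.CrossEntropyLoss()

for step in range(200):
    x_in = x.clone().requires_grad_(True)
    logits = model(x_in)
    task_loss = ce(logits, y)                        # simple right/wrong signal

    # Feature-directed component: gradient of the loss w.r.t. the inputs,
    # masked to the features the teacher marked irrelevant. create_graph=True
    # lets this penalty itself be differentiated during backward().
    (input_grad,) = torch.autograd.grad(task_loss, x_in, create_graph=True)
    feedback_loss = (input_grad * irrelevant).pow(2).sum(dim=1).mean()

    loss = task_loss + 10.0 * feedback_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the taxonomy's terms, the cross-entropy term alone corresponds to simple corrective feedback, while the masked gradient penalty plays the role of feature-directed feedback; the point of the sketch is that such signals exist as research techniques but, as the abstract notes, have no widely adopted training-time analogue in standard practice.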
