The Interoceptive Origins of Mental Imagery: An Evolutionary Account
Abstract
I propose that imagery's origins lie in interoceptive processing, the phylogenetically ancient system linking internal bodily states to valence, emotion, and motivational salience (Craig, 2009; Barrett, 2017). Interoceptive signals inherently specify survival-relevant goals: hunger is linked to approach, threat-arousal to avoidance, and sexual arousal to pursuit. This makes interoception the necessary foundation for any planning system, because planning requires knowing what you are planning for: which bodily states and emotions to pursue or avoid. Motor imagery emerged first, extending forward models to enable offline action planning. Because survival-relevant motor actions already involved interoceptive states, motor imagery inherited interoceptive integration from these forward models. Motor imagery worked well for first-person action planning directly coupled to immediate motor execution. However, as social complexity increased, organisms needed to simulate scenarios involving others: tracking rival alliances, predicting dominance conflicts, and monitoring mating competitors. Here, organisms faced a discriminability problem rooted in the low-dimensional nature of autonomic arousal (Barrett, 2017). The same autonomic pattern can occur across different social situations (dominance encounters, rival conflicts, mating opportunities) that demand distinct behavioral responses. Visual imagery evolved to solve this discriminability problem by binding distinctive sensory features with interoceptive states. This binding makes simulations goal-directed: imagining a dominant rival automatically activates the associated arousal pattern and avoidance tendency, while imagining a mating opportunity activates approach motivation.
Mental simulations therefore evolved as multimodal representations in which sensory and interoceptive components are bound during construction itself. The resulting images inherently possess valence and motivation, not as subsequent responses to neutral sensory simulations but as constitutive features of the mental images themselves (cf. Silvanto & Nagai, 2025).