Equilibrium Propagation Discovers Top-Down Feedback for Audio-Visual Binding in Continuous Wave Fields
Abstract
Cross-modal binding — the fusion of simultaneous sensory streams into a unified percept — has not been achieved in physical neural networks without backpropagation. Whether top-down feedback between hierarchical field layers can emerge from local learning rules alone remains untested. We extend a Landau-Ginzburg wave field architecture trained by Equilibrium Propagation to a two-layer system: primary audio and visual fields drive a binding field that sends top-down feedback to both primaries through coupling coefficients initialized to zero. Trained on the GRID audiovisual corpus, the coupling coefficients grow from 0.0 to 0.051 over ten epochs — a result absent in the unimodal case — confirming that Equilibrium Propagation discovers top-down feedback when cross-modal binding is required. The binding field outperforms late fusion; replacing phase-sensitive measurement with amplitude-only readout costs 9.2 percentage points, exceeding the analogous unimodal penalty. When presented with conflicting audiovisual inputs, the system produces fusion responses in 83% of trials, stable under contrastive readout training and therefore reflecting field dynamics rather than readout bias. Symmetric noise degradation — 33.3 versus 33.7 percentage points for audio and video respectively — confirms genuine integration.
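The mechanism described in the abstract, coupling coefficients that start at zero and are adjusted by a local Equilibrium Propagation rule, can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the authors' model: the fields are short real-valued vectors rather than Landau-Ginzburg wave fields, the energy is a simple quadratic, and the names `relax`, `ep_step`, `k_a`, and `k_v` are hypothetical. The sketch only shows the two-phase structure: relax freely, relax again with the binding field nudged toward a target, then update each coefficient from the difference of a locally measurable product.

```python
def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def relax(x_a, x_v, k_a, k_v, beta=0.0, target=None, steps=500, dt=0.1):
    """Gradient descent on a toy energy (an assumption, not the paper's field):
    E = 0.5|a - x_a|^2 + 0.5|v - x_v|^2 + 0.5|b - drive|^2
        - k_a * a.b - k_v * v.b  (+ beta * 0.5|b - target|^2 when nudged)
    """
    # Bottom-up drive to the binding field: mean of the two inputs (assumed).
    drive = [0.5 * (p + q) for p, q in zip(x_a, x_v)]
    a, v, b = list(x_a), list(x_v), list(drive)
    for _ in range(steps):
        new_a, new_v, new_b = [], [], []
        for i in range(len(b)):
            grad_a = (a[i] - x_a[i]) - k_a * b[i]
            grad_v = (v[i] - x_v[i]) - k_v * b[i]
            grad_b = (b[i] - drive[i]) - k_a * a[i] - k_v * v[i]
            if beta != 0.0:
                grad_b += beta * (b[i] - target[i])
            new_a.append(a[i] - dt * grad_a)
            new_v.append(v[i] - dt * grad_v)
            new_b.append(b[i] - dt * grad_b)
        a, v, b = new_a, new_v, new_b
    return a, v, b

def ep_step(x_a, x_v, target, k_a, k_v, beta=0.5, lr=0.1):
    """One Equilibrium Propagation update of the two coupling coefficients."""
    a0, v0, b0 = relax(x_a, x_v, k_a, k_v)                            # free phase
    a1, v1, b1 = relax(x_a, x_v, k_a, k_v, beta=beta, target=target)  # nudged phase
    # dE/dk_a = -a.b, so the update is (lr / beta) * (a.b nudged - a.b free).
    k_a = k_a + (lr / beta) * (dot(a1, b1) - dot(a0, b0))
    k_v = k_v + (lr / beta) * (dot(v1, b1) - dot(v0, b0))
    return k_a, k_v

# Both coefficients start at zero, as in the abstract; inputs are made up.
k_a, k_v = ep_step([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], 0.0, 0.0)
```

In this toy setting the update is local in the sense that each coefficient changes only according to the overlap of the two fields it connects, measured in the free and nudged phases. Whether and how fast the coefficients grow depends on the data and the target; the sketch makes no claim about reproducing the reported values.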