Reconstructing What the Brain Hears: Cross-Subject Music Decoding from fMRI via Prior-Guided Diffusion Model

Abstract

Reconstructing music directly from brain activity offers a unique window onto the representational geometry of the auditory system and paves the way for next-generation brain–computer interfaces. We introduce a fully data-driven pipeline that combines cross-subject functional alignment with Bayesian decoding in the latent space of a diffusion-based audio generator. Functional alignment projects individual fMRI responses onto a shared representational manifold, improving cross-participant decoding accuracy relative to anatomically normalized baselines. A Bayesian search over latent trajectories then selects the most plausible waveform candidate, stabilizing reconstructions against neural noise. Crucially, we bridge CLAP’s multi-modal embeddings to music-domain latents through a dedicated aligner, eliminating the need for hand-crafted captions and preserving the intrinsic structure of musical features. Evaluated on ten diverse genres, the model achieves a cross-subject-averaged Identification Accuracy of 0.914 ± 0.019 and produces audio that naïve listeners recognize above chance in 85.7% of trials. Voxel-wise analyses localize the predictive signal to a bilateral circuit spanning early auditory, inferior-frontal, and premotor cortices, consistent with hierarchical and sensorimotor theories of music perception. The framework establishes a principled bridge between generative audio models and cognitive neuroscience, opening avenues for thought-driven composition, objective metrics for music-based therapy, and translational applications in non-verbal communication and neurotechnology.
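The abstract does not specify how the functional alignment is implemented. A minimal sketch of one common approach, a per-subject linear (ridge) mapping from individual voxel space into a shared response space, is given below; the function names, array shapes, and the choice of ridge regression are illustrative assumptions, not the authors' method.

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_alignment(subject_data, template, alpha=1.0):
    """Fit one linear map per subject from that subject's voxel space
    into a shared template space.

    subject_data: list of (n_timepoints, n_voxels_s) arrays, one per subject,
                  all recorded while subjects heard the same training music.
    template:     (n_timepoints, n_shared_dims) shared response matrix,
                  e.g. a reference subject or a group-average representation.
    """
    maps = []
    for X in subject_data:
        reg = Ridge(alpha=alpha, fit_intercept=False)
        reg.fit(X, template)          # learn voxels -> shared space
        maps.append(reg)
    return maps

def project(maps, new_data):
    """Project held-out responses from each subject into the shared space,
    so a single decoder can be applied across participants."""
    return [m.predict(X) for m, X in zip(maps, new_data)]
```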
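The reported Identification Accuracy of 0.914 is, in similar decoding studies, typically computed as pairwise two-alternative identification in a feature space, with chance at 0.5. A sketch under that assumption follows; the embedding and the use of Pearson correlation as the similarity measure are placeholders, not details confirmed by the abstract.

```python
import numpy as np

def identification_accuracy(pred, target):
    """Pairwise identification accuracy.

    pred, target: (n_clips, n_features) embeddings of reconstructed and
    ground-truth audio (e.g. spectrogram or CLAP-style features).
    For each clip i and each distractor j != i, a trial counts as correct
    when the reconstruction is more similar to its own target than to
    clip j's target. Chance level is 0.5.
    """
    n, d = pred.shape
    # Row-wise z-scoring turns dot products into Pearson correlations.
    zp = (pred - pred.mean(1, keepdims=True)) / pred.std(1, keepdims=True)
    zt = (target - target.mean(1, keepdims=True)) / target.std(1, keepdims=True)
    sim = zp @ zt.T / d                # sim[i, j] = corr(pred_i, target_j)
    correct, total = 0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            correct += sim[i, i] > sim[i, j]
            total += 1
    return correct / total
```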
