Du-IN-v2: Unleashing the Power of Vector Quantization for Decoding Cognitive States from Intracranial Neural Signals
Abstract
While invasive brain-computer interfaces have shown promise for high-performance speech decoding in clinical settings, the potential of intracranial stereoElectroEncephaloGraphy (sEEG), which causes less damage to patients, remains underexplored. With the rapid progress in representation learning, leveraging abundant unlabeled recordings to further enhance speech decoding is increasingly attractive. However, some popular methods pre-train temporal models on brain-level tokens, overlooking the desynchronized nature of activity across brain regions; others pre-train spatial-temporal models on channel-level tokens but do not evaluate them on more challenging tasks such as speech decoding, which demands intricate processing in specific brain regions. To tackle these issues, we introduce Du-IN-v2, a general pre-training framework for speech decoding that extracts contextual embeddings from region-level tokens through discrete codex-guided mask modeling. To push its limits further, we propose Decoupling Product Quantization (DPQ), in which different codexes are designed to extract different parts of brain dynamics. Our model achieves state-of-the-art performance on both the 61-word classification task and the 49-syllable sequence classification task, surpassing all baselines. Model comparison and ablation studies reveal that our design choices, including (i) temporal modeling based on region-level tokens, using 1D depthwise convolution to fuse channels in the vSMC and STG regions, and (ii) self-supervision through discrete decoupling codex-guided mask modeling, contribute significantly to this performance. Collectively, our approach, inspired by neuroscience findings and capitalizing on region-level representations from specific brain regions, is well suited to invasive brain modeling and marks a promising neuro-inspired AI approach for BCI.
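To make the idea of quantizing with several codexes concrete, the following is a minimal sketch of plain product quantization, assuming an embedding is split into equal-width groups and each group is snapped to the nearest entry of its own codex. It is not the authors' DPQ: the abstract does not specify how the codexes are decoupled or trained, and the names `nearest_code` and `product_quantize`, the squared-Euclidean distance, and the toy codex values are all illustrative assumptions.

```python
def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(code):
        return sum((a - b) ** 2 for a, b in zip(vec, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def product_quantize(embedding, codebooks):
    """Quantize one embedding with one codebook per group (hypothetical helper).

    embedding: list of floats whose length is divisible by len(codebooks)
    codebooks: list of G codebooks, each a list of code vectors of equal width
    returns:   (indices, quantized) where indices holds one code index per
               group and quantized is the concatenation of the chosen codes
    """
    n_groups = len(codebooks)
    if len(embedding) % n_groups != 0:
        raise ValueError("embedding length must be divisible by the number of codebooks")
    d_sub = len(embedding) // n_groups

    indices, quantized = [], []
    for g, codebook in enumerate(codebooks):
        # Each group only ever sees its own slice of the embedding.
        sub = embedding[g * d_sub:(g + 1) * d_sub]
        idx = nearest_code(sub, codebook)
        indices.append(idx)
        quantized.extend(codebook[idx])
    return indices, quantized


# Toy example: two codebooks (assumed values), each covering half of a 4-d embedding.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]],
]
indices, quantized = product_quantize([0.9, 0.8, 0.1, 1.2], codebooks)
print(indices)    # [1, 0]
print(quantized)  # [1.0, 1.0, 0.0, 1.0]
```

In a codex-guided mask modeling setup, discrete indices of this kind would typically serve as prediction targets for masked tokens. How Du-IN-v2 does this in detail is not stated in the abstract.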