A biologically oriented algorithm for spatial sound segregation



Listening in an acoustically cluttered scene remains a difficult task for both machines and hearing-impaired listeners. Normal-hearing listeners accomplish this task with relative ease by segregating the scene into its constituent sound sources, then selecting and attending to a target source. An assistive listening device that mimics the biological mechanisms underlying this behavior may provide an effective solution for those with difficulty listening in acoustically cluttered environments (e.g., a cocktail party). Here, we present a binaural sound segregation algorithm based on a hierarchical network model of the auditory system. In the algorithm, binaural sound inputs first drive populations of neurons tuned to specific spatial locations and frequencies. Lateral inhibition then sharpens the spatial response of the neurons. Finally, the spiking responses of neurons in the output layer are reconstructed into audible waveforms via a novel reconstruction method. We evaluate the performance of the algorithm using psychoacoustic measures in normal-hearing listeners. This two-microphone algorithm is shown to provide listeners with a perceptual benefit similar to that of a 16-microphone acoustic beamformer in a difficult listening task. Unlike deep-learning approaches, the proposed algorithm is biologically interpretable and does not need to be trained on large datasets. This study presents a biologically based algorithm for sound source segregation as well as a method to reconstruct highly intelligible audio signals from spiking models.
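
The sharpening step described above can be illustrated with a minimal sketch. It is not the authors' model: it assumes rate-based (non-spiking) channels with Gaussian spatial tuning, and the function names, channel spacing, tuning width, and inhibition strength are all hypothetical choices made only for illustration. Each channel is inhibited by the mean activity of the other channels and then rectified, which narrows the population response around the source direction.

```python
import numpy as np

def spatial_tuning(source_az, preferred_az, width_deg=30.0):
    """Gaussian tuning: response of each channel to a source at source_az (degrees).

    The Gaussian shape and width are illustrative assumptions.
    """
    return np.exp(-0.5 * ((preferred_az - source_az) / width_deg) ** 2)

def lateral_inhibition(responses, strength=0.5):
    """Subtract a scaled mean of the other channels' activity, then rectify.

    Hypothetical rate-based stand-in for inhibition between spatial channels.
    """
    n = responses.size
    others_mean = (responses.sum() - responses) / (n - 1)
    return np.maximum(responses - strength * others_mean, 0.0)

# Hypothetical bank of 13 spatial channels from -90 to +90 degrees azimuth.
preferred_az = np.linspace(-90.0, 90.0, 13)

raw = spatial_tuning(0.0, preferred_az)   # source straight ahead
sharpened = lateral_inhibition(raw)

# Compare the normalized profiles: flanking channels are suppressed
# relative to the peak, and the most distant channels are silenced.
print(np.round(raw / raw.max(), 2))
print(np.round(sharpened / sharpened.max(), 2))
```

In this sketch the peak channel is unchanged in position while off-target channels lose relative weight, which is the qualitative effect the abstract attributes to lateral inhibition. A spiking implementation, as used in the paper, would differ in detail.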

Author Summary

Animals and humans can navigate complex auditory environments with relative ease, attending to certain sounds while suppressing others. Typically, different sounds originate from different spatial locations. This paper presents an algorithmic model that performs sound segregation based on how animals make use of this spatial information at various stages of the auditory pathway. We show that this two-microphone algorithm provides as much benefit to normal-hearing listeners as a multi-microphone algorithm. Unlike mathematical and machine-learning approaches, our model is fully interpretable and does not require training with large datasets. Such an approach may benefit the design of machine hearing algorithms. To interpret the spike trains generated in the model, we designed a method to recover sounds from model spikes with high intelligibility. This method can be applied to spiking neural networks for audio-related applications, or to interpret each node within a spiking model of the auditory cortex.
