Reverse-Engineering Speech and Music Categorization from a Single Sound Source

Abstract

Classifying whether an auditory signal is music or speech is important for both humans and computational systems. Although previous literature suggests that music and speech are easily separable categories, common experimental approaches may bias findings toward this distinction by relying on stimuli from different sound sources and on predefined response labels. Here, we use stimulus material from the dùndún drum, a speech surrogate that can signal either speech-related or musical content. We first replicate standard speech-music categorization results (N=108). We then depart from the typical experimental procedure by asking new participants (N=180) to sort and label the stimulus material without predefined categories. Hierarchical clustering of participants’ stimulus groupings reveals multiple organizing dimensions, with the speech–music distinction reliably present but secondary under label-free conditions. By reverse-engineering the relationship between sorting behavior, acoustic features, and semantic labels, we characterize how speech–music categorization relates to other salient perceptual dimensions and how its behavioral prominence depends on task constraints.
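
The abstract describes hierarchically clustering participants’ free-sorting groupings. As a rough illustration of that kind of analysis, and not the authors’ actual pipeline, the sketch below builds a stimulus-by-stimulus co-occurrence matrix from hypothetical sorting data, converts it to a distance matrix, and applies average-linkage hierarchical clustering with SciPy. The `cooccurrence_matrix` helper, the toy `sorts` data, and the choice of average linkage are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cooccurrence_matrix(sorts, n_stimuli):
    """Proportion of participants who placed each stimulus pair in the same group.

    `sorts` is a list of per-participant groupings, each a sequence of group
    labels (one label per stimulus, in a fixed stimulus order).
    """
    counts = np.zeros((n_stimuli, n_stimuli))
    for labels in sorts:
        labels = np.asarray(labels)
        counts += labels[:, None] == labels[None, :]
    return counts / len(sorts)

# Toy data (assumed): 3 participants freely sorting 6 stimuli.
sorts = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1],
    [0, 1, 1, 2, 2, 0],
]
co = cooccurrence_matrix(sorts, n_stimuli=6)

# Treat co-occurrence as similarity; cluster on 1 - similarity.
dist = 1.0 - co
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")

# Cut the dendrogram into two clusters to see whether the top-level split
# resembles a speech-like vs. music-like division.
print(fcluster(Z, t=2, criterion="maxclust"))
```

In this framing, whether the speech–music split appears at the top of the dendrogram or only at a lower level is an empirical question about the sorting data, which is the kind of comparison the abstract refers to.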