AudioSet-Tools: A Python Framework for Taxonomy-Aware AudioSet Curation and Reproducible Audio Research
Abstract
This work presents AudioSet-Tools, a modular and composable Python framework designed to streamline the creation of task-specific datasets derived from Google's AudioSet. Despite its extensive coverage, AudioSet suffers from weak labeling, class imbalance, and a loosely structured taxonomy, which limit its practical applicability in machine listening workflows. AudioSet-Tools addresses these issues through configurable, taxonomy-aware label filtering and class re-balancing strategies. The framework includes automated routines for data download and transformation, enabling reproducible and semantically consistent dataset generation for both pre-training and downstream fine-tuning of machine learning and deep learning models. Although the framework is domain-agnostic, we showcase its versatility through AudioSet-EV, a curated subset focused on emergency vehicle siren recognition: a socially relevant and technically challenging use case that exemplifies the structural and semantic gaps in the AudioSet taxonomy. We further provide an extensive comparative benchmark of AudioSet-EV against state-of-the-art emergency vehicle corpora, and we openly release the source code and datasets on GitHub and Zenodo to foster transparency and reproducibility in real-world audio signal processing research.
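To illustrate what "taxonomy-aware label filtering" and "class re-balancing" can mean in practice, here is a minimal sketch. It is not the AudioSet-Tools API: the function names (`descendants`, `filter_segments`, `undersample`), the toy ontology, and the segment IDs are all hypothetical. The only assumption borrowed from AudioSet is that ontology nodes carry an `id` and a list of `child_ids`, as entries in AudioSet's `ontology.json` do. Filtering expands a target class to all of its descendants before matching clip labels, so a clip tagged only with a child class (e.g. a specific siren type) is still kept. Re-balancing uses plain random undersampling, which is one possible strategy and not necessarily the one the framework implements.

```python
import random
from collections import defaultdict

# Hypothetical toy ontology shaped like AudioSet's ontology.json entries
# (each node has an "id" and "child_ids"); IDs here are illustrative only.
TOY_ONTOLOGY = [
    {"id": "vehicle", "name": "Vehicle", "child_ids": ["emergency_vehicle", "car"]},
    {"id": "emergency_vehicle", "name": "Emergency vehicle",
     "child_ids": ["police_siren", "ambulance_siren"]},
    {"id": "police_siren", "name": "Police car (siren)", "child_ids": []},
    {"id": "ambulance_siren", "name": "Ambulance (siren)", "child_ids": []},
    {"id": "car", "name": "Car", "child_ids": []},
    {"id": "music", "name": "Music", "child_ids": []},
]


def descendants(ontology, root_id):
    """Return root_id plus every node reachable through child_ids (iterative DFS)."""
    children = {node["id"]: node["child_ids"] for node in ontology}
    seen, stack = set(), [root_id]
    while stack:
        node_id = stack.pop()
        if node_id in seen:
            continue  # guard against nodes reachable via several parents
        seen.add(node_id)
        stack.extend(children.get(node_id, []))
    return seen


def filter_segments(segments, ontology, target_id, exclude_ids=()):
    """Keep segments with a label under target_id and no label under any exclude_ids."""
    keep = descendants(ontology, target_id)
    drop = set()
    for ex_id in exclude_ids:
        drop |= descendants(ontology, ex_id)
    return [
        s for s in segments
        if keep & set(s["labels"]) and not drop & set(s["labels"])
    ]


def undersample(segments, key, seed=0):
    """Randomly undersample each class (as given by key(segment)) to the smallest class size."""
    groups = defaultdict(list)
    for s in segments:
        groups[key(s)].append(s)
    n = min(len(g) for g in groups.values())
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return [s for g in groups.values() for s in rng.sample(g, n)]
```

For example, `filter_segments(segments, TOY_ONTOLOGY, "emergency_vehicle", exclude_ids=["music"])` keeps clips labeled with any siren subclass while discarding those that also contain music, and `undersample(segments, key=...)` with a positive/negative key yields an equal number of clips per class. Fixing the random seed is what makes the resulting subset regenerable, which mirrors the reproducibility goal stated above.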