AudioSet-Tools: A Python Framework for Taxonomy-Aware AudioSet Curation and Reproducible Audio Research

Abstract

This work presents AudioSet-Tools, a modular and composable Python framework designed to streamline the creation of task-specific datasets derived from Google’s AudioSet. Despite its extensive coverage, AudioSet suffers from weak labeling, class imbalance, and a loosely structured taxonomy, which limit its practical applicability in machine listening workflows. AudioSet-Tools addresses these issues through configurable taxonomy-aware label filtering and class re-balancing strategies. The framework includes automated routines for data download and transformation, enabling reproducible and semantically consistent dataset generation for both downstream fine-tuning and pre-training of machine/deep learning models. Although the framework is domain-agnostic, we showcase its versatility through AudioSet-EV, a curated subset focused on emergency vehicle siren recognition, a socially relevant and technically challenging use case that exemplifies the structural and semantic gaps in the AudioSet taxonomy. We further provide an extensive comparative benchmark of AudioSet-EV against state-of-the-art emergency vehicle corpora. Source code and datasets are openly released on GitHub and Zenodo to foster transparency and reproducibility in real-world audio signal processing research.
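
To make the idea of taxonomy-aware label filtering concrete, the following is a minimal sketch and not the AudioSet-Tools API. It assumes an ontology in the same shape as AudioSet's ontology.json (nodes with "id", "name", and "child_ids"); the node ids, the segment records, and the function names descendants and filter_segments are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy ontology shaped like AudioSet's ontology.json.
# The ids below are invented for this example, not real AudioSet ids.
ONTOLOGY = [
    {"id": "/t/vehicle", "name": "Vehicle",
     "child_ids": ["/t/emergency", "/t/car"]},
    {"id": "/t/emergency", "name": "Emergency vehicle",
     "child_ids": ["/t/siren_amb", "/t/siren_police"]},
    {"id": "/t/siren_amb", "name": "Ambulance (siren)", "child_ids": []},
    {"id": "/t/siren_police", "name": "Police car (siren)", "child_ids": []},
    {"id": "/t/car", "name": "Car", "child_ids": []},
]

# Hypothetical weakly labeled segments: each clip carries a set of label ids.
SEGMENTS = [
    {"ytid": "a", "labels": ["/t/siren_amb", "/t/car"]},
    {"ytid": "b", "labels": ["/t/car"]},
    {"ytid": "c", "labels": ["/t/siren_police"]},
]


def descendants(ontology, root_id):
    """Return root_id plus every label id reachable through child_ids."""
    by_id = {node["id"]: node for node in ontology}
    seen = {root_id}
    queue = deque([root_id])
    while queue:
        node = by_id[queue.popleft()]
        for child_id in node["child_ids"]:
            if child_id not in seen:
                seen.add(child_id)
                queue.append(child_id)
    return seen


def filter_segments(segments, keep_ids):
    """Keep segments that carry at least one label in keep_ids."""
    return [seg for seg in segments if keep_ids & set(seg["labels"])]


keep = descendants(ONTOLOGY, "/t/emergency")
selected = filter_segments(SEGMENTS, keep)
```

Selecting by subtree rather than by a flat list of label names is what makes the filter taxonomy-aware: a clip tagged only with a child label is still retained when its parent category is requested.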
