Fine-Tuning Whisper for American English Air Traffic Control Speech Recognition: A Data-Efficient Pipeline
Abstract
Automatic speech recognition (ASR) for air traffic control (ATC) presents a persistent domain adaptation challenge: VHF radio channel degradation, specialized vocabulary, and a near-total absence of American English training corpora in the published literature. The state-of-the-art open ATC ASR model—WhisperATC by van Doorn et al. [20], fine-tuned on European ATCO2 and ATCOSIM corpora—achieves 3.88% word error rate (WER) on ATCOSIM (speaker-split evaluation) but degrades to 30.3% on American ATC transmissions, exposing a systematic accent and phraseology mismatch. We present a data-efficient fine-tuning pipeline that adapts Whisper Large v3 [17] to American English ATC using only 55 manually transcribed clips recorded from three major US airports (KIAH, KJFK, KSFO) via LiveATC.net. Domain-matched audio preprocessing—a 300–3400 Hz Butterworth bandpass filter with EBUR128 loudness normalization—combined with five-fold stochastic data augmentation addresses the limited corpus size. Full fine-tuning with conservative hyperparameters achieves 13.7% WER, a 54.8% relative reduction from the European-trained baseline, using 370× fewer training clips than the most comparable prior study. A secondary contribution is the characterization of a structural incompatibility between the HuggingFace PEFT LoRA implementation and Whisper’s log-mel spectrogram encoder that prevents parameter-efficient fine-tuning without modification of library internals. All code, the fine-tuned model, and training notebooks are publicly available.
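The domain-matched preprocessing described above (a 300–3400 Hz Butterworth bandpass to match the VHF voice channel, followed by loudness normalization) can be sketched as follows. This is a minimal illustration, not the authors' released code: the function name `atc_preprocess` and the parameter choices other than the 300–3400 Hz band are assumptions, and a simple RMS normalization stands in for the EBU R128 loudness normalization used in the paper.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def atc_preprocess(audio, fs=16000, low=300.0, high=3400.0, target_rms=0.1):
    """Band-limit audio to the VHF voice band and normalize its level.

    A 4th-order Butterworth bandpass (300-3400 Hz) approximates the ATC
    radio channel. The RMS normalization here is a simple stand-in for
    the EBU R128 loudness normalization described in the abstract.
    """
    # Design the bandpass in second-order sections for numerical stability.
    sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
    # Zero-phase filtering avoids introducing group delay into the clip.
    filtered = sosfiltfilt(sos, audio)
    # Scale to a fixed RMS level (stand-in for loudness normalization).
    rms = np.sqrt(np.mean(filtered ** 2))
    if rms > 0:
        filtered = filtered * (target_rms / rms)
    return filtered

# Demo: a 100 Hz hum (out of band) plus a 1 kHz tone (in the voice band).
fs = 16000
t = np.arange(fs) / fs
audio = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 1000 * t)
out = atc_preprocess(audio, fs)

# With 1 Hz FFT resolution, bin k corresponds to k Hz.
spec = np.abs(np.fft.rfft(out))
print(spec[100] < 0.01 * spec[1000])  # out-of-band hum strongly attenuated
```

Resampling LiveATC recordings to Whisper's expected 16 kHz before this step would keep the passband edges well below the Nyquist frequency.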