An Analysis of English Vowel Variation in Pakistani vs. Arabic Talkers: A Computational Acoustic and Machine Learning Approach

Nadia Safeer

Abstract

This paper examines the acoustic properties of English vowels produced by non-native speakers from two linguistic and cultural backgrounds: Pakistani English (PakE) and Arabic English (ArE). Using a multi-method framework grounded in machine learning, the study explores how the first language shapes English vowel production among native speakers of Pahari in Pakistan and native speakers of Arabic in the Kingdom of Saudi Arabia (KSA). Participants (10 per region, mixed-sex) produced a list of English words targeting 10 vowels in a CVC (hVd) frame, embedded in carrier sentences and spoken without pauses. The first and second formant frequencies (F1 and F2) and vowel duration were extracted using Praat version 6.1.04. The data were analysed and visualised in Python using vowel space plots, Euclidean distances, and clustering patterns among the speakers. Vowel classification and speaker-group prediction were carried out with unsupervised and supervised machine learning algorithms, including k-means clustering and logistic regression. This process revealed systematic phonological patterns in both groups. The results indicated consistent internal differences within each group and significant differences from British English vowel targets. These findings suggest that PakE and ArE follow systematic phonological rules. The study has implications for pronunciation teaching, the building of speech recognition systems, and the development of region-specific text-to-speech (TTS) synthesisers. It also discusses the importance of open-source tools in computational phonetics, with Python-based analysis becoming a common part of code-driven workflows.
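The Euclidean-distance step mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the formant measurements have already been exported from Praat as rows of (group, vowel, F1, F2). The row layout, the function names `vowel_centroids` and `group_distances`, and all numeric values are hypothetical placeholders, not the author's code or data. The sketch averages F1 and F2 per group and vowel, then measures how far apart the two groups' centroids lie in the F1-F2 plane.

```python
import math
from collections import defaultdict

# Hypothetical measurement rows: (group, vowel, F1 in Hz, F2 in Hz).
# These numbers are placeholders for illustration only, not data from the study.
MEASUREMENTS = [
    ("PakE", "i", 300.0, 2300.0),
    ("PakE", "i", 320.0, 2200.0),
    ("ArE", "i", 350.0, 2100.0),
    ("ArE", "i", 370.0, 2000.0),
    ("PakE", "a", 700.0, 1300.0),
    ("PakE", "a", 720.0, 1200.0),
    ("ArE", "a", 650.0, 1400.0),
    ("ArE", "a", 670.0, 1500.0),
]


def vowel_centroids(rows):
    """Return {(group, vowel): (mean F1, mean F2)} for the given rows."""
    buckets = defaultdict(list)
    for group, vowel, f1, f2 in rows:
        buckets[(group, vowel)].append((f1, f2))
    centroids = {}
    for key, points in buckets.items():
        mean_f1 = sum(p[0] for p in points) / len(points)
        mean_f2 = sum(p[1] for p in points) / len(points)
        centroids[key] = (mean_f1, mean_f2)
    return centroids


def group_distances(centroids, group_a, group_b):
    """Per-vowel Euclidean distance between two groups' F1-F2 centroids."""
    vowels_a = {v for g, v in centroids if g == group_a}
    vowels_b = {v for g, v in centroids if g == group_b}
    return {
        v: math.dist(centroids[(group_a, v)], centroids[(group_b, v)])
        for v in sorted(vowels_a & vowels_b)
    }


centroids = vowel_centroids(MEASUREMENTS)
distances = group_distances(centroids, "PakE", "ArE")
for vowel, d in distances.items():
    print(f"/{vowel}/: {d:.1f} Hz between PakE and ArE centroids")
```

The distances here are computed on raw Hz values. Whether the study normalised formants before measuring distances is not stated in the abstract, so any normalisation step would be a further assumption. The same centroid table could also serve as input to a clustering or classification step.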

Version published to 10.20944/preprints202601.2282.v1
Jan 29, 2026

