Detection of Speech Steganography for VoIP Stream Based on Deep Learning Approach in G.729 Codec

Hojat Allah Moghadasi
Hamid Dehghani

Abstract

In recent years, Voice over Internet Protocol (VoIP) has been used extensively in real-time communications and social networks, which has made it a practical carrier for steganography and covert communication. To counter these security threats, numerous steganalysis approaches have been developed; among them, the integration of signal processing and machine learning techniques has enabled highly accurate steganalyzers. This paper proposes a hybrid approach that combines speech signal processing methods with Artificial Intelligence (AI). Preprocessing is first applied to audio signals compressed with the G.729 codec to extract intra-frame features and inter-frame correlations. The resulting data are fed into a deep learning network, which is trained to distinguish cover data from stego data. The evaluation shows significant improvements in both detection accuracy and computational efficiency. The proposed technique is evaluated on two major steganography families, Quantization Index Modulation (QIM) and Pitch Modulation Steganography (PMS), as well as their combined application, Heterogeneous Parallel Steganography (HPS). The method is implemented and tested at embedding rates from 10% to 100% and audio segment lengths from 100 ms to 1000 ms. For all three techniques (QIM, PMS, and HPS), detection accuracy improves by 1–11% over conventional methods, depending on segment length and embedding rate. In the steganalysis testing phase, the response time for 1000 ms audio files was less than 5 ms, which highlights the speed of the proposed model in the testing step. In addition, the deep learning model's architecture runs faster than similar approaches in both the training step (in most cases, a time improvement of at least ¼) and the testing step (in most cases, a time improvement of at least ⅓).
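To make the terms "segment length" and "embedding rate" concrete, the following is a minimal Python sketch, not the authors' implementation. It relies on the fact that G.729 encodes speech in 10 ms frames, so a 100 ms segment holds 10 frames and a 1000 ms segment holds 100. Everything else is an assumption made for illustration: the function names are hypothetical, the embedding rate is taken to mean the fraction of frames that carry a hidden bit, the codebook size of 128 is a placeholder, and the even/odd codebook partition is a toy stand-in for the partitioning schemes used in real QIM steganography.

```python
import random

FRAME_MS = 10  # G.729 encodes speech in 10 ms frames


def frames_in_segment(segment_ms):
    """Number of whole G.729 frames in a segment of the given length."""
    return segment_ms // FRAME_MS


def toy_qim_embed(indices, bits, embedding_rate, seed=0):
    """Hide bits in a fraction of frames (hypothetical helper).

    Toy partition (assumption): even quantization indices encode 0,
    odd indices encode 1. Assumes an even codebook size so that
    flipping the lowest bit stays inside the codebook.
    """
    n_carriers = round(len(indices) * embedding_rate)
    if len(bits) < n_carriers:
        raise ValueError("not enough secret bits for this embedding rate")
    rng = random.Random(seed)  # the seed plays the role of a shared key
    carriers = sorted(rng.sample(range(len(indices)), n_carriers))
    stego = list(indices)
    for pos, bit in zip(carriers, bits):
        if stego[pos] % 2 != bit:
            stego[pos] ^= 1  # move to the neighbouring index in the other half
    return stego, carriers


def toy_qim_extract(stego, carriers):
    """Recover the hidden bits from the parity of the carrier indices."""
    return [stego[pos] % 2 for pos in carriers]


if __name__ == "__main__":
    rng = random.Random(1)
    n_frames = frames_in_segment(1000)  # 100 frames in a 1000 ms segment
    cover = [rng.randrange(128) for _ in range(n_frames)]  # placeholder indices
    secret = [rng.randrange(2) for _ in range(n_frames)]
    stego, carriers = toy_qim_embed(cover, secret, embedding_rate=0.5)
    print(len(carriers), "of", n_frames, "frames carry a hidden bit")
    print("recovered correctly:", toy_qim_extract(stego, carriers) == secret[:len(carriers)])
```

In this toy setting, a steganalyzer would be given only sequences like `cover` and `stego` and asked to tell them apart. Shorter segments and lower embedding rates leave fewer modified indices per sample, which is consistent with the abstract reporting results across both parameters.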

Version published to 10.21203/rs.3.rs-6115140/v1 on Research Square
Apr 23, 2025

Related articles

Forward Thinking of Detecting Indiscernible Video Counterfeits
This article has 3 authors:
1. C S Nakul Kalyan
2. M K Bala Kaaurthik
3. D. Jagadiswary
This article has no evaluations. Latest version: Jun 26, 2025

Towards Secure Social Platforms: Hate Speech Detection and Classification in Indian Languages Using Hybrid Soft Computing Techniques
This article has 1 author:
1. Purbani Kar
This article has no evaluations. Latest version: Jul 25, 2025

A deep learning method for automatic modulation recognition in the time-frequency domain
This article has 1 author:
1. Kyungsup Kim
This article has no evaluations. Latest version: Jun 9, 2025
