Enhancing the Accuracy of Monopole and Dipole Source Identification with Vision Transformer

Abstract

The identification of mixed monopole and dipole sound sources in highly randomized acoustic environments is of interest in many industrial applications. The DAMAS–MS method is one of the few methods developed explicitly for this problem. However, its accuracy in identifying monopole sources is consistently limited, so monopole sources are underestimated in the final results. To overcome this limitation, this paper proposes a novel identification framework that integrates a vision transformer (ViT) with beamforming techniques. The framework uses preliminary beamforming results to construct input features: it extracts the real and imaginary components of the cross-spectral matrix at the target frequencies and adds spatial position encodings derived from the estimated source locations. To adapt to varying source densities, multiple ViT sub-models are trained on representative scenarios. This strategy enables effective generalization across the target range and supports multi-label identification of monopole and dipole sources in varied configurations. Furthermore, anechoic chamber experiments with synthesized monopole and dipole emitters validate the method’s stability under single-frequency excitation. Compared with the DAMAS–MS method, the proposed method identifies monopole sources more accurately while maintaining comparable performance in dipole source identification, underscoring its potential for practical applications.
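
The feature-construction step described in the abstract (real and imaginary components of the cross-spectral matrix at a target frequency) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `cross_spectral_matrix` and `csm_to_features`, the Hann-windowed block averaging, the block size, and the two-channel output layout are all assumptions made here for illustration. The spatial position encodings and the ViT itself are not shown, since the abstract does not specify them.

```python
import numpy as np


def cross_spectral_matrix(signals, fs, target_freq, block_size=256):
    """Estimate the cross-spectral matrix (CSM) at the FFT bin nearest
    target_freq by averaging over non-overlapping blocks.

    signals: real array of shape (n_mics, n_samples)  (assumed layout)
    fs: sampling rate in Hz
    """
    n_mics, n_samples = signals.shape
    n_blocks = n_samples // block_size
    if n_blocks == 0:
        raise ValueError("signal is shorter than one block")

    # Index of the rFFT bin closest to the requested frequency.
    freqs = np.fft.rfftfreq(block_size, d=1.0 / fs)
    bin_idx = int(np.argmin(np.abs(freqs - target_freq)))

    window = np.hanning(block_size)
    csm = np.zeros((n_mics, n_mics), dtype=complex)
    for b in range(n_blocks):
        block = signals[:, b * block_size:(b + 1) * block_size] * window
        spec = np.fft.rfft(block, axis=1)[:, bin_idx]
        # Outer product of the spectrum with its conjugate: a Hermitian matrix.
        csm += np.outer(spec, spec.conj())
    return csm / n_blocks


def csm_to_features(csm):
    """Stack real and imaginary parts into a real array of shape
    (2, n_mics, n_mics), analogous to a two-channel image."""
    return np.stack([csm.real, csm.imag], axis=0).astype(np.float32)
```

Because the CSM is Hermitian, the real channel is symmetric and the imaginary channel is antisymmetric, so the two-channel array carries the full complex matrix in a real-valued form that an image model can consume.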
