Realization of Indoor Positioning Estimation in RF Communication Using Hybrid CNN and Transformer Models
Abstract
Indoor localization is a critical technology for numerous applications, yet it remains challenging because GPS signals are unreliable inside buildings. Wi-Fi Received Signal Strength Indicator (RSSI) fingerprinting offers a cost-effective alternative, but its accuracy is often compromised by the signal fluctuations inherent in complex indoor environments. This paper addresses these challenges by proposing and evaluating two novel hybrid deep learning architectures for robust floor-level identification. The first architecture (Model A) integrates Convolutional Neural Networks (CNNs) for spatial feature extraction with parallel Bidirectional Long Short-Term Memory (BiLSTM) and Multi-Head Self-Attention (MHA) pathways that capture temporal dynamics and contextual signal relationships; it achieves an accuracy of 0.9892 on the UJIIndoorLoc dataset. To investigate an alternative hybrid configuration, we introduce a second architecture, the Hybrid CNN-Transformer-Dilated Network (HCTD-Net, Model B). HCTD-Net combines a CNN frontend with a Transformer encoder for global context modeling and a parallel dilated convolution pathway designed to enlarge the temporal receptive field without loss of resolution. HCTD-Net demonstrated strong performance on a relevant indoor localization dataset, achieving an overall accuracy of 0.9754, a macro F1-score of 0.9753, and a macro-average specificity of 0.9938. Both proposed models significantly outperform baseline methods, indicating that these distinct hybrid deep learning strategies effectively mitigate RSSI variability and provide highly accurate and reliable floor determination in multi-building, multi-floor indoor settings. HCTD-Net in particular presents a novel synergistic combination of Transformer and dilated convolution mechanisms for advanced temporal feature learning in this domain.
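The abstract reports overall accuracy, macro F1-score, and macro-average specificity for floor classification. As a minimal sketch of how such figures are typically derived from a multi-class confusion matrix, the following pure-Python example computes all three in a one-vs-rest fashion. The function name `macro_metrics` and the three-floor toy matrix are hypothetical and chosen only for illustration; they are not the paper's evaluation code or data.

```python
def macro_metrics(confusion):
    """Return (accuracy, macro_f1, macro_specificity).

    confusion[i][j] is the number of samples whose true floor is i
    and whose predicted floor is j (square matrix, hypothetical input).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))

    f1_scores = []
    specificities = []
    for k in range(n):
        # One-vs-rest counts for floor k.
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[i][k] for i in range(n)) - tp
        tn = total - tp - fn - fp

        f1_denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / f1_denom if f1_denom else 0.0)
        specificities.append(tn / (tn + fp) if (tn + fp) else 0.0)

    # Macro averaging weights every floor equally, regardless of support.
    return correct / total, sum(f1_scores) / n, sum(specificities) / n


# Toy three-floor confusion matrix (illustrative values only).
toy = [
    [8, 1, 1],
    [0, 9, 1],
    [1, 0, 9],
]
accuracy, macro_f1, macro_specificity = macro_metrics(toy)
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} "
      f"macro_specificity={macro_specificity:.4f}")
```

Because each one-vs-rest split counts every other floor as a negative, true negatives dominate and per-class specificity tends to sit well above accuracy in multi-class problems. This is consistent with the pattern reported for HCTD-Net (0.9938 specificity against 0.9754 accuracy).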