A Deep Hybrid CNN–ViT Architecture Incorporating Advanced 3D Features for the Estimation of Visibility and Runway Visual Range
Abstract
Estimating visibility is a significant challenge for transportation safety and operational decision-making, especially in severe weather, where image-based evaluation becomes unreliable. Conventional deep learning (DL) models extract features poorly from degraded images, while physics-based methods require predefined parameters and generalize inadequately across diverse atmospheric conditions. This study introduces a hybrid architecture that combines several information sources to estimate visibility and runway visual range (RVR) as continuous values from a single image. The architecture includes a three-dimensional feature matrix, the DDT matrix, which encodes dark channel, depth, and transmittance components derived from atmospheric scattering theory. These physically informed features are combined with learned representations from Convolutional Neural Networks (CNNs), which identify local degradation patterns, and Vision Transformers (ViT), which model global context through self-attention. Meteorological variables such as temperature, wind, and atmospheric pressure are integrated to provide environmental context. A random forest regressor performs multimodal fusion and final estimation from these feature streams. Quantitative evaluation on three datasets, Visibility Image Dataset I (daytime), Dataset II (night-time), and Dataset III (mixed climatic conditions), yields a Root Mean Squared Error (RMSE) of 117 and a Mean Absolute Error (MAE) of 68.81. This corresponds to a 22% reduction in RMSE relative to methods based on a single physical feature (RMSE ≈ 150). Ablation experiments illustrate the contribution of each component to overall performance.
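The abstract does not specify how the DDT matrix is computed. As a minimal sketch only, the following assumes the widely used dark channel prior together with the standard atmospheric scattering model, in which transmittance and depth are related by t = exp(-beta * d). The function names, the patch size, and the parameters `omega`, `beta`, and `t_min` are illustrative assumptions, not values taken from the study.

```python
import numpy as np

def dark_channel(image, patch=3):
    """Minimum over colour channels, then over a patch x patch neighbourhood."""
    min_rgb = image.min(axis=2)
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode="edge")
    h, w = min_rgb.shape
    out = np.empty_like(min_rgb)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def ddt_matrix(image, omega=0.95, beta=1.0, t_min=0.1, patch=3):
    """Stack dark channel, depth, and transmittance into an (H, W, 3) array.

    `image` is a float RGB array in [0, 1] with shape (H, W, 3).
    All parameter defaults are assumptions made for illustration.
    """
    dark = dark_channel(image, patch)

    # Atmospheric light: mean colour of the pixels with the largest
    # dark-channel values (top 0.1%, at least one pixel).
    n_top = max(1, int(dark.size * 0.001))
    top_idx = np.argsort(dark.ravel())[-n_top:]
    airlight = image.reshape(-1, 3)[top_idx].mean(axis=0)
    airlight = np.maximum(airlight, 1e-6)  # guard against division by zero

    # Transmittance from the dark channel of the airlight-normalised image.
    transmittance = 1.0 - omega * dark_channel(image / airlight, patch)
    transmittance = np.clip(transmittance, t_min, 1.0)

    # Relative depth from t = exp(-beta * d); beta is unknown in practice,
    # so the result is a depth proxy up to scale.
    depth = -np.log(transmittance) / beta

    return np.stack([dark, depth, transmittance], axis=-1)
```

In a pipeline like the one described, an array of this kind would be summarized or flattened and passed, alongside CNN, ViT, and meteorological features, to the fusion regressor; how the study performs that step is not stated in the abstract.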
The approach addresses shortcomings of current methods by integrating local and global feature extraction, combining explicit physical models with learned representations, and performing continuous regression instead of discrete classification. Cross-dataset validation demonstrates consistent performance across environmental contexts, including urban and rural scenes with differing availability of reference objects. The findings indicate practical value for aviation safety systems, transportation management infrastructure, and atmospheric monitoring networks that require dependable real-time visibility estimation under adverse weather conditions.
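Because the method regresses a continuous value rather than a class label, its accuracy is summarized with RMSE and MAE. The short sketch below shows the standard definitions of both metrics and the arithmetic behind a relative error reduction, using the RMSE figures quoted in the abstract (150 for the single-feature baseline, 117 for the proposed model). The helper names are illustrative and not taken from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)

def relative_reduction(baseline, improved):
    """Fractional decrease of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline

# RMSE values quoted in the abstract: (150 - 117) / 150 = 0.22
reduction = relative_reduction(150.0, 117.0)
```

RMSE penalizes large errors more heavily than MAE and is never smaller than MAE on the same data, which is consistent with the reported pair of 117 and 68.81.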