A Deep Hybrid CNN–ViT Architecture Incorporating Advanced 3D Features for the Estimation of Visibility and Runway Visual Range

Anand Shankar
Bikash Chandra Sahana


Abstract

Visibility estimation is a significant challenge for transportation safety and operational decision-making, especially in severe weather, where image-based assessment becomes unreliable. Conventional deep learning (DL) models extract limited information from degraded images, while physics-based methods require predefined parameters and generalize poorly across diverse atmospheric conditions. This study introduces a hybrid architecture that combines multiple information sources to estimate visibility and runway visual range (RVR) continuously from single images. The architecture includes a three-dimensional feature matrix, the DDT matrix, which encodes dark channel, depth, and transmittance components derived from atmospheric scattering theory. These physically informed features are combined with learned representations from Convolutional Neural Networks (CNNs), which identify local degradation patterns, and Vision Transformers (ViT), which model global context through self-attention. Meteorological variables such as temperature, wind, and atmospheric pressure provide environmental context. A random forest regressor fuses these feature streams and produces the final estimate. Quantitative evaluation covers three datasets: Visibility Image Dataset I (daytime), Dataset II (night-time), and Dataset III (mixed climatic conditions). The model achieves a Root Mean Squared Error (RMSE) of 117 and a Mean Absolute Error (MAE) of 68.81, a 22% reduction in error relative to methods based on a single physical feature (RMSE ≈ 150). Ablation experiments quantify the contribution of each component to overall performance.
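The abstract does not specify how the DDT matrix is computed. The sketch below is a minimal illustration, assuming the standard dark channel prior with a 15×15 patch and ω = 0.95, an RGB image scaled to [0, 1], and a single assumed extinction coefficient `beta` used to convert transmittance to relative depth through t = exp(-β·d). The function names (`dark_channel`, `atmospheric_light`, `ddt_matrix`) and all parameter values are hypothetical and are not the authors' implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def dark_channel(img: np.ndarray, patch: int = 15) -> np.ndarray:
    """Dark channel: per-pixel minimum over RGB, then a local minimum filter."""
    if patch % 2 != 1:
        raise ValueError("patch must be odd so the output keeps the input size")
    min_rgb = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(min_rgb, pad, mode="edge")
    # Each patch x patch window contributes its minimum as the dark-channel value.
    return sliding_window_view(padded, (patch, patch)).min(axis=(-2, -1))


def atmospheric_light(img: np.ndarray, dark: np.ndarray, top_fraction: float = 0.001) -> np.ndarray:
    """Simplified airlight estimate: per-channel max over the haziest pixels."""
    flat = dark.ravel()
    n = max(1, int(flat.size * top_fraction))
    idx = np.argpartition(flat, -n)[-n:]  # indices of the n largest dark-channel values
    return img.reshape(-1, 3)[idx].max(axis=0)


def ddt_matrix(img, patch=15, omega=0.95, beta=1.0, t_min=0.1):
    """Stack dark channel, relative depth, and transmittance into an H x W x 3 array.

    Hypothetical assumptions: img is float RGB in [0, 1]; beta is one assumed
    extinction coefficient, so depth is only known up to scale.
    """
    img = np.asarray(img, dtype=np.float64)
    dark = dark_channel(img, patch)
    airlight = np.maximum(atmospheric_light(img, dark), 1e-6)
    # Dark channel prior transmittance: t = 1 - omega * dark(I / A).
    t = 1.0 - omega * dark_channel(img / airlight, patch)
    t = np.clip(t, t_min, 1.0)
    # Invert t = exp(-beta * d) to obtain relative scene depth.
    depth = -np.log(t) / beta
    return np.stack([dark, depth, t], axis=-1)


rng = np.random.default_rng(0)
image = rng.random((32, 40, 3))
ddt = ddt_matrix(image)
print(ddt.shape)  # (32, 40, 3)
```

Under the same scattering model, Koschmieder's law relates the extinction coefficient to visibility as V ≈ 3.912/β (for a 2% contrast threshold), which is one reason the transmittance and depth channels carry visibility information.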
The approach addresses shortcomings of current methods by integrating local and global feature extraction, combining explicit physical models with learned representations, and performing continuous regression instead of discrete classification. Cross-dataset validation shows consistent performance across environmental contexts, including urban and rural scenes with differing availability of reference objects. The findings indicate practical value for aviation safety systems, transportation management infrastructure, and atmospheric monitoring networks that require reliable real-time visibility assessment under adverse weather conditions.
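Because the model performs continuous regression, its accuracy is reported with standard regression metrics. The plain-Python sketch below shows how RMSE and MAE are computed for continuous visibility estimates and checks the quoted 22% reduction from an RMSE of about 150 to 117. The helper names are hypothetical, and the sample values are illustrative only, not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error over paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error over paired observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline, model):
    """Fractional error reduction of `model` relative to `baseline`."""
    return (baseline - model) / baseline


# Illustrative values only (not from the study).
observed = [800.0, 1500.0, 3000.0]
predicted = [900.0, 1400.0, 3200.0]
print(round(rmse(observed, predicted), 2))  # 141.42
print(round(mae(observed, predicted), 2))  # 133.33
print(round(relative_reduction(150.0, 117.0), 2))  # 0.22
```

Because RMSE squares each error, it weights large misses more heavily than MAE; an RMSE well above the MAE, as with 117 versus 68.81, suggests that a minority of estimates carry comparatively large errors.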

Version published as 10.21203/rs.3.rs-8678337/v1 on Research Square, Feb 5, 2026
