Developing and Evaluating Deep Learning Approaches for Visual Field Denoising in Glaucoma
Abstract
Purpose
To compare the efficacy of nine artificial intelligence (AI) methods for visual field (VF) denoising, and to evaluate a pathology-aware AI strategy designed to discourage over-correction of glaucomatous defects.
Design
Retrospective study
Participants
87,940 paired visual field (VF) and optical coherence tomography (OCT) samples from a tertiary academic center.
Methods
Denoising models were trained on a separate VF-only dataset and evaluated on an independent structure-function dataset of paired VF-OCT samples. We implemented and evaluated nine distinct VF denoising strategies representing three broad categories: baseline measurements, self-supervised and image restoration models (including Noise2Noise, Noise2Void, and NAFNet), and latent variable compression-based models (autoencoders and variational autoencoders). All models were designed to reconstruct VF sensitivity maps. We then predicted retinal nerve fiber layer thickness (RNFLT) maps from the denoised VFs using a fixed, independently trained VF-to-RNFLT prediction model.
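The evaluation described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the function names (`evaluate_denoiser`, `r2_global`, `mae`), the callables standing in for the denoiser and the fixed VF-to-RNFLT model, and the toy arrays are all assumptions introduced here. The one property taken from the text is that the RNFLT predictor stays fixed while only the denoiser changes.

```python
import numpy as np


def r2_global(y_true, y_pred):
    """Coefficient of determination pooled over all samples and map locations."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def mae(y_true, y_pred):
    """Mean absolute error pooled over all samples and map locations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def evaluate_denoiser(denoise, predict_rnflt, vf, rnflt_measured):
    """Denoise VFs, map them to RNFLT with a fixed predictor, and score concordance.

    `denoise` and `predict_rnflt` are hypothetical callables; `predict_rnflt`
    is never retrained, so score differences are attributable to the denoiser.
    """
    rnflt_pred = predict_rnflt(denoise(vf))
    return {
        "r2": r2_global(rnflt_measured, rnflt_pred),
        "mae": mae(rnflt_measured, rnflt_pred),
    }


# Toy data only: 3 "fields" of 4 points each and a made-up linear predictor.
vf = np.arange(12.0).reshape(3, 4)
fixed_predictor = lambda v: 2.0 * v + 1.0
rnflt = fixed_predictor(vf)

raw_baseline = evaluate_denoiser(lambda v: v, fixed_predictor, vf, rnflt)
shifted = evaluate_denoiser(lambda v: v + 1.0, fixed_predictor, vf, rnflt)
```

Under this setup the raw VF baseline is simply the identity denoiser, and each candidate method is scored by swapping in a different `denoise` callable while everything downstream is held constant.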
Main Outcome Measures
Predicted VF and RNFLT maps and resultant evaluation metrics.
Results
For RNFLT prediction, the raw VF baseline achieved a global R² of 0.5468 and a mean absolute error (MAE) of 16.83 μm. Restoration-based models maintained or slightly improved concordance, with the pathology-aware NAFNet achieving the highest global R² (0.5485) and a comparable MAE (16.82 μm). In contrast, compression-based models degraded concordance, with CNN-VAE showing a significant reduction (R² ≈ 0.50).
In severe glaucoma, concordance decreased across all methods; however, compression architectures exhibited disproportionately greater degradation compared with restoration-based approaches.
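A severity-stratified comparison of this kind can be sketched as follows. The abstract does not state how severity was defined, so the grouping here is purely an assumption: fields are binned by mean deviation (MD, in dB) using illustrative cutoffs of -6 and -12 dB, and the function name `r2_by_severity` and the toy data are hypothetical.

```python
import numpy as np


def r2(y_true, y_pred):
    """Pooled coefficient of determination; NaN if the target has no variance."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    if ss_tot == 0.0:
        return float("nan")
    return float(1.0 - np.sum((y_true - y_pred) ** 2) / ss_tot)


def r2_by_severity(md, y_true, y_pred, cutoffs=(-6.0, -12.0)):
    """R² per severity group. The MD cutoffs are illustrative assumptions,
    not the definition used in the study."""
    md = np.asarray(md, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mild_cut, severe_cut = cutoffs
    groups = {
        "mild": md > mild_cut,
        "moderate": (md <= mild_cut) & (md > severe_cut),
        "severe": md <= severe_cut,
    }
    return {
        name: r2(y_true[mask], y_pred[mask]) if mask.any() else float("nan")
        for name, mask in groups.items()
    }


# Toy data only: six eyes, two RNFLT values each; error is added to the severe group.
md = np.array([-1.0, -2.0, -8.0, -9.0, -15.0, -20.0])
rnflt_true = np.arange(1.0, 13.0).reshape(6, 2)
rnflt_pred = rnflt_true.copy()
rnflt_pred[4:] += 1.0

scores = r2_by_severity(md, rnflt_true, rnflt_pred)
```

Running the same stratification for each denoiser would make it possible to compare how much concordance each method loses in the severe group relative to the others.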
Conclusions
We present a comparative benchmark of AI-based VF denoising strategies paired with structure–function evaluation. While restoration-based models can reduce variability without loss of biological signal, latent compression risks attenuating clinically meaningful defects. Visually smoother fields are not necessarily more biologically accurate.