Longitudinal Forecasting of Retinal Structure and Function Using a Multimodal StyleGAN-Based Architecture
Abstract
Generative Adversarial Networks (GANs) have emerged as powerful tools for medical image synthesis and clinical outcome prediction. In ophthalmology, accurate forecasting of Optical Coherence Tomography (OCT) images and best-corrected visual acuity (BCVA) values can significantly enhance patient monitoring and personalized treatment planning. We introduce a multimodal GAN inspired by the StyleGAN architecture, featuring super-resolution modules, a multi-scale patch discriminator, and temporal attention mechanisms. To predict logMAR values, a hybrid deep–shallow LSTM model was jointly trained alongside the image pipeline. Synthesized scans were processed through an EfficientNet-based classifier to predict 16 retinal biomarkers. To ensure subject independence, we employed a 3-fold patient-level cross-validation strategy. The proposed multimodal GAN achieved an SSIM of 0.9264, an FID of 11.9, and a PSNR of 38.1 dB for OCT forecasting. The logMAR module delivered an MAE of 0.052, while the biomarker classifier attained a macro-F1 score of 0.81. Based on logMAR change forecasting, patients were further categorized into Winner, Stabilizer, and Loser outcome groups using a threshold of Δ=0.05, achieving an overall F1 score of 0.84. Our approach effectively forecasts retinal morphology and functional outcomes, providing valuable predictive insights for proactive clinical decision-making in retinal health management.
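The outcome grouping described in the abstract can be illustrated with a minimal sketch. The abstract gives only the threshold (Δ=0.05) and the three group names, so several details here are assumptions rather than the authors' method: that the change is computed as forecast minus baseline logMAR, that a decrease in logMAR (better acuity) maps to Winner and an increase to Loser, and that changes exactly at the threshold count as Stabilizer. The function names `categorize_outcome` and `mean_absolute_error` are hypothetical.

```python
from typing import Sequence

DELTA = 0.05  # threshold on logMAR change, as stated in the abstract


def categorize_outcome(baseline_logmar: float,
                       forecast_logmar: float,
                       delta: float = DELTA) -> str:
    """Assign an outcome group from the forecast change in logMAR.

    Assumptions (not confirmed by the abstract):
    - change = forecast - baseline
    - lower logMAR means better acuity, so a drop beyond delta is a "Winner"
    - changes within [-delta, +delta] are "Stabilizer"
    """
    change = forecast_logmar - baseline_logmar
    if change < -delta:
        return "Winner"
    if change > delta:
        return "Loser"
    return "Stabilizer"


def mean_absolute_error(y_true: Sequence[float],
                        y_pred: Sequence[float]) -> float:
    """Plain MAE, the error metric reported for the logMAR module."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    print(categorize_outcome(0.40, 0.20))
    print(categorize_outcome(0.40, 0.42))
    print(categorize_outcome(0.40, 0.60))
```

Under these assumptions, a reported MAE of 0.052 is close to the grouping threshold itself, so patients whose true change lies near ±0.05 are the ones most likely to be assigned to a neighbouring group.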