Longitudinal Forecasting of Retinal Structure and Function Using a Multimodal StyleGAN-Based Architecture
Abstract
Background: Generative Adversarial Networks (GANs) have emerged as powerful tools for medical image synthesis and clinical outcome prediction. In ophthalmology, accurate forecasting of Optical Coherence Tomography (OCT) images and Best Corrected Visual Acuity (BCVA) values can significantly enhance patient monitoring, personalized treatment planning, and early clinical intervention.
Methods: We introduce a multimodal GAN framework inspired by the StyleGAN architecture and enhanced with super-resolution modules, a multi-scale patch discriminator, and temporal attention mechanisms. To predict logMAR time series values, we incorporated a hybrid deep-shallow LSTM model, trained jointly with the image generation pipeline. After OCT image generation, the synthesized scans were fed into a classification model to predict relevant retinal biomarkers, enabling a combined structural and functional assessment. Experiments were conducted on the publicly available OLIVES dataset, which contains data from 87 patients and 96 unique eyes, with more than 67,000 OCT images collected in the PRIME and TREX-DME clinical trials. To ensure subject independence between training and testing, we used a patient-level split, assigning 70% of patients to training and 30% to testing. This yields a held-out test set of approximately 26 patients (around 29 eyes), a cohort large and clinically diverse enough to support the reliability of the reported results. Predicted OCTs were evaluated using an image similarity-based approach. BCVA and Central Subfield Thickness (CST) predictions were assessed using mean absolute error (MAE), peak signal-to-noise ratio (PSNR), and a trend-based categorization of logMAR into improvement, stabilization, or deterioration, which was then evaluated via class-wise recall, precision, and F1 scores.
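The patient-level split described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `patient_level_split`, the eye and patient identifiers, the random seed, and the toy cohort (87 patients, 9 of whom are assumed to contribute both eyes, giving 96 eyes) are hypothetical and chosen only to mirror the cohort sizes stated above.

```python
import random

def patient_level_split(eye_to_patient, train_fraction=0.7, seed=0):
    """Split eyes into train/test sets so that no patient appears in both.

    eye_to_patient maps a (hypothetical) eye ID to its patient ID.
    """
    patients = sorted(set(eye_to_patient.values()))
    rng = random.Random(seed)          # fixed seed for a reproducible split
    rng.shuffle(patients)
    n_train = round(train_fraction * len(patients))
    train_patients = set(patients[:n_train])
    # Every eye follows its patient, so both eyes of one patient stay together.
    train_eyes = [e for e, p in eye_to_patient.items() if p in train_patients]
    test_eyes = [e for e, p in eye_to_patient.items() if p not in train_patients]
    return train_eyes, test_eyes

# Toy cohort (assumption): 87 patients, the first 9 contribute two eyes each.
eye_to_patient = {}
for pid in range(87):
    eye_to_patient[f"P{pid:03d}_OD"] = f"P{pid:03d}"
    if pid < 9:
        eye_to_patient[f"P{pid:03d}_OS"] = f"P{pid:03d}"

train_eyes, test_eyes = patient_level_split(eye_to_patient)
train_patients = {eye_to_patient[e] for e in train_eyes}
test_patients = {eye_to_patient[e] for e in test_eyes}
print(len(train_patients), len(test_patients))
```

Splitting on patient IDs rather than on eyes or individual scans keeps both eyes and all visits of a patient on the same side of the split, which is what prevents leakage between training and testing.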
Results: The proposed multimodal GAN achieved a Structural Similarity Index (SSIM) of 0.9264, a Fréchet Inception Distance (FID) of 11.9, and a Peak Signal-to-Noise Ratio (PSNR) of 38.1 dB for OCT image forecasting, demonstrating high anatomical fidelity and perceptual quality. The logMAR prediction module delivered accurate forecasts, with a Mean Absolute Error (MAE) of 0.052 and a Mean Squared Error (MSE) of 0.0058, closely aligned with observed clinical outcomes.
Conclusions: The developed multimodal GAN approach effectively forecasts future OCT scans, predicts logMAR values, and identifies retinal biomarkers, offering valuable predictive insights into patient trajectories. Such integrative forecasting supports personalized clinical decision-making and proactive disease management in ophthalmology, with potential implications for improving patient outcomes and clinical workflows.
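For readers unfamiliar with the error metrics reported in the Results, the following Python sketch shows the standard definitions of MAE, MSE, and PSNR. It is a generic illustration, not the authors' evaluation code: the function names, the logMAR values, the pixel values, and the 8-bit peak value of 255 are all assumptions made for the example.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def psnr(reference, predicted, max_value=255.0):
    """PSNR in dB for two images given as flat pixel lists.

    max_value is the peak pixel intensity (255 assumes 8-bit images).
    """
    err = mse(reference, predicted)
    if err == 0:
        return math.inf                # identical images
    return 10.0 * math.log10(max_value ** 2 / err)

# Hypothetical observed vs. forecast logMAR values for four visits.
observed = [0.30, 0.20, 0.40, 0.10]
forecast = [0.35, 0.15, 0.40, 0.20]
logmar_mae = mae(observed, forecast)
logmar_mse = mse(observed, forecast)

# Hypothetical two-pixel "images", just to exercise the PSNR formula.
image_psnr = psnr([100, 100], [110, 90])
print(logmar_mae, logmar_mse, image_psnr)
```

Lower MAE and MSE indicate forecasts closer to the observed logMAR values, while higher PSNR indicates a predicted image closer to the reference scan.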