MAYOCTransformer: Masked-Attention for Yielding Comprehensive Semantic Segmentation of Retinal Optical Coherence Tomography Images using Transformer-based Neural Networks

Run Zhou Ye
Jenna Krivit
Gregor Reiter
Raymond Iezzi

Abstract

Purpose

Optical coherence tomography (OCT) is a widely used imaging modality in ophthalmology. Accurate semantic segmentation of these images is critical for both clinical and research applications, yet existing convolutional neural network (CNN)-based methods face challenges in generalizability and robustness. This study introduces MAYOCTransformer, the first transformer-based deep learning model for comprehensive semantic segmentation of OCT images, and evaluates its performance against CNN-based models.

Methods

A dataset of 3,500 OCT images was manually segmented using an iterative deep learning-assisted workflow. The MAYOCTransformer model, based on the Mask2Former architecture, was trained and compared against CNN-based segmentation models, including U-Net, U-Net++, FPN, and DeepLabV3+. The comprehensive segmentation tasks comprised segmentation of 10 retinal layers, segmentation of choroid stroma and vessels, and identification of 9 types of discrete pathological findings, including intraretinal fluid (IRF), subretinal fluid (SRF), pigment epithelial detachment (PED), subretinal hyper-reflective material (SHRM), intraretinal hyper-reflective foci, and reticular pseudodrusen. Model performance was evaluated using the Dice similarity coefficient (DSC) on a hold-out test set with five-fold cross-validation. Additional validation was performed using external datasets, open-source segmentation models, and a randomized blinded expert evaluation.
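
For readers unfamiliar with the metric, the DSC between a predicted region A and a reference region B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical). The abstract does not describe the authors' implementation, so the following is only a minimal sketch under stated assumptions: segmentations are integer label maps of equal shape, each class is scored separately, and a class absent from both maps is reported as NaN rather than scored. The function name `dice_per_class` and the toy arrays are hypothetical and are not taken from the study.

```python
import numpy as np


def dice_per_class(pred, target, num_classes):
    """Return the Dice similarity coefficient for each class label.

    pred, target: integer label maps of identical shape (assumed format).
    Classes absent from both maps are left as NaN (an assumption; the
    study may handle this case differently).
    """
    pred = np.asarray(pred)
    target = np.asarray(target)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    scores = np.full(num_classes, np.nan)
    for c in range(num_classes):
        p = pred == c      # pixels predicted as class c
        t = target == c    # pixels annotated as class c
        denom = p.sum() + t.sum()
        if denom == 0:
            continue       # class absent from both maps
        scores[c] = 2.0 * np.logical_and(p, t).sum() / denom
    return scores


# Toy 2x4 label maps with three classes present (illustrative data only).
pred = np.array([[0, 0, 1, 1],
                 [0, 2, 2, 1]])
target = np.array([[0, 0, 1, 1],
                   [0, 0, 2, 1]])

scores = dice_per_class(pred, target, num_classes=4)
mean_dsc = np.nanmean(scores)  # average over classes that appear
print(scores, mean_dsc)
```

In this toy case class 1 overlaps perfectly, classes 0 and 2 overlap partially, and class 3 never appears, so it is excluded from the mean. How per-class scores are aggregated across images and folds is a separate design choice that the abstract does not specify.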

Results

MAYOCTransformer outperformed CNN-based models in most segmentation tasks. Choroid segmentation performance was comparable between MAYOCTransformer and CNN models. External validation demonstrated the model’s generalizability, achieving higher DSC scores than publicly available segmentation models. A blinded expert evaluation showed that MAYOCTransformer’s segmentation was non-inferior to manual annotations.

Conclusion

MAYOCTransformer provides improved segmentation performance over CNN-based models. Its ability to generalize to external datasets suggests potential applicability in clinical and research settings.

Version published to 10.1101/2025.07.08.663601v1 on bioRxiv
Jul 11, 2025

A Unified Vision Transformer and Convolutional Neural Network Framework for Multi-Domain Cancer Classification

This article has 6 authors:
1. Heba M. Emara
2. Walid El-Shafai
3. Naglaa F. Soliman
4. Abeer D. Algarni
5. Fathi E. Abd El-Samie
6. Amira A. Mahmoud
This article has no evaluations. Latest version: Jun 13, 2025

A Novel Clinically Explainable Vision Transformer for OCT-Based Retinal Disease Classification: Integrating UniMIE Enhancement and Grad-CAM Interpretability

This article has 6 authors:
1. Vishal Upmanu
2. Jaya Singh
3. Pranshu Saxena
4. Jagendra Singh
5. Shilpa Srivast
6. Aprna Tripathi
This article has no evaluations. Latest version: Jun 25, 2025

An Open-Source Generalizable Deep Learning Framework for Automated Corneal Segmentation in Anterior Segment Optical Coherence Tomography Imaging

This article has 8 authors:
1. Lynn Kandakji
2. Siyin Liu
3. Shafi Balal
4. Ismail Moghul
5. Bruce Allan
6. Stephen Tuft
7. Daniel Gore
8. Nikolas Pontikos
This article has no evaluations. Latest version: Jun 20, 2025
