Leveraging pretrained vision transformers for automated cancer diagnosis in optical coherence tomography images

Abstract

This study presents an approach to brain cancer detection based on optical coherence tomography (OCT) images and advanced machine learning techniques. The research addresses the critical need for accurate, real-time differentiation between cancerous and noncancerous brain tissue during neurosurgical procedures. The proposed method combines a pre-trained large vision transformer (ViT) model, specifically DINOv2, with a convolutional neural network (CNN) operating on grey-level co-occurrence matrix (GLCM) texture features. This dual-path architecture leverages both the global contextual feature extraction of transformers and the local texture analysis strengths of the GLCM+CNN path. To mitigate patient-specific bias from the limited cohort, we incorporate an adversarial discriminator network that attempts to identify individual patients from the feature representations, creating a competing objective that forces the model to learn generalizable cancer-indicative features rather than patient-specific characteristics. We also explore an alternative state space model approach using MambaVision blocks, which achieves comparable performance. The dataset comprised OCT images from 11 patients: 5,831 B-frame slices from 7 patients were used for training and validation, and 1,610 slices from 4 patients were used for testing. The model distinguished cancerous from noncancerous tissue with over 99% accuracy on the training set, 98.8% on the validation set, and 98.6% on the test set. This approach demonstrates significant potential for improving intraoperative decision-making in brain cancer surgeries, offering real-time, high-accuracy tissue classification and surgical guidance.
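To illustrate the texture path described in the abstract, the sketch below computes a grey-level co-occurrence matrix and a few classic Haralick-style statistics (contrast, homogeneity, energy) for a toy image patch. This is a minimal NumPy illustration of what a GLCM encodes, not the authors' implementation; the patch, quantisation to 8 grey levels, and single horizontal pixel offset are all assumptions for the example.

```python
import numpy as np

def glcm(img, levels=8, offset=(0, 1)):
    """Symmetric, normalised grey-level co-occurrence matrix for one offset."""
    dr, dc = offset
    m = np.zeros((levels, levels), dtype=np.float64)
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[img[r, c], img[r2, c2]] += 1
    m = m + m.T          # make symmetric (count both pair directions)
    return m / m.sum()   # normalise to joint probabilities

def glcm_features(p):
    """Texture statistics over a normalised GLCM `p`."""
    i, j = np.indices(p.shape)
    return {
        "contrast": np.sum(p * (i - j) ** 2),          # local intensity variation
        "homogeneity": np.sum(p / (1.0 + (i - j) ** 2)),
        "energy": np.sum(p ** 2),                      # uniformity of the texture
    }

# Toy 4x4 patch already quantised to 8 grey levels (stand-in for an OCT B-frame crop)
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [2, 2, 3, 3],
                  [2, 2, 3, 3]])
feats = glcm_features(glcm(patch))
```

In the paper's pipeline, maps of such statistics (rather than one scalar per patch) would form the input to the CNN branch; libraries like scikit-image provide equivalent `graycomatrix`/`graycoprops` routines.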
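The adversarial debiasing idea can be summarised as a competing objective: the feature extractor is rewarded for cancer classification but penalised whenever the patient discriminator succeeds, so patient identity becomes unrecoverable from the shared features. The function below is a hypothetical sketch of that combined objective (the name `encoder_objective`, the losses, and the weight `lam` are illustrative, not from the paper, which in practice would realise this via a gradient-reversal layer during backpropagation).

```python
def encoder_objective(cancer_loss, patient_disc_loss, lam=0.5):
    """Objective minimised by the feature extractor: do well on the cancer
    task while driving the patient discriminator's loss UP (adversarial term)."""
    return cancer_loss - lam * patient_disc_loss

# With equal cancer-task performance, features the discriminator can easily
# attribute to a patient (low disc loss) score worse than patient-invariant ones.
easy_to_identify = encoder_objective(0.10, patient_disc_loss=0.2)
hard_to_identify = encoder_objective(0.10, patient_disc_loss=2.0)
```

Minimising this objective pushes the encoder toward features that generalise across the 11-patient cohort rather than memorising per-patient appearance.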
