Graph-Based Learning and Multimodal Learning for Colon Disease Classification: An Interpretable Study using CNN-GNN Pipelines and Vision-Language Models
Abstract
Colorectal cancer (CRC) is a significant global health problem that demands improved diagnostic tools for early and accurate detection. This paper proposes an interpretable framework for classifying colon diseases from endoscopic images in the Kvasir v2 dataset. Each image underwent a systematic preprocessing pipeline before model training to ensure consistency and improve feature representation. Images were resized to 224 × 224 pixels to match deep learning input requirements, pixel intensities were normalized to stabilize convergence, and contrast enhancement was applied to improve the visibility of mucosal textures. Edge-sharpening methods, including unsharp masking and Laplacian filtering, were used to emphasize structural boundaries and highlight lesion edges and polyp margins. For data augmentation, random rotations, flips, zoom scaling, and brightness adjustments were introduced to increase data diversity, reduce overfitting, and improve robustness to real-world variation in colonoscopy imaging. The proposed hybrid pipeline combines CNNs and GNNs, to extract visual features and model relational dependencies, with Vision-Language Models (VLMs) that pair a Vision Transformer (ViT) with BERT for multimodal learning. Several graph-construction methods (cosine similarity, ε-radius, k-nearest neighbors) and GNN architectures (GCN, GAT, GraphSAGE, GIN) were evaluated; the best graph-based configuration, ViT + ε-radius + GIN, reached 91% accuracy, while the fine-tuned ViT-BERT model performed best overall with 95.17% accuracy and a 0.95 F1-score. Grad-CAM visualizations further improve interpretability by highlighting clinically relevant image regions, positioning the framework as a robust, interpretable, and transparent tool for automated CRC diagnostics across diverse clinical environments.
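The preprocessing steps described above (resizing, normalization, contrast enhancement, and unsharp masking) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the exact contrast-enhancement method and sharpening strength are assumptions, and simple percentile contrast stretching stands in for whatever enhancement the paper used.

```python
import numpy as np
from scipy import ndimage


def preprocess(image):
    """Sketch of the described pipeline: resize to 224x224, normalize
    intensities, stretch contrast, then apply unsharp masking.
    Parameters (percentiles, sigma, sharpening amount) are illustrative."""
    h, w = image.shape[:2]
    # Resize to the 224x224 input size expected by the deep learning models.
    img = ndimage.zoom(image.astype(np.float32), (224 / h, 224 / w, 1), order=1)
    # Normalize pixel intensities to [0, 1] to stabilize convergence.
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)
    # Contrast enhancement (assumed variant): clip to the 2nd-98th
    # percentiles and rescale, making mucosal textures more visible.
    lo, hi = np.percentile(img, (2, 98))
    img = np.clip((img - lo) / (hi - lo + 1e-8), 0.0, 1.0)
    # Unsharp masking: add back the difference from a Gaussian-blurred
    # copy to emphasize structural boundaries such as polyp margins.
    blurred = ndimage.gaussian_filter(img, sigma=(2, 2, 0))
    img = np.clip(img + 1.0 * (img - blurred), 0.0, 1.0)
    return img
```

A Laplacian filter, as also mentioned in the abstract, could replace the Gaussian-difference step with `img - ndimage.laplace(img)` for a similar edge-emphasis effect.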
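The graph-construction strategies evaluated in the paper (cosine similarity, ε-radius, k-nearest neighbors) can be illustrated with a small sketch over image feature vectors (one node per image). The threshold convention for the ε-radius variant is an assumption; the paper's exact formulation may differ.

```python
import numpy as np


def _cosine_sim(features):
    # Row-normalize, then the Gram matrix gives pairwise cosine similarity.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    return f @ f.T


def cosine_knn_graph(features, k=5):
    """k-nearest-neighbor graph: connect each node to its k most
    cosine-similar neighbors, then symmetrize (undirected graph)."""
    sim = _cosine_sim(features)
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops from the top-k
    n = len(features)
    adj = np.zeros((n, n), dtype=bool)
    idx = np.argsort(-sim, axis=1)[:, :k]  # indices of top-k neighbors
    adj[np.repeat(np.arange(n), k), idx.ravel()] = True
    return adj | adj.T


def epsilon_graph(features, eps=0.3):
    """ε-radius graph (assumed convention): connect node pairs whose
    cosine similarity exceeds 1 - eps."""
    sim = _cosine_sim(features)
    np.fill_diagonal(sim, 0.0)  # no self-loops
    return sim > (1.0 - eps)
```

The resulting adjacency matrix would then feed a GNN (e.g., GIN, the best performer here) via a library such as PyTorch Geometric, which expects edges in COO format (`np.nonzero(adj)`).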