Prostate Cancer Detection in Bi-parametric MRI Using Deep Learning Model

Ghulfam Hussain

Abstract

Prostate cancer is a widespread cancer among men worldwide, and early, accurate diagnosis plays a vital role in improving patient outcomes and reducing the need for invasive surgery. In recent years, computer-aided diagnosis systems based on deep learning have shown significant potential for medical image analysis, although conventional convolutional neural networks often fail to capture long-range contextual features in complex magnetic resonance imaging (MRI) data. To address this limitation, this study investigates the effectiveness of transformer-based architectures for prostate cancer detection through a comparative analysis of two models, Vision Transformer (ViT) and Swin Transformer. Prostate MRI images first pass through a complete preprocessing pipeline comprising image normalization and clinically relevant data augmentation, which is intended to improve image quality and reduce class imbalance. ViT and Swin Transformer are then pretrained and used to learn discriminative representations of prostate tissue, extracting features through their respective self-attention mechanisms. The extracted features are passed to a supervised classifier, and model performance is evaluated using standard metrics: accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). Both transformer-based models perform competitively on the prostate cancer detection task: Vision Transformer is more effective at capturing global context, while Swin Transformer is more effective at learning hierarchical feature representations. The cross-validation results also support the stability of the proposed framework and its capacity to generalize. Overall, the paper demonstrates that transformer-based models can potentially be applied to automated prostate cancer diagnosis, and that a clearer understanding of their strengths and weaknesses may help in building clinically reliable AI-assisted screening systems in the future.
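
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `binary_metrics`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the AUC is computed with the standard pairwise-ranking definition.

```python
from typing import Dict, List


def binary_metrics(
    y_true: List[int], y_score: List[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall, F1 and AUC for binary labels (1 = cancer).

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    # Turn predicted probabilities into hard labels.
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    # AUC: probability that a random positive case scores higher than a
    # random negative case, with ties counted as one half.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if pos and neg:
        wins = sum(
            1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
        )
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")  # undefined when only one class is present

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": auc,
    }


# Toy example with made-up labels and scores (not data from the paper).
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1]
metrics = binary_metrics(y_true, y_score)
print(metrics)
```

Unlike the other four metrics, AUC does not depend on the decision threshold, which is one reason it is commonly reported alongside them when classes are imbalanced.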

Version published to 10.21203/rs.3.rs-9407693/v1 on Research Square
Apr 15, 2026

Related articles

Optimizing Deep Learning for Skin Cancer: A Comparative Study of Convolutional and Attention-Based Models

This article has 1 author:
1. Khaled Wael Ezzat
This article has no evaluations. Latest version: Apr 8, 2026

Decoding Tumor Phenotypes: A Radiologist-Inspired Deep Learning Framework for Breast Cancer Recurrence Prediction

This article has 17 authors:
1. Tao Tan
2. Chunyao Lu
3. Tianyu Zhang
4. Xinglong Liang
5. Antonio Portaluri
6. Luyi Han
7. Yaqian Chen
8. Nika Rasoolzadeh
9. Ruixiang Qi
10. Yuan Gao
11. Xin Wang
12. Yaofei Duan
13. Zahra Aghdam
14. Muzhen He
15. Jonas Teuwen
16. Maciej Mazurowski
17. Ritse Mann
This article has no evaluations. Latest version: Apr 15, 2026

Graph-Based Learning and Multimodal Learning for Colon Disease Classification: An Interpretable Study using CNN-GNN Pipelines and Vision-Language Models

This article has 5 authors:
1. Shahriar Sultan. Ramit
2. Alaya Parven. Alo
3. Md. Sadekur Rahman
4. Masud Rana Rashel
5. A. K.M. Kamrul Islam
This article has no evaluations. Latest version: Apr 10, 2026
