Patient-Level Classification of Rotator Cuff Tears on Shoulder MRI Using an Explainable Vision Transformer Framework

Abstract

Background/Objectives: Diagnosing Rotator Cuff Tears (RCTs) via Magnetic Resonance Imaging (MRI) is clinically challenging due to complex 3D anatomy and significant interobserver variability. Traditional slice-centric Convolutional Neural Networks (CNNs) often fail to capture the necessary volumetric context for accurate grading. This study aims to develop and validate the Patient-Aware Vision Transformer (Pa-ViT), an explainable deep learning framework designed for the automated, patient-level classification of RCTs (Normal, Partial-Thickness, and Full-Thickness). Methods: A large-scale retrospective dataset comprising 2,447 T2-weighted coronal shoulder MRI examinations was utilized. The proposed Pa-ViT framework employs a Vision Transformer (ViT-Base) backbone within a Weakly-Supervised Multiple Instance Learning (MIL) paradigm to aggregate slice-level semantic features into a unified patient diagnosis. The model was trained using a weighted cross-entropy loss to address class imbalance and was benchmarked against widely used CNN architectures and traditional machine learning classifiers. Results: The Pa-ViT model achieved a high overall accuracy of 91% and a macro-averaged F1-score of 0.91, significantly outperforming the standard VGG-16 baseline (87%). Notably, the model demonstrated superior discriminative power for the challenging Partial-Thickness Tear class (ROC AUC: 0.903). Furthermore, Attention Rollout visualizations confirmed the model’s reliance on genuine anatomical features, such as the supraspinatus footprint, rather than artifacts. Conclusions: By effectively modeling long-range dependencies, the Pa-ViT framework provides a robust alternative to traditional CNNs. It offers a clinically viable, explainable decision support tool that enhances diagnostic sensitivity, particularly for subtle partial-thickness tears.
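
The two training ideas named in the Methods, aggregating slice-level outputs into one patient-level prediction and weighting the cross-entropy loss by class, can be illustrated with a minimal sketch. The abstract does not state which pooling operator or weighting scheme Pa-ViT uses, so the mean pooling and inverse-frequency weights below are assumptions chosen for simplicity, and every function name is hypothetical. The toy logits are invented for illustration and are not study data.

```python
import math
from collections import Counter

# Class order is an assumption made for this sketch.
CLASSES = ["Normal", "Partial-Thickness", "Full-Thickness"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_patient_logits(slice_logits):
    """Mean-pool per-slice logits into one patient-level logit vector.

    ASSUMPTION: mean pooling stands in for the MIL aggregation step;
    the abstract does not specify the operator actually used.
    """
    n = len(slice_logits)
    return [sum(column) / n for column in zip(*slice_logits)]


def inverse_frequency_weights(labels, num_classes):
    """Class weights proportional to 1 / class frequency.

    ASSUMPTION: one common weighting scheme, not necessarily the paper's.
    Requires every class to appear at least once in `labels`.
    """
    counts = Counter(labels)
    total = len(labels)
    return [total / (num_classes * counts[c]) for c in range(num_classes)]


def weighted_cross_entropy(batch_logits, labels, class_weights):
    """Weighted cross-entropy, normalized by the sum of sample weights."""
    weighted_loss = 0.0
    weight_sum = 0.0
    for logits, y in zip(batch_logits, labels):
        probs = softmax(logits)
        w = class_weights[y]
        weighted_loss += -w * math.log(probs[y])
        weight_sum += w
    return weighted_loss / weight_sum


if __name__ == "__main__":
    # Toy example: one patient with three slices, each giving 3-class logits.
    slices = [[0.2, 1.5, 0.1], [0.0, 2.0, 0.3], [0.4, 1.0, 0.2]]
    patient_logits = aggregate_patient_logits(slices)
    predicted = CLASSES[patient_logits.index(max(patient_logits))]
    print("patient-level prediction:", predicted)

    # Toy label distribution with an under-represented middle class.
    train_labels = [0, 0, 0, 1, 2, 2]
    weights = inverse_frequency_weights(train_labels, len(CLASSES))
    loss = weighted_cross_entropy([patient_logits], [1], weights)
    print("weighted loss for this patient:", round(loss, 4))
```

In this sketch the rarest class receives the largest weight, so errors on it contribute more to the loss. This is the usual motivation for weighting when one class, such as partial-thickness tears, is under-represented.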
