CMANet: Cross-Modal Attention Network for 3-D Knee MRI and Report-Guided Osteoarthritis Assessment

Abstract

Objective

Knee osteoarthritis (OA) is a leading cause of disability worldwide, and early identification of structural changes is critical for improving patient outcomes. Magnetic resonance imaging (MRI) provides rich spatial detail, but its interpretation remains challenging because of complex anatomy, subtle lesion presentation, and limited voxel-level annotations. Radiology reports, meanwhile, encode semantic and diagnostic information that imaging AI pipelines typically underuse. In this work, we introduce CMANet, a Cross-Modal Attention Network that integrates 3D knee MRI volumes with their corresponding free-text radiology reports for joint OA severity classification and lesion segmentation. CMANet makes four contributions: (1) an asymmetric cross-modal attention mechanism that enables bidirectional information flow between image and text, (2) a weakly supervised anatomical alignment module that links report phrases to MRI regions, (3) a multi-task prediction head for simultaneous OA grading and voxel-level lesion detection, and (4) interpretable attention pathways that trace predictions back to report language and anatomical structures. Evaluated on a dataset of 642 patients with paired MRI and radiology reports, CMANet significantly outperformed unimodal baselines, raising the AUC for Kellgren-Lawrence (KL) grade classification from 0.769 to 0.871 (Δ = 0.102, p = 0.004) and increasing Dice scores for cartilage and bone marrow lesion (BML) segmentation. The model also generalized to predicting 2-year OA progression (AUC = 0.804) and improved the alignment between anatomical regions and textual descriptions. These results highlight the potential of multimodal learning to enhance diagnostic accuracy, spatial localization, and explainability in musculoskeletal imaging.
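
The abstract does not specify how the cross-modal attention is built, so the following is only a minimal sketch of the general idea, not CMANet's implementation. It uses standard scaled dot-product cross-attention in NumPy, run once in each direction (image tokens attending to report tokens, and the reverse). Reading "asymmetric" as "separate projection weights per direction" is an assumption, as are all names, token counts, and dimensions below.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    # Scaled dot-product attention: each query token attends over all context tokens.
    q = queries @ w_q                      # (n_query, d)
    k = context @ w_k                      # (n_context, d)
    v = context @ w_v                      # (n_context, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)

def init(d_in, d_out):
    # Hypothetical random projection; a trained model would learn these weights.
    return rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)

# Hypothetical sizes: 64 MRI patch embeddings and 12 report token embeddings.
d_img, d_txt, d = 32, 24, 16
image_tokens = rng.standard_normal((64, d_img))
text_tokens = rng.standard_normal((12, d_txt))

# Direction 1: image queries attend to report tokens.
img_from_txt, attn_img_to_txt = cross_attention(
    image_tokens, text_tokens, init(d_img, d), init(d_txt, d), init(d_txt, d)
)

# Direction 2: report queries attend to image patches, with its own weights
# (the assumed sense of "asymmetric").
txt_from_img, attn_txt_to_img = cross_attention(
    text_tokens, image_tokens, init(d_txt, d), init(d_img, d), init(d_img, d)
)

print(img_from_txt.shape, attn_img_to_txt.shape)
print(txt_from_img.shape, attn_txt_to_img.shape)
```

In a sketch like this, a row of `attn_txt_to_img` shows which image patches a given report token weights most heavily, which is the kind of map an interpretability or phrase-to-region alignment analysis could inspect.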
