Integrating multi-scale cross-attention and graph-guided label reasoning for multi-label chest X-ray classification
Abstract
Multi-label chest X-ray (CXR) classification is challenging because abnormalities span a diverse set of spatial scales and disease labels are strongly interdependent. We develop a visual–semantic framework that jointly models multi-scale visual fusion and label-prior-guided decoding. The visual encoder has two parallel branches: a Vision Transformer (ViT) branch captures global anatomical context, while a DenseNet-121 branch extracts local texture cues from intermediate convolutional stages. We align and fuse the two representations with a multi-scale bidirectional cross-attention module. To model label dependencies explicitly, we build a label graph from semantic label embeddings and training-set co-occurrence statistics, then apply a graph convolutional network (GCN) to generate label embeddings that initialize the Transformer decoder's label queries. On ChestX-ray14 and CheXpert, our method achieves mean areas under the ROC curve (AUCs) of 0.849 and 0.815, respectively. Qualitative visualizations further show closer alignment between label queries and disease-relevant regions in selected examples. Overall, these results suggest that integrating global and local visual evidence with explicit label priors improves multi-label CXR classification.
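The abstract's two core components can be illustrated in code. The sketch below is not the authors' implementation; all module names, dimensions, and the single-layer depths are illustrative assumptions. It shows (1) bidirectional cross-attention fusing global (ViT-style) and local (CNN-style) token sequences, and (2) a one-layer GCN over a label adjacency matrix whose output embeddings serve as the Transformer decoder's label queries.

```python
# Hedged sketch (not the paper's code) of bidirectional cross-attention
# fusion plus GCN-initialized label queries for multi-label classification.
import torch
import torch.nn as nn


class BidirectionalCrossAttention(nn.Module):
    """Each branch's tokens attend to the other branch's tokens."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.g2l = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.l2g = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, g, l):
        # g: global (ViT) tokens (B, Ng, dim); l: local (CNN) tokens (B, Nl, dim)
        g_fused, _ = self.g2l(g, l, l)  # global queries attend to local keys/values
        l_fused, _ = self.l2g(l, g, g)  # local queries attend to global keys/values
        return torch.cat([g + g_fused, l + l_fused], dim=1)


class LabelGCN(nn.Module):
    """One graph-convolution step over a (normalized) label co-occurrence graph."""

    def __init__(self, num_labels, dim, adj):
        super().__init__()
        self.emb = nn.Parameter(torch.randn(num_labels, dim))  # semantic label embeddings
        self.register_buffer("adj", adj)  # assumed row-normalized adjacency
        self.proj = nn.Linear(dim, dim)

    def forward(self):
        return torch.relu(self.adj @ self.proj(self.emb))  # (num_labels, dim)


class Classifier(nn.Module):
    def __init__(self, num_labels=14, dim=64, adj=None):
        super().__init__()
        adj = adj if adj is not None else torch.eye(num_labels)  # placeholder graph
        self.fuse = BidirectionalCrossAttention(dim)
        self.gcn = LabelGCN(num_labels, dim, adj)
        layer = nn.TransformerDecoderLayer(dim, 4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        self.head = nn.Linear(dim, 1)  # per-label logit

    def forward(self, g, l):
        memory = self.fuse(g, l)                                   # fused visual tokens
        queries = self.gcn().unsqueeze(0).expand(g.size(0), -1, -1)  # label queries
        return self.head(self.decoder(queries, memory)).squeeze(-1)  # (B, num_labels)


model = Classifier()
# e.g. 7x7 ViT tokens and 14x14 CNN tokens, already projected to a shared dim
logits = model(torch.randn(2, 49, 64), torch.randn(2, 196, 64))
print(logits.shape)  # torch.Size([2, 14])
```

In this sketch the GCN output is shared across the batch and expanded into one query per disease label, so each decoder output position corresponds to one label's logit; a real system would use co-occurrence statistics for `adj` and pretrained text embeddings for `emb`.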