TempSem-GraphNet: Temporal-Semantic Graph Network for Coherent Chest X-ray Report Generation
Abstract
Automated chest X-ray (CXR) report generation often overlooks the vital temporal dimension and fails to integrate diverse information modalities effectively, resulting in reports lacking coherence and clinical utility. To address these limitations, we propose TempSem-GraphNet, a novel framework that explicitly models multi-modal temporal-semantic relationships for generating coherent and accurate reports from multi-temporal CXR images. Our core innovation is the Multi-modal Temporal-Semantic Graph (TempSem-Graph), which unifies visual lesions from current and historical CXRs with semantic concepts extracted from prior reports, linked by temporal, semantic, and modality-specific edges. This graph is processed by a Hierarchical Temporal-Semantic Graph Attention Network (HTGAT) to aggregate context-rich features, which then condition a fine-tuned Large Language Model (LLM) for report generation. Evaluated on the MIMIC-CXR-JPG dataset, TempSem-GraphNet significantly outperforms state-of-the-art baselines across natural language generation and clinical entity metrics. Human evaluations further corroborate our quantitative findings, demonstrating superior temporal coherence, clinical accuracy, and utility. Our work represents a significant step towards automating longitudinal medical reporting with enhanced precision and clinical relevance.
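The graph described in the abstract can be pictured with a small sketch. The abstract names the node sources (visual lesions from current and historical CXRs, semantic concepts from prior reports) and three edge categories (temporal, semantic, modality-specific), but it gives no schema or equations. Everything below is therefore a hypothetical illustration: the class `TempSemGraph`, the node and edge labels, the dot-product attention scoring, and the averaging across edge kinds are assumptions, not the authors' HTGAT.

```python
import math
from dataclasses import dataclass, field

# Hypothetical vocabularies: the abstract names these categories but not a schema.
NODE_KINDS = {"visual_current", "visual_prior", "concept_prior"}
EDGE_KINDS = {"temporal", "semantic", "modality"}


@dataclass
class TempSemGraph:
    """Toy typed graph: nodes carry a kind and a feature vector, edges carry a kind."""
    kinds: dict = field(default_factory=dict)   # node id -> node kind
    feats: dict = field(default_factory=dict)   # node id -> list of floats
    edges: list = field(default_factory=list)   # (src, dst, edge kind)

    def add_node(self, node_id, kind, feat):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.kinds[node_id] = kind
        self.feats[node_id] = list(feat)

    def add_edge(self, src, dst, kind):
        if kind not in EDGE_KINDS:
            raise ValueError(f"unknown edge kind: {kind}")
        if src not in self.kinds or dst not in self.kinds:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, dst, kind))

    def neighbors(self, dst, kind):
        """Sources of incoming edges of one kind."""
        return [s for s, d, k in self.edges if d == dst and k == kind]


def attend(graph, dst, kind):
    """Softmax dot-product attention over dst's incoming neighbors of one edge kind."""
    srcs = graph.neighbors(dst, kind)
    if not srcs:
        return None
    query = graph.feats[dst]
    scores = [sum(a * b for a, b in zip(query, graph.feats[s])) for s in srcs]
    top = max(scores)                            # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w / total * graph.feats[s][i] for w, s in zip(weights, srcs))
        for i in range(len(query))
    ]


def aggregate(graph, dst):
    """Two-level aggregation (assumed): attend within each edge kind, then average the kinds."""
    per_kind = [attend(graph, dst, k) for k in sorted(EDGE_KINDS)]
    per_kind = [v for v in per_kind if v is not None]
    if not per_kind:
        return list(graph.feats[dst])            # isolated node keeps its own features
    return [sum(v[i] for v in per_kind) / len(per_kind) for i in range(len(per_kind[0]))]


# Usage: one current lesion linked to a prior lesion and to a prior-report concept.
g = TempSemGraph()
g.add_node("lesion_now", "visual_current", [1.0, 0.0])
g.add_node("lesion_before", "visual_prior", [0.0, 1.0])
g.add_node("effusion", "concept_prior", [1.0, 1.0])
g.add_edge("lesion_before", "lesion_now", "temporal")
g.add_edge("effusion", "lesion_now", "semantic")
context = aggregate(g, "lesion_now")
print(context)
```

In this toy setup the aggregated vector for `lesion_now` mixes the prior lesion (via the temporal edge) with the prior-report concept (via the semantic edge). In the framework described above, features of this kind would then condition the language model, a step this sketch does not attempt.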