An ensemble-based deep learning method through multi-scale cross-attention training for cephalometric landmark localization on lateral X-ray images
Abstract
Cephalometric landmark localization is a core step in diagnosis and treatment planning for orthodontics, orthognathic surgery, and maxillofacial surgery. However, manual annotation is time-consuming, laborious, and subject to inter-observer variability. Despite advances in deep learning-based methods (such as heatmap regression), existing single-scale feature extraction and two-stage strategies still suffer from limited accuracy, error accumulation, and low computational efficiency. To address these issues, this paper proposes a novel multi-scale cross-attention training framework combined with an ensemble learning strategy, aiming to improve the robustness and accuracy of landmark localization by jointly modeling local details and global context. The multi-scale cross-attention mechanism captures the local details and global spatial dependencies of anatomical landmarks through cross-scale feature interaction, generating complementary, enhanced feature representations. The heterogeneous model ensemble strategy combines the cross-attention features of multiple network architectures to mitigate the performance fluctuations of any single model and to yield more balanced landmark predictions. On the ISBI dataset (19 landmarks) and the MICCAI 2023 CL-Detection Challenge dataset (38 landmarks), the method achieved a successful detection rate (SDR) of 82.58% with a mean radial error (MRE) of 1.44 mm, and an SDR of 77.95% with an MRE of 1.73 mm, respectively, surpassing the best existing methods. Ablation experiments further indicated that both the multi-scale interaction and the ensemble strategy contributed to the improvement in SDR. The approach achieves high-precision, fully automatic localization in complex multi-landmark scenarios, providing efficient and consistent quantitative analysis tools for clinical practice. It also offers a general technical framework for multi-scale modeling and model ensembling in keypoint detection tasks on medical images.
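For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MRE and SDR are commonly computed, together with one simple way to ensemble heatmap predictions. It is not the authors' implementation. The abstract does not state the SDR threshold, so the 2 mm default below is an assumption (2 mm is a commonly reported value in the cephalometric literature). The function names, the pixel-spacing parameter, and the choice of plain heatmap averaging are also hypothetical; the paper describes combining cross-attention features across architectures, which may differ from averaging output heatmaps.

```python
import numpy as np


def radial_errors(pred, gt, pixel_spacing_mm):
    """Euclidean distance between predicted and true landmarks, in mm.

    pred, gt: arrays of shape (n_landmarks, 2) holding (x, y) pixel coordinates.
    pixel_spacing_mm: physical size of one pixel (assumed isotropic here).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return np.linalg.norm((pred - gt) * pixel_spacing_mm, axis=-1)


def mean_radial_error(errors):
    """MRE: average radial error over all landmarks, in mm."""
    return float(np.mean(errors))


def successful_detection_rate(errors, threshold_mm=2.0):
    """SDR: percentage of landmarks whose error is within the threshold.

    The 2.0 mm default is an assumption, not taken from the abstract.
    """
    return float(np.mean(np.asarray(errors) <= threshold_mm) * 100.0)


def ensemble_landmarks(heatmaps):
    """Average heatmaps across models, then take the peak per landmark.

    heatmaps: array of shape (n_models, n_landmarks, H, W).
    Returns an integer array of shape (n_landmarks, 2) with (x, y) coordinates.
    Plain averaging is an illustrative choice, not the paper's ensemble rule.
    """
    mean_map = np.mean(np.asarray(heatmaps, dtype=float), axis=0)
    n_landmarks, height, width = mean_map.shape
    flat_peak = mean_map.reshape(n_landmarks, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_peak, (height, width))
    return np.stack([cols, rows], axis=1)
```

Averaging before taking the peak lets a location that all models agree on outweigh a spurious high response from a single model, which is one intuition for why ensembling can reduce the performance fluctuations of individual networks.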