An ensemble-based deep learning method through multi-scale cross-attention training for cephalometric landmark localization on lateral X-ray images

Abstract

Cephalometric landmark localization is a core step in diagnosis and treatment planning for orthodontics, orthognathic surgery, and maxillofacial surgery. Manual annotation, however, is time-consuming, laborious, and subject to inter-observer variability. Although deep learning-based methods such as heatmap regression have advanced the field, existing approaches that rely on single-scale feature extraction or two-stage strategies still suffer from limited accuracy, error accumulation, and low computational efficiency. To address these problems, this paper proposes a multi-scale cross-attention training framework combined with an ensemble learning strategy, which improves the robustness and accuracy of landmark localization by jointly modeling local details and global context. The multi-scale cross-attention mechanism captures the local details and global spatial dependencies of anatomical landmarks through cross-scale feature interaction, producing complementary, enhanced feature representations. The heterogeneous model ensemble combines the cross-attention features of multiple network architectures to reduce the performance fluctuations of any single model and to balance prediction quality across landmarks. On the ISBI dataset (19 landmarks) and the MICCAI 2023 CL-Detection Challenge dataset (38 landmarks), the method achieved a successful detection rate (SDR) of 82.58% with a mean radial error (MRE) of 1.44 mm, and an SDR of 77.95% with an MRE of 1.73 mm, respectively, surpassing existing state-of-the-art methods. Ablation experiments further indicated that both the multi-scale interaction and the ensemble strategy contributed to the improvement in SDR. The approach achieves high-precision, fully automatic localization in complex multi-landmark scenarios and provides an efficient, consistent quantitative analysis tool for clinical practice. It also offers a general technical framework for multi-scale modeling and model ensembling in keypoint detection tasks on medical images.
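
The two metrics reported in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. The abstract does not state the distance threshold behind the SDR figures, so the 2 mm default here is an assumption, as are the 0.1 mm pixel spacing, the function names, and the toy coordinates.

```python
import math


def radial_errors(pred, truth, mm_per_pixel=0.1):
    """Euclidean distance in mm between each predicted and reference landmark.

    mm_per_pixel is an assumed pixel spacing; use the value for your images.
    """
    return [
        math.hypot(px - tx, py - ty) * mm_per_pixel
        for (px, py), (tx, ty) in zip(pred, truth)
    ]


def mean_radial_error(errors):
    """MRE: the average radial error over all landmarks, in mm."""
    return sum(errors) / len(errors)


def success_detection_rate(errors, threshold_mm=2.0):
    """SDR: percentage of landmarks whose error is within threshold_mm.

    The 2 mm default is an assumption, not a value taken from the abstract.
    """
    hits = sum(1 for e in errors if e <= threshold_mm)
    return 100.0 * hits / len(errors)


# Hypothetical pixel coordinates for three landmarks (illustration only).
pred = [(100, 100), (200, 150), (310, 240)]
truth = [(103, 104), (200, 150), (310, 270)]

errors = radial_errors(pred, truth)
print("MRE (mm):", round(mean_radial_error(errors), 3))
print("SDR (%):", round(success_detection_rate(errors), 2))
```

A lower MRE and a higher SDR both indicate better localization. MRE summarizes the average error, while SDR counts how many landmarks fall within a chosen tolerance, so the two can move somewhat independently.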
