Lung Cancer Multimodal Auxiliary Diagnosis Based on Entropy Weight Decision Fusion
Abstract
Background: Lung cancer is one of the malignant tumors with the highest incidence and mortality rates worldwide. Traditional clinical diagnosis relies heavily on physicians' experience, which is associated with strong subjectivity, high rates of misdiagnosis and missed diagnosis, and significant disparities in regional medical standards. With the breakthroughs of deep learning in computer vision and natural language processing (NLP), multimodal data-driven auxiliary diagnosis, which integrates computed tomography (CT) images and clinical text, has emerged as a research hotspot. However, simple concatenation of heterogeneous image and text data often fails to achieve effective feature alignment, leading to suboptimal medical decision-making performance. To address these issues, this paper proposes a lung cancer multimodal auxiliary diagnosis model based on entropy weight decision fusion.

Methods: This study adopted a retrospective cohort design, enrolling 5,847 patients from 2020 to 2025 (1,823 lung cancer patients, 2,253 normal controls, and 1,771 pulmonary nodule controls) and analyzing their CT images and CT reports. Three datasets were constructed, each randomly sampled from the original dataset. First, a Vision Transformer (ViT) and BERT (Bidirectional Encoder Representations from Transformers) served as feature extractors for images and text, respectively, extracting high-dimensional semantic features from lung CT images and CT imaging reports. Second, independent classifiers based on Multi-Layer Perceptrons (MLPs) converted the embedding vectors of each modality into predicted probability distributions (logits). Finally, the entropy weight method adaptively fused the decision results of the image and text branches. Model performance was validated with 5-fold cross-validation, using the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, and F1-score as evaluation metrics.

Results: The proposed method fully leverages the complementary information in CT images and imaging text. On the clinical lung cancer dataset, it achieved an accuracy of 0.9375, a precision of 0.9324, a recall of 0.9322, and an F1-score of 0.9322, significantly improving diagnostic performance.

Conclusion: This study validates that decision fusion of multimodal data outperforms single-modality models in diagnostic accuracy, precision, and recall on real-world lung cancer datasets, providing an effective solution for clinical auxiliary diagnosis of lung cancer.
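The abstract does not give implementation details for the fusion step, so the following is a minimal sketch of entropy-weight decision fusion as it is commonly formulated: each modality's softmax distribution is weighted inversely to its Shannon entropy, so the more confident branch dominates the fused decision. The function name, tensor shapes, and the exact weighting formula (1 minus normalized entropy, renormalized across modalities) are assumptions, not the paper's verified method.

```python
import torch
import torch.nn.functional as F

def entropy_weight_fusion(image_logits: torch.Tensor,
                          text_logits: torch.Tensor,
                          eps: float = 1e-12) -> torch.Tensor:
    """Fuse per-modality logits by weighting each branch inversely to the
    Shannon entropy of its predicted distribution (hypothetical formulation;
    the paper's exact weighting scheme may differ).

    Args:
        image_logits: (batch, num_classes) logits from the image MLP head.
        text_logits:  (batch, num_classes) logits from the text MLP head.
    Returns:
        (batch, num_classes) fused class probabilities.
    """
    # Convert each branch's logits to a probability distribution.
    probs = torch.stack([F.softmax(image_logits, dim=-1),
                         F.softmax(text_logits, dim=-1)])        # (2, B, C)
    # Shannon entropy of each modality's distribution, per sample.
    entropy = -(probs * torch.log(probs + eps)).sum(dim=-1)      # (2, B)
    # Normalize by the maximum possible entropy log(C) so values lie in [0, 1].
    num_classes = probs.shape[-1]
    norm_entropy = entropy / torch.log(torch.tensor(float(num_classes)))
    # Confidence weight: lower entropy (more certain branch) -> higher weight.
    weights = 1.0 - norm_entropy                                 # (2, B)
    weights = weights / (weights.sum(dim=0, keepdim=True) + eps)
    # Weighted sum of the two probability distributions.
    fused = (weights.unsqueeze(-1) * probs).sum(dim=0)           # (B, C)
    return fused
```

Under this scheme a branch that spreads its probability mass (high entropy, low confidence) is automatically down-weighted per sample, which is what lets the fusion adapt case by case rather than using a fixed mixing coefficient; the final prediction would be, e.g., `entropy_weight_fusion(img_logits, txt_logits).argmax(dim=-1)`.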