Evaluation of a Multimodal Convolutional Neural Network-Based Approach for DICOM Files Classification

Abstract

The Digital Imaging and Communications in Medicine (DICOM) standard preserves both pixel-level image data and clinically relevant metadata. However, conventional deep learning pipelines for medical image classification often discard this metadata by converting DICOM files into formats such as PNG or JPEG, leading to information loss and potential bias. This study evaluated the performance of DICOMFusionNet, a multimodal convolutional neural network (CNN) developed to natively process DICOM files by integrating both image data and embedded metadata, in comparison with widely used transfer learning models. A dataset of 1000 pediatric chest radiographs (425 tuberculosis-positive and 575 controls) from Epicentre, Mbarara Regional Referral Hospital, was used. Images were pre-processed to enhance pulmonary visibility, and relevant metadata fields were normalized and one-hot encoded for integration. DICOMFusionNet was benchmarked against Inception V3, VGG16, VGG19, and ResNet50, all of which required DICOM-to-PNG conversion. Performance was evaluated using accuracy, precision, recall, and F1-score. An ablation study assessed the contribution of metadata to classification performance. DICOMFusionNet achieved superior performance with a test accuracy of 92.3% and an F1-score of 0.91, outperforming Inception V3 (86.7%), VGG16 (85.4%), VGG19 (85.9%), and ResNet50 (87.1%). The ablation study revealed a significant drop in accuracy (87.8%) and F1-score (0.85) when metadata was excluded, highlighting its critical role in predictive performance. DICOMFusionNet demonstrates that preserving both image and metadata in medical imaging tasks yields more accurate and context-aware classification. This multimodal approach reduces bias, enhances generalization, and provides a promising framework for clinical decision support in diagnostic imaging.
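The abstract describes normalizing and one-hot encoding DICOM metadata fields and fusing them with the image branch. The sketch below illustrates that preprocessing step in minimal numpy form; it is an assumption-laden illustration, not the authors' implementation. The metadata field names (`PatientSex`, `PatientAge`) are standard DICOM attributes chosen for illustration, and the image branch is reduced to min-max normalization and flattening, where DICOMFusionNet would use a CNN feature extractor.

```python
import numpy as np

def one_hot(value, categories):
    """One-hot encode a categorical metadata value against a fixed category list."""
    vec = np.zeros(len(categories), dtype=np.float32)
    vec[categories.index(value)] = 1.0
    return vec

def fuse_features(pixel_array, metadata, sex_categories=("M", "F")):
    """Build a single fused vector from image pixels and DICOM metadata.

    Illustrative sketch only: the image branch is min-max normalization
    plus flattening, standing in for a learned CNN feature extractor.
    Field names and the pediatric age scale (0-18 years) are assumptions.
    """
    img = pixel_array.astype(np.float32)
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # normalize to [0, 1]
    image_features = img.ravel()

    sex_vec = one_hot(metadata["PatientSex"], list(sex_categories))
    age = np.array([metadata["PatientAge"] / 18.0], dtype=np.float32)

    # Late fusion by concatenation: image features + encoded metadata
    return np.concatenate([image_features, sex_vec, age])

# Example: a 2x2 toy "radiograph" plus two metadata fields
features = fuse_features(np.array([[0, 100], [200, 255]]),
                         {"PatientSex": "F", "PatientAge": 9.0})
```

In practice the categorical vocabulary would be fixed from the training set so that encoding is consistent at inference time, and the concatenated vector would feed the joint classification head.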
