A Multimodal Attention-Based Multi-Instance Learning Framework for Fair and Interpretable Pediatric Teledermatology
Abstract
Purpose: Pediatric skin diseases are prevalent yet frequently underdiagnosed in low-resource settings across Sub-Saharan Africa due to limited access to specialized dermatological care. This study examines whether a subject-level multimodal learning framework can improve diagnostic accuracy, interpretability, and fairness in pediatric teledermatology across diverse skin types.

Methods: A subject-level multimodal multi-instance learning framework is developed in which each patient is represented as a bag of clinical images, with visual features integrated alongside demographic and clinical metadata. A gated attention mechanism aggregates heterogeneous image instances into interpretable subject-level representations, while multimodal fusion supplies contextual information for diagnosis. The framework is evaluated on the PASSION pediatric dermatology dataset across four common skin conditions. Ablation studies and statistical analyses assess the contributions of attention-based aggregation and multimodal fusion, and fairness is evaluated across Fitzpatrick skin types.

Results: The proposed framework achieves an overall classification accuracy of 82.8% and a macro F1-score of 0.81. Ablation results show that gated attention-based aggregation significantly outperforms naive pooling strategies, while multimodal fusion further improves diagnostic robustness. Fairness analysis indicates stable performance across Fitzpatrick skin types.

Conclusion: Subject-level multimodal learning provides a robust, interpretable, and equitable approach to AI-assisted pediatric teledermatology, with strong potential to improve diagnostic access and quality of care in low-resource clinical environments.
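To make the aggregation step concrete: in attention-based multi-instance learning, each patient's bag of image embeddings is collapsed into one subject-level vector by a learned, gated attention weighting, and the result can then be fused with a metadata vector. The abstract does not give the exact formulation used, so the sketch below follows the standard gated attention pooling commonly used in attention-based deep MIL; the projection matrices `V`, `U`, the scoring vector `w`, and the metadata vector are illustrative assumptions, not the authors' actual parameters.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gated_attention_pool(H, V, U, w):
    """Aggregate a bag of K instance embeddings H (K x d) into a single
    subject-level vector via gated attention.
    V, U: (d x L) projections; w: (L,) scoring vector (all hypothetical).
    Returns (bag_embedding, per-instance attention weights)."""
    gate = np.tanh(H @ V) * (1.0 / (1.0 + np.exp(-(H @ U))))  # gated features, K x L
    scores = gate @ w                                          # one score per instance, K
    a = softmax(scores)                                        # attention weights sum to 1
    z = a @ H                                                  # attention-weighted bag embedding, d
    return z, a

# Toy "patient": a bag of 3 image embeddings of dimension 8.
rng = np.random.default_rng(0)
K, d, L = 3, 8, 4
H = rng.normal(size=(K, d))
V, U = rng.normal(size=(d, L)), rng.normal(size=(d, L))
w = rng.normal(size=L)
z, a = gated_attention_pool(H, V, U, w)

# Simple late fusion by concatenation with a (hypothetical) 5-dim
# demographic/clinical metadata vector, ready for a downstream classifier.
meta = rng.normal(size=5)
fused = np.concatenate([z, meta])
```

The attention weights `a` are what make the aggregation interpretable: they indicate which images in a patient's bag drove the subject-level prediction, which is the property the ablation study compares against naive (mean or max) pooling.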