AI Decision Support for Challenging Teledermatology Cases: MedGemma Performance in the Dermatology ECHO Program

Jeffrey B. Appiagyei
Ruth O. Otu
Mollie Henry
Benjamin W. Casterline
Mirna Becevic

Abstract

Teledermatology expands access to dermatologic expertise in rural settings, yet diagnostic uncertainty persists in low-resource primary care. This retrospective study evaluated MedGemma-4B-IT, a compact multimodal vision-language model, as adjunctive clinical decision support for challenging diagnostic cases. We analyzed 77 zero-concordance cases (360 clinical photographs) from a Dermatology Extension for Community Healthcare Outcomes (ECHO) tele-mentoring program (2016-2021). Zero-concordance cases were those with no overlap between the primary care clinician's provisional diagnosis and the dermatologist-confirmed diagnosis. The model was prompted in a dermatologist-style format to generate ranked differential diagnoses. Performance was assessed using strict case-level top-k exact-match accuracy and relaxed matching criteria based on fuzzy string similarity. Under strict matching, MedGemma achieved 0.0% top-1 accuracy, 1.3% top-3 accuracy, 3.9% top-5 accuracy, and 3.9% top-10 accuracy. Relaxed concept-level matching achieved 28.6% top-1, 63.6% top-5, and 67.5% top-10 accuracy. Image-level accuracy was 44.2% (159/360, 95% CI 39.0-49.5%). The model surfaced the correct diagnosis within differential lists in 45.5% of cases despite no exact top-1 matches, suggesting utility for differential expansion rather than definitive diagnosis. Performance varied across diagnostic categories, with the highest accuracy in the Other category (54.5%) and the lowest in neoplastic conditions (0.0%). Common errors included confusion between inflammatory and other diagnostic groupings. These findings characterize MedGemma performance on real-world teledermatology cases and inform safe, clinician-in-the-loop integration into teledermatology workflows where specialist oversight remains essential.
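The image-level proportion and its confidence interval can be approximately reproduced from the reported counts (159 of 360). The abstract does not state which interval method was used, so the sketch below assumes a Wilson score interval; the function name `wilson_interval` is illustrative. Under that assumption the result is close to, but not identical with, the reported 39.0-49.5%, which would be expected if the authors used a different method such as an exact binomial interval.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method, not confirmed by the source)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the abstract: 159 correct images out of 360.
point = 159 / 360
low, high = wilson_interval(159, 360)
print(f"accuracy = {point:.1%}, 95% CI = {low:.1%} to {high:.1%}")
```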

What this Study Adds

This study provides an empirical evaluation of MedGemma-4B-IT as adjunctive decision support for challenging teledermatology cases in a community healthcare ECHO setting. We demonstrate that although strict top-1 diagnostic accuracy is 0%, the model correctly surfaces the dermatologist-confirmed diagnosis within a 10-item differential in 45.5% of zero-concordance cases, suggesting value as a differential diagnostic prompt rather than a direct diagnostic replacement. These findings inform safe, clinician-in-the-loop deployment strategies for compact vision-language models in resource-limited telemedicine settings.

Conclusions

MedGemma demonstrates differential diagnostic utility in challenging teledermatology cases, surfacing the correct diagnosis within a 10-item differential in nearly half of cases despite 0% strict top-1 accuracy. These findings support clinician-in-the-loop AI deployment for diagnostic expansion in resource-limited settings, while highlighting the need for improved neoplastic detection and confidence calibration in future model development.

Results

Under strict exact matching, top-1 accuracy was 0.0% (0/77), increasing to 3.9% (3/77) at top-10. Under relaxed concept-level matching, top-1 accuracy was 28.6% (22/77), rising to 45.5% (35/77) at top-10. The mean reciprocal rank (MRR) was 0.4287. Diagnostic performance varied by category: Other diagnoses showed 54.5% top-10 accuracy, while neoplastic conditions showed 0.0%.
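For readers less familiar with these metrics, the following is a minimal sketch of how case-level top-k accuracy and mean reciprocal rank are conventionally computed from the rank at which the correct diagnosis first appears in each differential. The function names and the toy ranks are hypothetical and are not study data; the source does not describe the authors' implementation.

```python
from typing import Optional, Sequence


def top_k_accuracy(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of cases whose first correct diagnosis appears at rank <= k."""
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Mean of 1/rank over all cases; a case with no correct diagnosis contributes 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical 1-based ranks for five cases (None = correct diagnosis never listed).
toy_ranks = [1, 3, None, 2, None]

top1 = top_k_accuracy(toy_ranks, k=1)
top3 = top_k_accuracy(toy_ranks, k=3)
mrr = mean_reciprocal_rank(toy_ranks)
print(f"top-1 = {top1:.1%}, top-3 = {top3:.1%}, MRR = {mrr:.4f}")

# The reported percentages follow directly from the counts given in the text.
reported = {"strict top-10": 3 / 77, "relaxed top-1": 22 / 77, "relaxed top-10": 35 / 77}
for name, value in reported.items():
    print(f"{name}: {value:.1%}")
```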

Methods

This retrospective study analyzed 77 zero-concordance cases (360 images) from the Missouri Dermatology ECHO program (2016-2021). Zero-concordance cases were those in which the primary care clinician's provisional diagnosis showed no textual overlap with the dermatologist-confirmed diagnosis. The primary outcome was top-1 exact-match accuracy; secondary outcomes included top-k accuracy under concept-level relaxed matching.
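The difference between strict exact matching and relaxed matching can be illustrated with a short sketch. The source says only that relaxed matching was based on fuzzy string similarity; the use of `difflib.SequenceMatcher`, the normalization step, the 0.8 threshold, and the example labels below are all assumptions made for illustration, not the authors' method.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(label.lower().split())


def first_match_rank(
    differential: Sequence[str],
    reference: str,
    relaxed: bool = False,
    threshold: float = 0.8,  # assumed cutoff; the source does not report one
) -> Optional[int]:
    """Return the 1-based rank of the first matching candidate, or None if none match."""
    ref = normalize(reference)
    for rank, candidate in enumerate(differential, start=1):
        cand = normalize(candidate)
        if relaxed:
            matched = SequenceMatcher(None, cand, ref).ratio() >= threshold
        else:
            matched = cand == ref  # strict: exact string equality after normalization
        if matched:
            return rank
    return None


# Hypothetical model output and reference diagnosis (not study data).
differential = ["Actinic keratosis", "Seborrhoeic keratosis", "Lentigo"]
reference = "Seborrheic keratosis"

strict_rank = first_match_rank(differential, reference)
relaxed_rank = first_match_rank(differential, reference, relaxed=True)
print(strict_rank, relaxed_rank)
```

In this toy case the spelling variant defeats strict matching but passes the relaxed criterion, which is the kind of gap between the two metrics that the reported results reflect. Pure string similarity can also score distinct diagnoses that share a word fairly highly, so the choice of threshold matters.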

Background

Teledermatology expands access to dermatologic expertise in rural settings, yet diagnostic uncertainty persists in low-resource primary care. We evaluated MedGemma-4B-IT, a compact multimodal vision-language model, as adjunctive clinical decision support for challenging teledermatology cases.

Stratified performance analysis by diagnostic category and image count provides actionable guidance for deployment scenarios.

Concept-level relaxed matching reveals clinically relevant differential diagnostic utility that strict exact-match metrics obscure.

Zero-concordance cases provide a rigorous test of model performance, representing the diagnostic frontier where clinical decision support is most needed.

Version published to 10.64898/2026.05.21.26353523 on medRxiv
May 26, 2026