Case-level artificial intelligence for multi-photo teledermatology submissions: development and internal validation using patient-submitted dermatology images

Abstract

Background

Store-and-forward teledermatology commonly relies on several patient-submitted photographs of the same concern, but most dermatology artificial intelligence models classify single images independently.

Objective

To develop and internally validate a case-level diagnostic-support model that aggregates multiple patient-submitted photographs for common dermatologic conditions.

Methods

We conducted a retrospective diagnostic-modeling study using the Skin Condition Image Network (SCIN), a public dataset of deidentified, self-taken dermatology images from US adults. We curated 2,336 cases comprising 5,041 images across 10 common inflammatory, allergic, and infectious conditions. Cases were split at the submission level into training, validation, and held-out test sets, so that all images from one submission stayed in the same set. Embeddings from frozen general-purpose and dermatology-specific encoders were evaluated with image-level classifiers and with a gated-attention multiple instance learning model that generated one case-level output from 1–3 images.
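The case-level aggregation step can be illustrated with a minimal sketch of gated-attention pooling in its commonly used form: each image embedding receives a score from a tanh branch multiplied by a sigmoid gate, the scores are normalized with a softmax over the images in the case, and the case embedding is the weighted sum. The abstract does not give the model's architecture, so the dimensions, the randomly initialized parameters `V`, `U`, and `w`, and the function names below are illustrative assumptions, not the study's implementation.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gated_attention_pool(embeddings, V, U, w):
    """Pool 1..n image embeddings into one case embedding.

    score_k = w . (tanh(V h_k) * sigmoid(U h_k)); weights = softmax(scores).
    V, U, w are hypothetical learned parameters (random here).
    """
    scores = []
    for h in embeddings:
        tanh_part = [math.tanh(x) for x in matvec(V, h)]
        gate_part = [1.0 / (1.0 + math.exp(-x)) for x in matvec(U, h)]
        scores.append(sum(wi * t * g for wi, t, g in zip(w, tanh_part, gate_part)))
    # Numerically stable softmax over the images of this case.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(embeddings[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, embeddings)) for d in range(dim)]
    return pooled, weights


# Toy setup: assumed sizes, random parameters, and random "embeddings".
rng = random.Random(0)
EMBED_DIM, HIDDEN_DIM = 4, 3
V = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(HIDDEN_DIM)]
U = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(HIDDEN_DIM)]
w = [rng.uniform(-1, 1) for _ in range(HIDDEN_DIM)]
case = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(3)]

pooled, weights = gated_attention_pool(case, V, U, w)
# A one-image case reduces to that image's embedding with weight 1.
single_pooled, single_weights = gated_attention_pool(case[:1], V, U, w)
```

In a full model, the pooled vector would feed a classifier head that produces the single case-level output; because the attention weights sum to 1, the same code handles submissions with one, two, or three photographs.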

Results

The strongest image-level baseline, dermatology-specific embeddings with random forest classification, achieved macro/micro ROC-AUCs of 0.797/0.854. Case-level aggregation improved discrimination, with dermatology-specific embeddings plus multiple instance learning achieving mean macro/micro ROC-AUCs of 0.819/0.863 across repeated stratified experiments. The locked final model achieved macro/micro ROC-AUCs of 0.800/0.849 on the held-out test set. Balanced-threshold sensitivity/specificity examples were 0.702/0.688 for eczema and 0.818/0.826 for urticaria.
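For readers less familiar with the two averages reported above, the following sketch shows how macro and micro ROC-AUC differ. It assumes a single-label, one-vs-rest setup, which the abstract does not confirm; the function names and toy scores are hypothetical. Macro averaging takes the unweighted mean of per-condition AUCs, whereas micro averaging pools every (case, condition) pair into one binary problem, so frequent conditions carry more weight.

```python
def binary_auc(labels, scores):
    """ROC-AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_micro_auc(y_true, y_score, n_classes):
    """Return (macro, micro) one-vs-rest ROC-AUC for single-label cases."""
    per_class = []
    flat_labels, flat_scores = [], []
    for c in range(n_classes):
        labels_c = [1 if y == c else 0 for y in y_true]
        scores_c = [row[c] for row in y_score]
        per_class.append(binary_auc(labels_c, scores_c))
        # Micro: pool all (case, condition) pairs into one binary problem.
        flat_labels.extend(labels_c)
        flat_scores.extend(scores_c)
    macro = sum(per_class) / n_classes
    micro = binary_auc(flat_labels, flat_scores)
    return macro, micro


# Toy data (made up): five cases, three conditions, one score row per case.
y_true = [0, 1, 2, 1, 0]
y_score = [
    [0.7, 0.2, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.4, 0.4, 0.2],
]
macro_auc, micro_auc = macro_micro_auc(y_true, y_score, n_classes=3)
```

A micro value higher than the macro value, as in the reported results, is consistent with better discrimination on the more frequent conditions, although the abstract does not report per-condition AUCs.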

Limitations

Internal validation used a 10-condition subset from a US volunteer dataset; external validation, calibration, subgroup performance analysis, and prospective workflow studies are required.

Conclusion

Modeling the teledermatology submission as a multi-image case better reflects asynchronous dermatology workflow than single-image classification. The model is preliminary clinician-facing support for structured review and triage, not autonomous diagnosis.

Key Points

  • Store-and-forward teledermatology submissions usually contain multiple patient-submitted photographs, whereas most dermatology AI models classify single images independently.

  • This study developed a case-level multiple instance learning model that aggregates 1–3 photographs from the same SCIN submission and produces one clinician-facing diagnostic-support output.

  • Case-level aggregation modestly improved discrimination over the strongest image-level baseline and produced threshold-specific sensitivity/specificity outputs suitable for structured review and triage research.
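The threshold-specific sensitivity/specificity outputs mentioned in the last point can be sketched as follows. The abstract does not define "balanced threshold"; this example assumes it means the per-condition cut-off at which sensitivity and specificity are closest to each other, which is only one plausible reading. The function names and toy scores are hypothetical.

```python
def sens_spec(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_threshold(labels, scores):
    """Assumed definition: the observed score minimizing |sensitivity - specificity|."""
    def gap(threshold):
        sens, spec = sens_spec(labels, scores, threshold)
        return abs(sens - spec)
    return min(sorted(set(scores)), key=gap)


# Toy one-vs-rest data for a single condition (made up, not study data).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.6, 0.7]

threshold = balanced_threshold(labels, scores)
sensitivity, specificity = sens_spec(labels, scores, threshold)
print(threshold, sensitivity, specificity)  # prints: 0.6 0.75 0.75
```

In practice such a threshold would be chosen on validation data and then applied unchanged to the held-out test set; selecting it on test data would overstate performance.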
