A Deep Learning System for Diagnosis of Rheumatoid Arthritis on Digital Hand Photographs
Abstract
Objectives: To develop and evaluate a deep learning model for diagnosing untreated rheumatoid arthritis (RA) using digital camera images of bilateral dorsal hands, benchmarking its performance against the widely used 2010 ACR/EULAR criteria as a clinical reference standard.

Methods: This pilot study included 170 participants (86 RA, 84 non-RA) who presented with joint symptoms at participating medical institutions. Digital images of both dorsal hands were captured under standardized conditions and processed using a deep learning-based background removal algorithm. A Swin Transformer-based model was developed and trained on these images. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and calibration metrics. Gradient-Weighted Class Activation Mapping (Grad-CAM) was employed to visualize the model’s decision-making process.

Results: The deep learning model achieved an AUROC of 0.870 (95% CI: 0.708-0.988), compared with 0.981 (95% CI: 0.953-1.010) for the ACR/EULAR criteria; the difference did not reach statistical significance (p=0.131). The model's sensitivity was comparable to that of the ACR/EULAR criteria, but its specificity, accuracy, and F1-score were lower. After Platt scaling, calibration analysis showed good alignment with ideal calibration in the 0.4–0.6 probability range. Grad-CAM visualization confirmed that the model focused on clinically relevant joint regions, particularly the metacarpophalangeal and proximal interphalangeal joints.

Conclusion: Our deep learning-based approach for RA diagnosis using standard digital camera images demonstrated clinically viable performance, albeit with lower specificity than the ACR/EULAR criteria. This accessible screening tool could potentially expedite early RA detection, particularly in resource-limited settings. Larger multi-centre studies are needed to validate our findings and establish broader clinical applicability.
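The evaluation metrics named in the Methods (AUROC with a 95% confidence interval, sensitivity, and specificity) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the 0.5 decision threshold, the toy labels and scores, and the use of a percentile bootstrap for the interval. The abstract does not state how the authors computed their intervals. An upper bound above 1, as in 0.953-1.010, can arise from a symmetric normal-approximation interval, whereas a percentile bootstrap stays within [0, 1] by construction.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive.
    The 0.5 threshold is an illustrative assumption."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def bootstrap_auroc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for AUROC.
    Resamples cases with replacement; resamples containing only one
    class are skipped because AUROC is undefined for them."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap AUROC")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Toy data (hypothetical, not from the study): 1 = RA, 0 = non-RA.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

print("AUROC:", auroc(labels, scores))
print("Sensitivity, specificity:", sensitivity_specificity(labels, scores))
print("95% bootstrap CI:", bootstrap_auroc_ci(labels, scores))
```

With a held-out test set as small as a pilot study implies, any such interval will be wide, which is consistent with the broad interval reported for the model (0.708-0.988).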