Datasheet for the IDHea Primary Care Screening Dataset: A Real-World Ocular Imaging Resource for Research
Abstract
Purpose
Ocular screening in primary care is increasingly recognized as a valuable opportunity to detect a wide range of eye conditions at an early stage, especially among individuals with systemic risk factors. The Primary Care Screening dataset is a large-scale, real-world collection of de-identified color fundus photographs (CFP) acquired during routine diabetic retinopathy (DR) screening visits in primary care clinics across the United States, and is available through the Institute for Digital Health (IDHea)—a secure research platform established by Topcon Healthcare, Inc.
Methods
The dataset includes CFP from individuals who participated in eye screening at 643 clinical sites in the United States. The majority of images were obtained using the TRC-NW400 (Topcon Corp., Tokyo, Japan), although other imaging devices were also used. Each image was graded by an eye care specialist using the International Clinical Diabetic Retinopathy (ICDR) grading system. Graders also recorded image quality and a wide range of other retinal findings. In addition, the dataset includes patient-level demographics (age, sex, and 3-digit ZIP code), image metadata, and model-generated annotations such as AutoMorph image quality, vascular metrics, and retinal pigment scores.
Results
As of March 2025, the dataset includes 427,182 CFP from 161,705 subjects, representing 372,528 eyes and 186,264 individual visits. Image quality was graded as Excellent or Good (218,139 eyes), Fair (109,127 eyes), or Unreadable (37,491 eyes). Most eyes (n = 293,391) had no diabetic retinopathy present (NDRP), while 15,514 had mild non-proliferative DR (NPDR), 10,856 moderate NPDR, 1,140 severe NPDR, and 1,644 proliferative DR. Diabetic macular edema was present in 4–34% of cases across the NPDR categories and in 18% of proliferative DR cases. Additional findings included drusen or pigmentary changes (n = 9,649), glaucoma suspect (n = 4,257), and macular degeneration (n = 3,581).
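As a reading aid, the eye-level DR counts above can be converted into proportions. The following is a minimal sketch in Python, not part of the dataset tooling; the names `dr_counts` and `grade_shares` are hypothetical. It assumes the denominator is the sum of the five reported grade counts, which is smaller than the 372,528 total eyes; the abstract does not state how the remaining eyes were classified.

```python
# Eye-level DR grade counts as reported in the Results paragraph.
# Dictionary keys are illustrative labels, not official field names.
dr_counts = {
    "No DR": 293_391,
    "Mild NPDR": 15_514,
    "Moderate NPDR": 10_856,
    "Severe NPDR": 1_140,
    "Proliferative DR": 1_644,
}


def grade_shares(counts):
    """Return each grade's fraction of all eyes that have a reported grade."""
    total = sum(counts.values())
    return {grade: n / total for grade, n in counts.items()}


# Assumption: the denominator is the sum of the five reported grades,
# not the total number of eyes in the dataset.
graded_total = sum(dr_counts.values())
shares = grade_shares(dr_counts)

# Eyes with any grade of DR (mild NPDR through proliferative DR).
any_dr = graded_total - dr_counts["No DR"]

for grade, share in shares.items():
    print(f"{grade:>17}: {dr_counts[grade]:>8,d} eyes ({share:.2%})")
print(f"{'Any DR':>17}: {any_dr:>8,d} eyes ({any_dr / graded_total:.2%})")
```

Because the counts are per eye, these proportions describe eyes rather than subjects or visits.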
Conclusion
The Primary Care Screening dataset represents one of the largest real-world collections of retinal images acquired in primary care settings. Its size, diversity, and detailed expert grading make it a valuable resource for research into automated screening, ocular disease prevalence, and AI model development. An independent Data Access and Governance committee oversees research applications to ensure responsible use. The dataset will be made available via the secure IDHea research platform. More information is available at IDHea.net.