Datasheet for the IDHea Primary Eye Care Dataset: A Real-World Ocular Imaging Resource for Research


Abstract

Purpose

Real-world ocular imaging datasets are essential for advancing research in artificial intelligence (AI), autonomous disease screening, and clinical decision support. The Primary Eye Care dataset is a large-scale collection of de-identified retinal imaging data from routine optometric care, made available through the Institute for Digital Health (IDHea)—a secure research platform established by Topcon Healthcare, Inc. This dataset provides an opportunity to study eye health in a community setting and will be available via this cloud-based platform.

Methods

Data were collected and de-identified from individuals who underwent imaging as part of their routine care across 40 optometry practices in the United States and one practice in Australia. The dataset includes three-dimensional optical coherence tomography (OCT) scans and color fundus photographs acquired using Maestro devices (Topcon Corp., Tokyo, Japan), along with demographic data including age and sex. Imaging data were converted to DICOM format, and OCT analysis metrics such as retinal layer thicknesses were derived. Additional labels, including image quality, vessel metrics, and retinal pigment score, were generated using open-source AI models.

Results

The dataset comprises 873,291 image acquisitions from 276,061 subjects with a mean age of 43.8 years (standard deviation = 19.5). Of the subjects, 48.7% were female, 36.2% were male, and sex was not reported for 15.1%. Most OCT scans followed the 12 × 9 mm 3D Wide protocol (86.3%), with additional 3D Macula, 3D Disc, anterior segment, radial, and line scans. A total of 59,049 subjects (21.4%) had two or more scans separated by ≥ 365 days. Pre-processed metrics and AI-derived labels, such as TopQ image quality scores, glaucoma risk score, and AutoMorph features, are included. Overall, 89.4% of OCT scans scored above 25 on the TopQ scale, indicating reliable image quality. A propensity score-matched test subset (~10%) was held out to enable consistent benchmarking across studies.
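
As an illustration of how two of the summary figures above could be computed, the following minimal Python sketch selects subjects with follow-up of at least 365 days and reports the share of scans above a TopQ threshold of 25. The record layout, subject identifiers, and values are hypothetical and do not reflect the dataset's actual schema; the sketch also assumes that "separated by ≥ 365 days" refers to the gap between a subject's earliest and latest scans.

```python
from collections import defaultdict
from datetime import date

# Hypothetical acquisition records: (subject_id, scan_date, topq_score).
# Field names and values are illustrative, not the dataset's actual schema.
records = [
    ("s1", date(2021, 3, 1), 40),
    ("s1", date(2022, 3, 5), 31),
    ("s2", date(2021, 6, 1), 20),
    ("s2", date(2021, 9, 1), 55),
    ("s3", date(2020, 1, 15), 62),
]


def longitudinal_subjects(rows, min_gap_days=365):
    """Return subject IDs whose earliest and latest scans are at least
    min_gap_days apart (an assumed reading of the follow-up criterion)."""
    dates = defaultdict(list)
    for subject_id, scan_date, _ in rows:
        dates[subject_id].append(scan_date)
    return {
        sid
        for sid, ds in dates.items()
        if (max(ds) - min(ds)).days >= min_gap_days
    }


def fraction_above(rows, threshold=25):
    """Fraction of scans whose quality score is strictly above the threshold."""
    return sum(1 for _, _, score in rows if score > threshold) / len(rows)


followed = longitudinal_subjects(records)
quality_share = fraction_above(records)
print(f"subjects with >= 365 days of follow-up: {sorted(followed)}")
print(f"share of scans with TopQ > 25: {quality_share:.1%}")
```

The same grouping logic would apply to a tabular export of the acquisition metadata, with the threshold and minimum gap adjusted to the needs of a given study.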

Conclusion

The Primary Eye Care Dataset provides a large-scale, real-world collection of ocular imaging data, reflecting a largely healthy, community-based population attending routine optometric visits. This makes it particularly valuable for developing AI models aimed at early detection and prevention at the population level, where most eyes are healthy and disease prevalence is low. Data access is governed by an independent committee to ensure ethical and responsible use; more information is available at IDHea.net.
