Real-world data from the Japanese National Health Insurance System enables fine phenotyping in a 14K-scale population-based study

Abstract

Medical databases, a source of real-world data (RWD), facilitate accurate, efficient, and frequent estimation of disease prevalence in population-based studies. Their use is therefore expected to be a game-changer in epidemiology and public health research. However, the lack of standardized data formats across hospitals makes it challenging to integrate data across institutions. In contrast, health insurance claims data in Japan are standardized under a nationally unified format, making them a reliable source of structured RWD.

We aimed to evaluate the utility of insurance claims data in epidemiological research. Incorporating both diagnosis and prescription information from claims into case definitions resulted in four- and six-fold increases in the estimated prevalence of Alzheimer’s disease (AD) and Parkinson’s disease, respectively, compared to conventional definitions based on self-administered questionnaires. Subsequent genome-wide association studies (GWAS) for AD demonstrated an increased log-likelihood of the model and identified a characteristic signal at the APOE locus. These changes were observed only when using claims-enriched case definitions. The effect size of the APOE variant was further compared with previous GWAS findings. Our estimate was consistent with those from case–control studies involving over 4,000 cases, while its standard error remained comparable to that of smaller studies. These findings suggest that incorporating claims-based clinical data (diagnoses and prescriptions) into phenotype definitions improves case identification while maintaining accuracy. This approach may support scalable strategies in genomic epidemiology and public health surveillance.
