Korea10K: 10,239 whole genomes with multiomic and clinical health information as the Korean multiomics reference dataset


Abstract

Background: Large-scale genome projects have advanced the characterization of human genetic diversity; however, the lack of deeply sequenced, multiomic resources representing the Korean population has limited systematic evaluation of population-specific variation and its functional and translational impact.

Results: We present Korea10K, a population-scale genomic and multiomic resource comprising 10,239 high-depth whole genomes (mean coverage 30×) with integrated molecular and phenotypic data. Analysis of 9,000 unrelated individuals enabled near-complete discovery of rare and ultra-rare variants and supported the construction of a high-resolution, population-specific imputation panel. Koreans exhibit pronounced autosomal genetic homogeneity despite substantial diversity in Y-chromosomal, mitochondrial, and HLA lineages, reflecting long-term demographic continuity. We further identified 16.4 million variants that alter CpG dinucleotide context, revealing widespread sequence-driven modulation of the genomic CG landscape. Notably, population-specific CG-eliminating variants disrupt CpG probe targets in widely used methylation arrays, introducing a systematic source of bias in epigenome-wide association studies and epigenetic clock estimation.

Conclusion: Korea10K establishes a high-resolution genomic and multiomic reference for the Korean population and reveals a previously unrecognized interaction between genetic variation and epigenomic measurement. This resource provides a foundation for precision medicine and highlights the need for ancestry-aware interpretation of molecular data.
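
To illustrate what a variant that alters CpG dinucleotide context means, the following minimal Python sketch classifies a single-nucleotide substitution by whether it removes or introduces a CG dinucleotide in a short reference window. It is an illustration only, not the pipeline used in the study: the function names, the labels other than "CG-eliminating", and the single-strand, substitution-only model are all assumptions made for this example.

```python
def cpg_positions(seq):
    """Return the set of 0-based start indices of CG dinucleotides in seq."""
    seq = seq.upper()
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"}


def classify_snv(ref_context, offset, alt):
    """Classify a substitution by its effect on CG dinucleotides.

    ref_context: reference bases around the variant (hypothetical input format)
    offset:      0-based index of the substituted base within ref_context
    alt:         the alternate base
    Labels other than "CG-eliminating" are illustrative, not from the study.
    """
    ref_context = ref_context.upper()
    # Build the alternate-allele sequence by swapping in the new base.
    alt_context = ref_context[:offset] + alt.upper() + ref_context[offset + 1:]

    before = cpg_positions(ref_context)
    after = cpg_positions(alt_context)
    lost = before - after    # CG sites present only on the reference allele
    gained = after - before  # CG sites present only on the alternate allele

    if lost and gained:
        return "both"
    if lost:
        return "CG-eliminating"
    if gained:
        return "CG-creating"
    return "CG-neutral"


if __name__ == "__main__":
    # C>T at the cytosine of a CG site removes the dinucleotide.
    print(classify_snv("ACGT", 1, "T"))
```

Under this simplified model, a methylation-array probe whose target CG falls in the `lost` set would have no CpG to measure in carriers of the alternate allele, which is the kind of measurement bias the abstract describes.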
