Korea10K: 10,239 whole genomes with multiomic and clinical health information as the Korean multiomics reference dataset
Abstract
Background: Large-scale genome projects have advanced the characterization of human genetic diversity; however, the lack of deeply sequenced, multiomic resources representing the Korean population has limited systematic evaluation of population-specific variation and its functional and translational impact.

Results: We present Korea10K, a population-scale genomic and multiomic resource comprising 10,239 high-depth whole genomes (mean coverage 30×) with integrated molecular and phenotypic data. Analysis of 9,000 unrelated individuals enabled near-complete discovery of rare and ultra-rare variants and supported the construction of a high-resolution, population-specific imputation panel. Koreans exhibit pronounced autosomal genetic homogeneity despite substantial diversity in Y-chromosomal, mitochondrial, and HLA lineages, reflecting long-term demographic continuity. We further identified 16.4 million variants that alter CpG dinucleotide context, revealing widespread sequence-driven modulation of the genomic CG landscape. Notably, population-specific CG-eliminating variants disrupt CpG probe targets in widely used methylation arrays, introducing a systematic source of bias in epigenome-wide association studies and epigenetic clock estimation.

Conclusion: Korea10K establishes a high-resolution genomic and multiomic reference for the Korean population and reveals a previously unrecognized interaction between genetic variation and epigenomic measurement. This resource provides a foundation for precision medicine and highlights the need for ancestry-aware interpretation of molecular data.
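
To make the notion of a variant that alters CpG dinucleotide context concrete, the following minimal Python sketch classifies a single-nucleotide variant by comparing the number of CG dinucleotides in its three-base reference context before and after substitution. It is an illustration only and not the pipeline used in the study: the function names `count_cpg` and `classify_snv`, the three-base window, and the "CG-creating" and "CG-neutral" labels are assumptions introduced here, and indels are not handled.

```python
def count_cpg(seq: str) -> int:
    """Count CG dinucleotides in seq (hypothetical helper).

    CG is its own reverse complement, so counting on one strand
    is sufficient to detect a CpG site on either strand.
    """
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def classify_snv(left: str, ref: str, alt: str, right: str) -> str:
    """Classify an SNV by its net effect on CpG count (illustrative only).

    left and right are the single reference bases flanking the variant
    site; ref and alt are the reference and alternate alleles.
    Simplification: a change such as CGG -> CCG removes one CpG and
    creates another, and is reported here as "CG-neutral".
    """
    before = count_cpg(left + ref + right)
    after = count_cpg(left + alt + right)
    if after < before:
        return "CG-eliminating"
    if after > before:
        return "CG-creating"
    return "CG-neutral"


if __name__ == "__main__":
    # ACG -> ATG: the C of a CpG is lost, so a probe targeting it has no CpG left.
    print(classify_snv("A", "C", "T", "G"))
    # CAT -> CGT: a new CpG appears.
    print(classify_snv("C", "A", "G", "T"))
```

Under these assumptions, a CG-eliminating call at a position targeted by a methylation array probe marks a site where carriers of the alternate allele have no CpG to methylate, which is the kind of measurement bias the abstract describes.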