10,239 whole genomes with multi-omic and clinical health information as the Korean population multi-omic reference dataset

Abstract

We present Korea10K, the largest genomic dataset of the Korean population, comprising 10,239 high-coverage whole genomes (mean depth 30×) with matched multi-omic profiles and phenotype data. At a sample size of 9,000 Korean genomes, Korea10K achieves complete discovery of very rare alleles and near-complete discovery of ultra-rare alleles. The dataset provides a high-quality, population-specific imputation panel that enables accurate inference of low-frequency variants. Admixture analyses confirm the overall genetic homogeneity of the Korean population despite its diverse Y-chromosomal, mitochondrial, and HLA repertoires. This pattern reflects a long, continuous lineage history on the Korean peninsula, characterized by persistent internal admixture and genomic homogenization over thousands of years. We also identified 16.8 million genomic variants that directly modify CG sites by creating or abolishing CG dinucleotides, providing population-scale evidence of a coordinated genomic-epigenomic regulatory mechanism in Koreans.
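To make the "CG-creating" and "CG-abolishing" variant classes concrete, here is a minimal sketch of how a single-nucleotide variant might be classified against its reference sequence context. It is a hypothetical helper, not the authors' pipeline: the function names, the use of a 0-based position into a short context string, and the restriction to single-nucleotide variants (indels can also modify CG sites but are out of scope here) are all assumptions. Because CG is its own reverse complement, checking one strand is sufficient. Note that a single substitution can abolish one CG and create another at the same time.

```python
def overlaps_cg(seq: str, pos: int) -> bool:
    """Return True if the base at 0-based `pos` is part of a CG dinucleotide in `seq`."""
    # Base is the G of a CG (paired with the base on its left).
    left = pos > 0 and seq[pos - 1:pos + 1] == "CG"
    # Base is the C of a CG (paired with the base on its right).
    right = pos + 1 < len(seq) and seq[pos:pos + 2] == "CG"
    return bool(left or right)


def classify_cg_effect(context: str, pos: int, ref: str, alt: str) -> str:
    """Classify an SNV as 'abolishes', 'creates', 'both', or 'none' with respect to CG sites.

    Hypothetical helper: `context` is a reference sequence window and `pos` is the
    0-based index of the variant within it. Only single-nucleotide variants are handled.
    """
    context, ref, alt = context.upper(), ref.upper(), alt.upper()
    if len(ref) != 1 or len(alt) != 1 or ref == alt:
        raise ValueError("this sketch handles single-nucleotide substitutions only")
    if not 0 <= pos < len(context):
        raise IndexError(f"position {pos} lies outside the context window")
    if context[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: expected {ref}, found {context[pos]}")

    # Build the alternate-allele sequence by substituting the single base.
    alt_context = context[:pos] + alt + context[pos + 1:]
    in_ref = overlaps_cg(context, pos)
    in_alt = overlaps_cg(alt_context, pos)

    # If both are True, the variant necessarily breaks a CG on one side and forms
    # a new one on the other side, because the substituted base itself changed.
    if in_ref and in_alt:
        return "both"
    if in_ref:
        return "abolishes"
    if in_alt:
        return "creates"
    return "none"
```

For example, `classify_cg_effect("ACGT", 1, "C", "T")` returns `"abolishes"` (the classic C-to-T transition at a CpG), while `classify_cg_effect("CGG", 1, "G", "C")` returns `"both"`, since the substitution breaks the CG at the first two positions and forms a new CG at the last two.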