10,239 whole genomes with multiomic and clinical health information as the Korean Multiomics Reference dataset
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Abstract
We present Korea10K, the largest genomic dataset of the Korean population, comprising 10,239 high-coverage whole genomes (mean depth 30×) with matched multiomic profiles and phenotype data. Korea10K achieves complete and near-complete discovery of very rare and ultra-rare alleles, respectively, at 9,000 Korean genomes. This dataset provides a high-quality, population-specific imputation panel, enabling accurate inference of low-frequency variants. Admixture analyses confirm the genetic homogeneity of the Korean population, despite its diverse Y-chromosomal, mitochondrial, and HLA repertoires. This pattern reflects a long and continuous lineage history characterized by persistent internal admixture and genomic homogenization over thousands of years on the Korean peninsula. We also identified 16.8 million genomic variants that directly modify CG sites by creating or abolishing CG dinucleotides, providing population-scale evidence of a coordinated genomic-epigenomic regulatory mechanism in Koreans.
Article activity feed
-
we performed 100 independent permutations in which the sample order was randomly shuffled without replacement.
I like the approach here. It is particularly important because it avoids the bias from cohort heterogeneity that can affect saturation estimates in sequentially recruited cohorts.
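For readers unfamiliar with this kind of analysis, the following is a minimal sketch of a permutation-based discovery (saturation) curve, not the authors' pipeline. It assumes each sample is represented as a set of variant identifiers; the function names, the toy data, and the choice to average the curves are illustrative assumptions. Only the 100 permutations and the shuffling without replacement come from the quoted text.

```python
import random

def discovery_curve(sample_variants, order):
    """Cumulative number of distinct variants as samples are added in `order`."""
    seen = set()
    curve = []
    for idx in order:
        seen.update(sample_variants[idx])
        curve.append(len(seen))
    return curve

def permuted_discovery_curves(sample_variants, n_permutations=100, seed=0):
    """Build one discovery curve per random ordering of the samples."""
    rng = random.Random(seed)
    n = len(sample_variants)
    curves = []
    for _ in range(n_permutations):
        # Sampling all n indices is a shuffle without replacement.
        order = rng.sample(range(n), n)
        curves.append(discovery_curve(sample_variants, order))
    return curves

def mean_curve(curves):
    """Average the curves position by position (an assumed summary statistic)."""
    n_perm = len(curves)
    return [sum(column) / n_perm for column in zip(*curves)]

# Hypothetical toy cohort: four samples carrying overlapping variant sets.
toy_samples = [{"v1", "v2"}, {"v2", "v3"}, {"v1"}, {"v4"}]
curves = permuted_discovery_curves(toy_samples, n_permutations=100, seed=1)
average = mean_curve(curves)
```

Because every permutation includes all samples exactly once, each curve ends at the same total. Only the path to that total depends on recruitment order, which is the source of bias the shuffling is meant to average out.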
-
Comparison with high-coverage 1KGP European genomes (N = 522) showed that 72.8% of the CGVs in the hotspot regions (with >10% deviation against GRCh38) were significantly more frequent in Koreans (2,078/2,853 variants; CG-creating: 1,320/1,803; CG-eliminating: 758/1,050; Figure 3E, F
Comparing against genomes from populations other than Europeans would strengthen the claim that these are Korean-specific CG context-associated variants.
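As a quick consistency check on the counts quoted above, the short sketch below recomputes the proportions from the reported numerators and denominators. The variable names are illustrative, and the code says nothing about the statistical test used in the preprint, which the quoted passage does not specify.

```python
# Counts taken from the quoted passage: (more frequent in Koreans, total in hotspots).
cg_creating = (1320, 1803)
cg_eliminating = (758, 1050)
all_cgvs = (2078, 2853)

def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# The two categories should add up to the overall counts.
sums_match = (
    cg_creating[0] + cg_eliminating[0] == all_cgvs[0]
    and cg_creating[1] + cg_eliminating[1] == all_cgvs[1]
)

overall_pct = percent(*all_cgvs)            # 72.8
creating_pct = percent(*cg_creating)        # 73.2
eliminating_pct = percent(*cg_eliminating)  # 72.2

print(sums_match, overall_pct, creating_pct, eliminating_pct)
```

The subtotals sum to the overall figure, and the enrichment is similar for CG-creating and CG-eliminating variants (about 73% and 72%).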
-
These CGVs suggest population-level differences in epigenetic regulation and gene expression, particularly in pathways related to psychiatric and metabolic disease risk in the Korean population
Given that there are 1,211 methylomes and 868 transcriptomes in this cohort, these claims would benefit from directly testing whether CGV sites show differential methylation or whether nearby genes show expression differences.
-