Korea10K: 10,239 whole genomes with multiomic and clinical health information as the Korean multiomics reference dataset

Kyungwhan An
Sungwon Jeon
Yoonsung Kwon
Yookyung Choi
Changhan Yoon
Yeonsu Jeon
Jihun Bhak
Dong-Hyun Shin
Hyoung-Jin Choi
Hyomin Lee
Yeo Jin Kim
Eun-Seok Shin
Hyojung Ryu
Asta Blazyte
Dan Bolser
Sangsoo Park
Juok Cho
Soobok Joe
Jin Ok Yang
Jongbum Jeon
Jong-Hwan Kim
Jungeun Kim
Dooyoung Jung
Yun Sung Cho
Kiyuk Chang
Eun Ho Choo
Eunmin Kim
Sang Yeub Lee
Weon Kim
Min Gyu Kang
Ae-Young Her
Suk Chon
Jeong-Taek Woo
Sang Youl Rhee
Siwoo Lee
Hee-Jeong Jin
Younghwa Baek
Hyo-Jeong Ban
Yong Min Ahn
Sang Jin Rhee
Min Ji Kim
Sang Yeol Lee
Chan-Mo Yang
Se-Hoon Shim
Seong-Jin Cho
Shin Gyeom Kim
Hyung-Tae Jung
Byung-Joo Ham
Yoon Young Choi
Jae-Ho Cheong
Seung-Ki Kim
Ji Hoon Phi
Seung Ah Choi
Heon Yung Gee
Sun Young Joo
Jinsei Jung
Wonsuk Shin
Sang-Hyuk Lee
Borah Kim
Woojae Myung
Chong Kun Cheon
Dong Uk Kim
Seok-Soo Byun
Gangnam Jin
Hojun Lee
Kyun Shik Chae
Chang Geun Kim
Bonghee Lee
Jaesuk Lee
Kwangwoo Kim
Semin Lee
Neung-Hwa Park
Haeyoung Jeong
George M. Church
Jong Bhak


Abstract

Background: Large-scale genome projects have advanced the characterization of human genetic diversity; however, the lack of deeply sequenced, multiomic resources representing the Korean population has limited systematic evaluation of population-specific variation and its functional and translational impact.

Results: We present Korea10K, a population-scale genomic and multiomic resource comprising 10,239 high-depth whole genomes (mean coverage 30×) with integrated molecular and phenotypic data. Analysis of 9,000 unrelated individuals enabled near-complete discovery of rare and ultra-rare variants and supported the construction of a high-resolution, population-specific imputation panel. Koreans exhibit pronounced autosomal genetic homogeneity despite substantial diversity in Y-chromosomal, mitochondrial, and HLA lineages, reflecting long-term demographic continuity. We further identified 16.4 million variants that alter CpG dinucleotide context, revealing widespread sequence-driven modulation of the genomic CG landscape. Notably, population-specific CG-eliminating variants disrupt CpG probe targets in widely used methylation arrays, introducing a systematic source of bias in epigenome-wide association studies and epigenetic clock estimation.

Conclusion: Korea10K establishes a high-resolution genomic and multiomic reference for the Korean population and reveals a previously unrecognized interaction between genetic variation and epigenomic measurement. This resource provides a foundation for precision medicine and highlights the need for ancestry-aware interpretation of molecular data.
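To make the notion of a variant that "alters CpG dinucleotide context" concrete, the following minimal Python sketch classifies a single-nucleotide variant by comparing the CG dinucleotides present before and after the substitution, using one flanking reference base on each side. It is an illustration only and not the authors' pipeline: the function names, the label strings other than "CG-eliminating", and the one-base flank are assumptions made for this example. Because CG is its own reverse complement, checking the forward strand is sufficient.

```python
def cpg_positions(seq: str) -> set:
    """Return the 0-based start positions of every CG dinucleotide in seq."""
    seq = seq.upper()
    return {i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"}


def classify_snv(left: str, ref: str, alt: str, right: str) -> str:
    """Classify an SNV by its effect on CpG context.

    left and right are the single reference bases flanking the variant
    (hypothetical inputs; a real workflow would read them from a reference
    genome). The labels returned here are illustrative names.
    """
    before = cpg_positions(left + ref + right)  # CG sites on the reference allele
    after = cpg_positions(left + alt + right)   # CG sites on the alternate allele
    lost = before - after
    gained = after - before
    if lost and gained:
        return "both"            # one CG site removed, an adjacent one created
    if lost:
        return "CG-eliminating"
    if gained:
        return "CG-creating"
    return "CG-neutral"


if __name__ == "__main__":
    # A[C>T]G: the reference ACG contains a CG, the alternate ATG does not.
    print(classify_snv("A", "C", "T", "G"))
```

A variant labelled "CG-eliminating" at a position targeted by a methylation-array probe is the kind of case the abstract describes as a source of measurement bias, since the probe's CpG target no longer exists on the alternate allele.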

Version published to 10.21203/rs.3.rs-9285712/v1 on Research Square
Apr 17, 2026
