A Pangenomic Method for Establishing a Somatic Variant Detection Resource in HapMap Mixtures
Abstract
Somatic mosaicism is central to human biology and disease, yet robust benchmarks for detecting it are scarce. The SMaHT Consortium mixed six HapMap cell lines to create artificial somatic variants spanning variant allele fractions from 0.25% to 16.5%. We developed a technology-agnostic method that builds pangenome graphs from individual assemblies to create unified benchmarking sets: >6M single-nucleotide variants, 1.8M small insertions/deletions, 49K structural variants, and 10K mobile element insertions across the autosomes, chromosome X, and the mitochondrial genome. We validated the variants using ultra-deep simulated reads and developed a binomial-based model to estimate coverage requirements for variant detection. An evaluation of multiple callers showed that alignment to CHM13 improves structural variant detection and offers advantages in difficult-to-map regions compared with GRCh38. Systematic characterization showed that regions with low detection rates are enriched in centromeres, satellite sequences, tandem repeats, and falsely duplicated genes. This accurate, versatile resource enables systematic evaluation of somatic variant detection technologies.
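The abstract does not give the form of the binomial-based coverage model, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes that the number of reads supporting a variant at a site follows Binomial(depth, VAF), and that a variant counts as detectable when at least `min_alt_reads` reads support it. The function names, the threshold of 3 supporting reads, and the 95% detection probability are all hypothetical choices made for illustration; sequencing error and mapping bias are ignored.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int = 3) -> float:
    """Return P(X >= min_alt_reads) for X ~ Binomial(depth, vaf).

    Assumption (not from the paper): a variant is "detectable" when at
    least `min_alt_reads` reads carry the alternate allele.
    """
    if depth < min_alt_reads:
        return 0.0
    # Probability of seeing fewer than min_alt_reads supporting reads.
    p_below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


def required_depth(
    vaf: float,
    min_alt_reads: int = 3,
    power: float = 0.95,
    max_depth: int = 100_000,
) -> int:
    """Return the smallest depth whose detection probability reaches `power`.

    The detection probability increases monotonically with depth, so a
    simple upward scan finds the minimum. `power` and `max_depth` are
    illustrative defaults, not values taken from the paper.
    """
    if not 0.0 < vaf <= 1.0:
        raise ValueError("vaf must be in (0, 1]")
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, vaf, min_alt_reads) >= power:
            return depth
    raise ValueError("power not reached within max_depth")


if __name__ == "__main__":
    # The two ends of the VAF range reported in the abstract.
    for vaf in (0.0025, 0.165):
        print(f"VAF {vaf:.2%}: depth >= {required_depth(vaf)}")
```

Under these assumptions the required depth grows roughly in inverse proportion to the variant allele fraction, which is why variants at the 0.25% end of the range call for far deeper sequencing than those at 16.5%.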