A Pangenomic Method for Establishing a Somatic Variant Detection Resource in HapMap Mixtures

Abstract

Somatic mosaicism plays an important role in human biology and disease, yet robust benchmarks for somatic variant detection are scarce. The SMaHT Consortium mixed six HapMap cell lines to create artificial somatic variants with variant allele fractions (VAFs) ranging from 0.25% to 16.5%. We developed a technology-agnostic method that builds pangenome graphs from the individual assemblies to create unified benchmarking sets comprising >6M single-nucleotide variants, 1.8M small insertions/deletions, 49K structural variants, and 10K mobile element insertions across the autosomes, chromosome X, and the mitochondrial genome. We validated the variants using ultra-deep simulated reads and developed a binomial-based model to estimate the coverage required for variant detection. Evaluating multiple callers showed that alignment to CHM13 improves structural variant detection and offers advantages over GRCh38 in difficult-to-map regions. Systematic characterization showed that regions with low detection rates are enriched in centromeres, satellite sequences, tandem repeats, and falsely duplicated genes. This accurate, versatile resource enables systematic evaluation of somatic variant detection technologies.
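The abstract does not spell out the binomial coverage model. The sketch below shows one common way such a model can work, under stated assumptions. A variant counts as detected when at least a fixed number of reads support the alternate allele, and sequencing errors are ignored. The supporting-read count at depth n and VAF f is then Binomial(n, f). The helpers `detection_probability` and `min_depth` are hypothetical names, and so are the default threshold of 3 supporting reads and the 95% target. None of these are values or code from the study.

```python
from math import comb

def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(X >= min_alt_reads) with X ~ Binomial(depth, vaf).

    Assumption: reads sample the alternate allele independently with
    probability equal to the VAF; sequencing errors are ignored.
    """
    # Probability of seeing fewer than min_alt_reads supporting reads (a miss).
    p_miss = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_miss

def min_depth(vaf: float, min_alt_reads: int = 3, target: float = 0.95,
              max_depth: int = 100_000):
    """Smallest depth whose detection probability reaches `target`.

    Hypothetical defaults: 3 supporting reads and a 95% target.
    Returns None if no depth up to max_depth suffices.
    """
    # Detection probability grows with depth, so the first hit is the minimum.
    for depth in range(min_alt_reads, max_depth + 1):
        if detection_probability(depth, vaf, min_alt_reads) >= target:
            return depth
    return None

if __name__ == "__main__":
    # The two ends of the VAF range reported for the HapMap mixtures.
    for vaf in (0.0025, 0.165):
        print(f"VAF {vaf:.2%}: minimum depth {min_depth(vaf)}")
```

Under these assumptions, the required depth scales roughly with 1/VAF. The lowest-VAF variants in the mixtures therefore dominate coverage requirements. Raising `min_alt_reads` to guard against sequencing errors increases the required depth further.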

Article activity feed