Benchmarking sequence performance on the DNBSEQ-T7 using Genome in a Bottle reference genomes
Abstract
Advances in sequencing technologies have improved the accuracy, throughput, and completeness of human genome characterization, enabling more reliable detection of genetic variation. Well-characterized reference genomes are critical for benchmarking sequencing platforms and bioinformatics analysis pipelines. Here, we present whole genome sequencing datasets generated for the Ashkenazi Jewish trio reference samples from the Genome in a Bottle (GIAB) Consortium. Libraries were prepared using three distinct MGI-based workflows: PCR-free library preparation, FastFS DNA library preparation, and Universal DNA library preparation. Sequencing was performed on the MGI DNBSEQ-T7 platform, generating a minimum of 400 million paired-end reads per sample, corresponding to 30X mean genome coverage.
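The relationship between read count and mean coverage quoted above can be sketched as simple arithmetic: total sequenced bases divided by genome size. The read length (150 bp) and genome size (about 3.1 Gb) used below are illustrative assumptions, not values stated in this abstract, and the function names are hypothetical. The result is a raw-yield estimate only; realized depth after duplicate removal, filtering, and alignment would typically be lower.

```python
import math

# Assumed values for illustration only; neither is stated in the abstract.
ASSUMED_READ_LENGTH_BP = 150            # hypothetical paired-end read length
ASSUMED_GENOME_SIZE_BP = 3_100_000_000  # approximate human genome size


def mean_coverage(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Raw mean depth: total sequenced bases divided by genome size.

    Each read pair contributes two reads of `read_length` bases.
    """
    total_bases = read_pairs * 2 * read_length
    return total_bases / genome_size


def read_pairs_for_coverage(target_depth: float, read_length: int, genome_size: int) -> int:
    """Smallest number of read pairs whose raw yield reaches `target_depth`."""
    return math.ceil(target_depth * genome_size / (2 * read_length))


if __name__ == "__main__":
    depth = mean_coverage(400_000_000, ASSUMED_READ_LENGTH_BP, ASSUMED_GENOME_SIZE_BP)
    pairs = read_pairs_for_coverage(30, ASSUMED_READ_LENGTH_BP, ASSUMED_GENOME_SIZE_BP)
    print(f"Raw depth from 400M read pairs: {depth:.1f}X")
    print(f"Read pairs needed for 30X raw depth: {pairs:,}")
```

Whether "400 million paired-end reads" counts pairs or individual reads changes the estimate by a factor of two, so the pair count is kept as an explicit argument rather than fixed.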
Raw reads were processed using a standardized GATK bioinformatics workflow. Sequencing performance and variant detection accuracy were evaluated using the Genome in a Bottle high-confidence benchmark variant sets. All workflows demonstrated high sequencing quality and concordance with GIAB benchmark truth sets, with PCR-free libraries showing the strongest indel calling performance and lowest Mendelian violation rates across the Ashkenazi trio.
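As an illustration of the Mendelian violation metric mentioned above, the following is a minimal sketch of how a violation rate could be computed from trio genotypes. The abstract does not describe the method actually used, so this is an assumption: it covers only diploid autosomal sites, skips sites with any missing call, and counts a site as a violation when the child's two alleles cannot be explained by one allele from each parent (true de novo variants would also be counted). The function names and the tuple-based genotype encoding are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# A diploid genotype as a pair of allele indices (0 = reference); None = no call.
Genotype = Tuple[Optional[int], Optional[int]]


def is_mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True if one child allele can come from the father and the other from the mother."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def mendelian_violation_rate(trios: Iterable[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of fully called sites whose genotypes violate Mendelian inheritance.

    Each item is (child, father, mother). Sites with any missing allele are skipped.
    """
    evaluated = 0
    violations = 0
    for child, father, mother in trios:
        if None in child or None in father or None in mother:
            continue  # incomplete trio call, not evaluated
        evaluated += 1
        if not is_mendelian_consistent(child, father, mother):
            violations += 1
    return violations / evaluated if evaluated else 0.0


if __name__ == "__main__":
    # Toy genotypes for illustration; not data from the study.
    example_sites = [
        ((0, 1), (0, 0), (0, 1)),        # consistent
        ((1, 1), (0, 0), (0, 1)),        # violation: father carries no alt allele
        ((0, 0), (0, 1), (0, 1)),        # consistent
        ((None, None), (0, 0), (0, 0)),  # skipped: child has no call
    ]
    print(mendelian_violation_rate(example_sites))
```

In the GIAB Ashkenazi trio, the child is HG002 and the parents are HG003 (father) and HG004 (mother). A production analysis would also need to handle sex chromosomes and multi-sample VCF parsing, which this sketch leaves out.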
This dataset provides a resource for benchmarking DNBSEQ-T7 sequencing and bioinformatics workflows, and for evaluating the impact of library preparation strategies on whole genome variant detection performance.