Whole-genome variant detection in long-read sequencing data from ultra-low input patient samples
Abstract
Long-read sequencing provides a more complete view of the genome than short-read sequencing, with advantages in the detection of structural variants, tandem repeats, and small variants (single nucleotide variants [SNVs] and insertions and deletions) in difficult-to-map regions. One limitation of long-read sequencing has been its high input DNA requirement, with several micrograms needed per sample. Here, we evaluated two methods of amplification-based long-read, whole-genome sequencing: Ultra-Low Input HiFi (ULI-HiFi) sequencing and droplet multiple displacement amplification (dMDA) sequencing. When the accuracy of these methods was benchmarked against a reference set of variant calls from the Genome in a Bottle consortium (NA24385), we observed higher precision and recall for SNVs with ULI-HiFi than with the dMDA-amplified samples (SNV F1 scores of 99.82% for ULI-HiFi and 89.46% for dMDA). Across a catalog of >1.6 million tandem repeats (TRs), ULI-HiFi achieved 90.4% perfect concordance and 98.9% accuracy when allowing for single motif differences. ULI-HiFi also successfully illuminated several medically important genes that were poorly mapped with short-read DNA sequencing. Because ULI-HiFi requires only 10-20 nanograms of DNA, we applied it to normal, polyp, and adenocarcinoma samples from a patient with familial adenomatous polyposis (FAP), a hereditary form of colorectal cancer. We identified a TR that progressively expanded in length from normal to polyp to adenocarcinoma. This repeat is located in the 5' UTR of LIMD1, a reported tumor suppressor. We conclude that ULI-HiFi improves the characterization of genetic variants in dark regions of genomes from patient samples, enabling a better understanding of human disease.
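For readers less familiar with the benchmarking metrics quoted above, the following is a minimal sketch in Python of how precision, recall, and F1 are conventionally derived from true-positive, false-positive, and false-negative counts, and of one plausible way to classify a tandem repeat call as perfectly concordant or within a single motif of the reference. The function names, the length-based concordance rule, and all numbers in the usage lines are illustrative assumptions; they are not the authors' pipeline or data.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard definitions; returns 0.0 where a denominator would be zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def tr_concordance(truth_len: int, called_len: int, motif_len: int) -> str:
    """Hypothetical rule: compare allele lengths in base pairs.

    'perfect'          -> identical length
    'within_one_motif' -> lengths differ by at most one motif unit
    'discordant'       -> anything larger
    """
    diff = abs(truth_len - called_len)
    if diff == 0:
        return "perfect"
    if diff <= motif_len:
        return "within_one_motif"
    return "discordant"


# Made-up counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")

# A 3 bp motif whose called allele is one unit longer than the reference allele.
print(tr_concordance(truth_len=30, called_len=33, motif_len=3))
```

Under this reading, the 98.9% figure would correspond to calls labeled either "perfect" or "within_one_motif", though the exact criterion used in the study may differ.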