A comparison of short- and long-read whole genome sequencing for microbial pathogen epidemiology
Abstract
Whole genome sequencing provides the highest resolution for characterizing pathogen evolution, epidemiology, and diagnostics. Genome assemblies contain information on the identity and potential phenotypes of a pathogen. Likewise, variant calling can reveal transmission patterns and evolutionary relationships. Recent improvements in Oxford Nanopore long-read sequencing have made its use attractive for genomic epidemiology. However, the accuracy of Nanopore reads and the optimal strategy for analyzing them remain to be determined. We compared the use of Illumina short reads and Oxford Nanopore long reads for genome assembly and variant calling of phytopathogenic bacteria. We generated short- and long-read datasets for diverse phytopathogenic Agrobacterium strains. We then analyzed these data using multiple pipelines designed for either short or long reads and compared the results. We found that assemblies made from long reads were more complete than those made from short-read data and contained few sequence errors. Variant calling pipelines differed in their ability to accurately call variants and infer genotypes from long reads. Our results suggest that computationally fragmenting long reads can improve the accuracy of variant calling in population-level studies. Using fragmented long reads, pipelines designed for short reads were more accurate at recovering genotypes than pipelines designed for long reads. Further, short- and long-read datasets can be analyzed together with the same pipelines. These findings show that Oxford Nanopore sequencing is accurate and can be sufficient for microbial pathogen genomics and epidemiology. Ultimately, this enhances the ability of researchers and clinicians to understand and mitigate the spread of pathogens.
Importance
Genome assembly and variant calling are important steps in microbial population studies and epidemiology. Most variant calling and genotyping pipelines are designed for Illumina short reads. Oxford Nanopore Technologies long-read sequencing results in more complete genome assemblies but has historically yielded lower-quality reads. Here, we show that Nanopore long reads are now of sufficient quality for bacterial whole genome assembly and epidemiology. We benchmarked the accuracy of multiple variant-calling pipelines with short and long reads. Using an optimized variant calling approach, variant calls and genotypes inferred from long reads are as accurate as those inferred from short reads. Importantly, we found that gold-standard variant calling pipelines designed for short reads are also accurate with long reads when the long reads are first fragmented into shorter sequences. This finding allows researchers to incorporate the advantages of Nanopore sequencing for genome assembly while maintaining high accuracy for epidemiology and population analysis.
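
To make the idea of computationally fragmenting long reads concrete, the following is a minimal Python sketch of one way it could be done. It is not the authors' pipeline: the function names (`fragment_read`, `to_fastq`), the fragment length of 150 bases, and the minimum retained length of 50 bases are all illustrative assumptions, since the text above does not state the parameters used. The sketch splits each long read into consecutive, non-overlapping pieces and carries the matching slice of the quality string along with each piece, so that the output could in principle be passed to a short-read variant caller.

```python
from typing import Iterator, Tuple

# A FASTQ record held in memory as (name, sequence, quality string).
FastqRecord = Tuple[str, str, str]


def fragment_read(record: FastqRecord, frag_len: int = 150,
                  min_len: int = 50) -> Iterator[FastqRecord]:
    """Split one long read into consecutive, non-overlapping fragments.

    frag_len and min_len are assumed values chosen for illustration only.
    """
    name, seq, qual = record
    if len(seq) != len(qual):
        raise ValueError(f"sequence/quality length mismatch in read {name}")
    for index, start in enumerate(range(0, len(seq), frag_len)):
        piece = seq[start:start + frag_len]
        # Drop a trailing fragment that is too short to align reliably.
        if len(piece) < min_len:
            continue
        # Each fragment keeps the per-base qualities of its own bases.
        yield (f"{name}_frag{index}", piece, qual[start:start + frag_len])


def to_fastq(record: FastqRecord) -> str:
    """Format a record as a four-line FASTQ entry."""
    name, seq, qual = record
    return f"@{name}\n{seq}\n+\n{qual}\n"
```

For example, a 400-base read passed to `fragment_read` with these defaults yields three fragments of 150, 150, and 100 bases. Whether fragments should overlap, and which fragment length works best for a given variant caller, are design choices this sketch leaves open.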