Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism–calling pipelines

Abstract

Background

Accurately identifying single-nucleotide polymorphisms (SNPs) from bacterial sequencing data is essential for using genomics to track transmission and to predict important phenotypes such as antimicrobial resistance. However, most previous performance evaluations of SNP calling have been restricted to eukaryotic (human) data. Additionally, bacterial SNP calling requires choosing an appropriate reference genome to align reads to, and this choice, together with the bioinformatic pipeline, affects the accuracy and completeness of the resulting SNP calls. This study evaluates the performance of 209 SNP-calling pipelines using a combination of simulated data from 254 strains of 10 clinically common bacteria and real data from environmentally sourced and genomically diverse isolates within the genera Citrobacter, Enterobacter, Escherichia, and Klebsiella.

Results

We evaluated the performance of 209 SNP-calling pipelines, aligning reads to genomes of the same or a divergent strain. Irrespective of pipeline, a principal determinant of reliable SNP calling was reference genome selection. Across multiple taxa, both the sensitivity and the precision of a pipeline were strongly and inversely related to the Mash distance (a proxy for average nucleotide divergence) between the reads and the reference genome. The effect was especially pronounced for diverse, recombinogenic bacteria such as Escherichia coli but less pronounced for clonal species such as Mycobacterium tuberculosis.
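The two quantities related in these results can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes SNP calls are represented as sets of hypothetical (contig, position, ref, alt) tuples, uses the standard definitions of sensitivity and precision, and converts a Jaccard index of shared k-mers into a Mash distance using the formula published for Mash, with k = 21 taken as an assumed default.

```python
import math


def mash_distance(jaccard: float, k: int = 21) -> float:
    """Mash distance from a Jaccard index j of shared k-mers.

    Uses D = -(1/k) * ln(2j / (1 + j)); k = 21 is an assumed default.
    """
    if not 0.0 <= jaccard <= 1.0:
        raise ValueError("jaccard must lie in [0, 1]")
    if jaccard == 0.0:
        return 1.0  # no shared k-mers: treated as maximal distance
    return max(0.0, -math.log(2 * jaccard / (1 + jaccard)) / k)


def score_snp_calls(truth: set, called: set) -> dict:
    """Sensitivity and precision of a call set against a truth set.

    Each SNP is a hashable record, here a hypothetical
    (contig, position, ref, alt) tuple.
    """
    tp = len(truth & called)   # SNPs correctly called
    fp = len(called - truth)   # called but not real
    fn = len(truth - called)   # real but missed
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Toy data, invented purely for illustration.
truth = {("chr", 10, "A", "G"), ("chr", 20, "C", "T"), ("chr", 30, "G", "A")}
called = {("chr", 10, "A", "G"), ("chr", 20, "C", "T"), ("chr", 40, "T", "C")}
scores = score_snp_calls(truth, called)
```

In this toy case two of the three true SNPs are recovered and one of the three calls is spurious, so sensitivity and precision are both 2/3. Scoring each pipeline this way at several read-to-reference Mash distances is one way to expose the inverse relationship described above.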

Conclusions

The accuracy of SNP calling for a given species is compromised by increasing intra-species diversity. When reads were aligned to the same genome from which they were sequenced, among the highest-performing pipelines was Novoalign/GATK. By contrast, when reads were aligned to particularly divergent genomes, the highest-performing pipelines often used the aligners NextGenMap or SMALT, and/or the variant callers LoFreq, mpileup, or Strelka.

Article activity feed

  1. Now published in GigaScience doi: 10.1093/gigascience/giaa007

    Authors: Stephen J. Bush (1, 2; for correspondence: stephen.bush@roslin.ed.ac.uk), Dona Foster (1, 3), David W. Eyre (1), Emily L. Clark (4), Nicola De Maio (1), Liam P. Shaw (1), Nicole Stoesser (1), Tim E. A. Peto (1, 2, 3), Derrick W. Crook (1, 2, 3), A. Sarah Walker (1, 2, 3)

    Affiliations:
    1 Nuffield Department of Medicine, University of Oxford, Oxford, UK
    2 National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, UK
    3 National Institute for Health Research Oxford Biomedical Research Centre, Oxford, UK
    4 The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK

    A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giaa007), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

    These peer reviews were as follows:

    Reviewer 1: http://dx.doi.org/10.5524/REVIEW.102084
    Reviewer 2: http://dx.doi.org/10.5524/REVIEW.102085