Integrated population clustering and genomic epidemiology with PopPIPE

Martin P. McHugh
Samuel T. Horsfield
Johanna von Wachsmann
Jacqueline Toussaint
Kerry A. Pettigrew
Elzbieta Czarniak
Thomas J. Evans
Alistair Leanord
Luke Tysall
Stephen H. Gillespie
Kate E. Templeton
Matthew T. G. Holden
Nicholas J. Croucher
John A. Lees

Abstract

Genetic distances between bacterial DNA sequences can be used to cluster populations into closely related subpopulations and as an additional source of information when detecting possible transmission events. Because bacterial genomes vary in gene content and order, reference-free methods offer more sensitive detection of genetic differences, especially among closely related samples found in outbreaks. However, across longer genetic distances, frequent recombination can make calculation and interpretation of these differences more challenging, requiring significant bioinformatic expertise and manual intervention during the analysis process. Here, we present a Population analysis PIPEline (PopPIPE) which combines rapid reference-free genome analysis methods to analyse bacterial genomes across these two scales, splitting whole populations into subclusters and detecting plausible transmission events within closely related clusters. We use k-mer sketching to split populations into strains, followed by split k-mer analysis and recombination removal to create alignments and subclusters within these strains. We first show that this approach creates high-quality subclusters on a population-wide dataset of Streptococcus pneumoniae. When applied to nosocomial vancomycin-resistant Enterococcus faecium samples, PopPIPE finds transmission clusters that are more epidemiologically plausible than core genome or multilocus sequence typing (MLST) approaches. Our pipeline is rapid and reproducible, creates interactive visualizations and can easily be reconfigured and re-run on new datasets. Therefore, PopPIPE provides a user-friendly pipeline for analyses spanning species-wide clustering to outbreak investigations.
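The first step described in the abstract, grouping samples by reference-free k-mer distances, can be illustrated with a toy sketch. This is not PopPIPE's implementation: it compares exact k-mer sets rather than sketches, uses a plain Jaccard distance, and groups samples by single-linkage clustering at an arbitrary threshold. The function names (`kmers`, `jaccard_distance`, `cluster`), the sequences, and the values of `k` and `threshold` are all assumptions chosen for illustration.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """1 minus the fraction of k-mers shared between two k-mer sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster(seqs, k=5, threshold=0.5):
    """Single-linkage clustering: link any two samples closer than threshold."""
    names = list(seqs)
    parent = {n: n for n in names}  # union-find forest, one tree per cluster

    def find(n):
        # Walk up to the root, shortening the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    profiles = {n: kmers(seqs[n], k) for n in names}
    for x, y in combinations(names, 2):
        if jaccard_distance(profiles[x], profiles[y]) < threshold:
            parent[find(x)] = find(y)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy sequences: s1 and s2 differ by one base, s3 is unrelated.
toy = {
    "s1": "ACGTACGTTGCAAGCT",
    "s2": "ACGTACGTTGCAAGCA",
    "s3": "TTTTGGGGCCCCAAAA",
}
clusters = cluster(toy, k=5, threshold=0.5)
print(clusters)
```

Exact k-mer sets do not scale to whole genomes across a large population. Sketching approximates the same kind of distance from a small subsample of k-mers, which is what makes population-wide comparison tractable.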

Version published to 10.1099/mgen.0.001404 on Access Microbiology
Apr 28, 2025
Version published to 10.1101/2024.12.05.626978 on bioRxiv
Dec 9, 2024
