DAMPA - accelerated and simplified design of probe panels for targeted metagenomics using pangenome graphs

Abstract

Targeted metagenomics, where samples are enriched for multiple organisms of interest using oligonucleotide probes, is a highly efficient sequencing methodology that is becoming standard practice for genomics of viruses and complex polymicrobial samples. Efficient enrichment critically requires probes that capture both conserved and highly diverse genomic regions without loss of sensitivity, and with uniform representation in the sequencing pool. Design of optimal probesets poses a challenge: existing computational methods use k-mer hashing to reduce over-abundant sequences, but scalability and efficiency drop with increasing numbers of genomes, while diverse sequences remain under-represented. Here we show that incorporating evolutionary distance to compress probes via a graph-based representation of multiple genomes across species, together with k-mer hashing, reduces over-representation of conserved sequences and yields more uniform coverage even of highly diverse loci. We make the method available in DAMPA, an open-source tool that generates probesets in seconds on a standard laptop.
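To make the k-mer hashing idea mentioned in the abstract concrete, the following is a minimal sketch of how redundant probes from conserved regions might be filtered. It is not DAMPA's implementation and does not cover the pangenome-graph compression step; the function names (`kmers`, `tile`, `select_probes`), the parameters (`probe_len`, `step`, `k`, `max_shared`) and the greedy selection rule are all assumptions chosen for illustration.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def tile(genome: str, probe_len: int, step: int) -> List[str]:
    """Cut a genome into overlapping candidate probes of fixed length."""
    return [genome[i:i + probe_len]
            for i in range(0, len(genome) - probe_len + 1, step)]


def select_probes(genomes: Iterable[str], probe_len: int = 12, step: int = 6,
                  k: int = 5, max_shared: float = 0.5) -> List[str]:
    """Greedy filter (illustrative assumption, not DAMPA's algorithm).

    A candidate probe is kept only if at most `max_shared` of its k-mers
    are already covered by previously kept probes.
    """
    seen: Set[str] = set()   # k-mers covered by probes kept so far
    kept: List[str] = []
    for genome in genomes:
        for probe in tile(genome, probe_len, step):
            probe_kmers = kmers(probe, k)
            shared = len(probe_kmers & seen) / len(probe_kmers)
            if shared <= max_shared:
                kept.append(probe)
                seen |= probe_kmers
    return kept


if __name__ == "__main__":
    conserved = "ACGTACGGTCAGTTCAGGCATTAG"
    divergent = "TTTTTTTTTTTTGGGGGGGGGGGG"
    # A second copy of the same genome contributes no new probes,
    # while a divergent genome does.
    print(len(select_probes([conserved])))
    print(len(select_probes([conserved, conserved])))
    print(len(select_probes([conserved, conserved, divergent])))
```

In this sketch, identical or near-identical genomes add no probes after the first, which is the sense in which hashing reduces over-abundant sequences. Every candidate probe from every genome is still scanned, so the work grows with the number of input genomes, in line with the scalability concern raised in the abstract.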

Software availability

DAMPA is available as an open-source package that can be installed with conda and is free for academic use: https://github.com/MultipathogenGenomics/dampa

Data availability

Sequences generated as part of the laboratory validation are available from the ENA project PRJNA1466720.

Ethics

Clinical samples and metadata were collected by the PRL at the NSW Health Pathology-Institute of Clinical Pathology and Medical Research under the Western Sydney Local Health District Human Research Ethics and Governance Committee (Project identifier: 2020/ETH02426). All data were de-identified.

Funding

R.J.R. is supported by an NHMRC Investigator grant (GNT2018222). T.G. is supported by an NHMRC Investigator grant (GNT2025445). M.P. is supported by Sydney Infectious Diseases Institute seed funding.
