DAMPA - accelerated and simplified design of probe panels for targeted metagenomics using pangenome graphs
Abstract
Targeted metagenomics, where samples are enriched for multiple organisms of interest using oligonucleotide probes, is a highly efficient sequencing methodology that is becoming standard practice for genomics of viruses and complex polymicrobial samples. Efficient enrichment critically requires probes that capture both conserved and highly diverse genomic regions without loss of sensitivity, and with uniform representation in the sequencing pool. Design of optimal probesets poses a challenge: existing computational methods use k-mer hashing to reduce over-abundant sequences, but scalability and efficiency drop with increasing numbers of genomes, while diverse sequences remain under-represented. Here we show that incorporating evolutionary distance to compress probes via a graph-based representation of multiple genomes across species, together with k-mer hashing, reduces over-representation of conserved sequences and yields more uniform coverage even of highly diverse loci. We make the method available in DAMPA, an open-source tool that generates probesets in seconds on a standard laptop.
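To make the k-mer hashing idea mentioned above concrete, the following is a minimal sketch of a generic greedy redundancy filter over tiled probes. It is not DAMPA's algorithm and does not use a pangenome graph. The function names (`tile_probes`, `kmers`, `filter_redundant`), the parameters (`probe_len`, `step`, `k`, `max_shared`), and the toy sequences are all assumptions introduced here for illustration.

```python
def tile_probes(genome: str, probe_len: int, step: int) -> list[str]:
    """Cut a genome into candidate probes of fixed length, one every `step` bases."""
    return [genome[i:i + probe_len]
            for i in range(0, len(genome) - probe_len + 1, step)]


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of k-mers in a sequence (a Python set hashes them internally)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_redundant(probes: list[str], k: int, max_shared: float) -> list[str]:
    """Greedily keep a probe only if at most `max_shared` of its k-mers were already seen."""
    seen: set[str] = set()
    kept: list[str] = []
    for probe in probes:
        probe_kmers = kmers(probe, k)
        if not probe_kmers:
            continue  # probe shorter than k: nothing to compare
        shared = len(probe_kmers & seen) / len(probe_kmers)
        if shared <= max_shared:
            kept.append(probe)
            seen |= probe_kmers
    return kept


# Toy input (hypothetical): two identical genomes plus one with a divergent middle region.
genome_a = "ACGTTGCAAGGCTTAACCGGTATC"
genome_b = genome_a
genome_c = "ACGTTGCA" + "TTTTGGGG" + "CCGGTATC"

candidates = []
for g in (genome_a, genome_b, genome_c):
    candidates.extend(tile_probes(g, probe_len=8, step=8))

panel = filter_redundant(candidates, k=4, max_shared=0.5)
```

In this toy case the conserved flanks are represented once and the divergent middle region of `genome_c` still receives its own probe. Because every candidate is compared against an ever-growing k-mer set, the work grows with the number of input genomes, which is the scaling limitation the abstract attributes to k-mer-only approaches.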
Software availability
DAMPA is available as an open-source package that can be installed with conda and is free for academic use: https://github.com/MultipathogenGenomics/dampa
Data availability
Sequences generated as part of the laboratory validation are available from the ENA project PRJNA1466720.
Ethics
Clinical samples and metadata were collected by the PRL at the NSW Health Pathology-Institute of Clinical Pathology and Medical Research under the Western Sydney Local Health District Human Research Ethics and Governance Committee (Project identifier: 2020/ETH02426). All data were de-identified.
Funding
R.J.R. is supported by an NHMRC Investigator grant (GNT2018222). T.G. is supported by an NHMRC Investigator grant (GNT2025445). M.P. is supported by Sydney Infectious Diseases Institute seed funding.