COATswga: A Coverage Optimizing and Accurate Toolkit for fast primer design in selective whole genome amplification
Abstract
Background
Despite the transformative nature of next-generation sequencing in genomics, efficiently capturing underrepresented microbial DNA from complex biological mixtures, such as pathogens from host samples, remains a persistent challenge. Without enrichment of the targeted genome, unwanted genomes make whole-genome sequencing (WGS) inefficient and expensive. Selective whole genome amplification (sWGA) uses primers that selectively bind to the target genome to preferentially amplify and enrich an entire microbial genome. However, existing sWGA primer design is often complicated and time-consuming, and the resulting primers usually amplify with bias, producing uneven genome coverage. Developing faster primer design methods that ensure uniform, reliable microbial genome recovery is critical to improving sWGA.
Methods
We developed COATswga, a Coverage Optimizing and Accurate Toolkit for designing sWGA primer sets. The pipeline consists of three key steps: (1) primer discovery, using k-mer counting to identify all candidate primers in the target genomes; (2) filtering, where primers are screened for amplification potential, thermodynamic stability, and specificity; and (3) set formation, which uses a novel interval-based tiling algorithm. COATswga is parallelized for efficiency and supports pre-existing primer set refinement and gap filling. The pipeline was evaluated by designing primer sets for Plasmodium falciparum (strain 3D7) and benchmarking their performance against sets generated using similar pipelines.
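The three steps above can be illustrated with a minimal sketch. This is not the COATswga implementation: the function names (`count_kmers`, `select_candidates`, `binding_sites`, `greedy_tiling`) and the parameters `min_ratio` and `reach` are hypothetical. The filter uses only a target-to-background count ratio, with no thermodynamic screening. A simple greedy coverage heuristic stands in for the interval-based tiling algorithm, whose details the abstract does not give. Only the forward strand is considered.

```python
from collections import Counter


def count_kmers(seq, k):
    """Step 1 (discovery): count every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def select_candidates(target, background, k, min_ratio):
    """Step 2 (filtering, simplified): keep k-mers enriched in the target.

    Assumption: specificity is approximated by the ratio of target count to
    (background count + 1); the +1 avoids division by zero.
    """
    t_counts = count_kmers(target, k)
    b_counts = count_kmers(background, k)
    return {kmer for kmer, n in t_counts.items()
            if n / (b_counts[kmer] + 1) >= min_ratio}


def binding_sites(seq, primer):
    """Return every start position (overlaps allowed) where primer occurs."""
    sites, pos = [], seq.find(primer)
    while pos != -1:
        sites.append(pos)
        pos = seq.find(primer, pos + 1)
    return sites


def greedy_tiling(target, candidates, reach, max_primers):
    """Step 3 (set formation, stand-in): greedy interval coverage.

    Each binding site is assumed to cover the interval [site, site + reach).
    Primers are added one at a time, always taking the one that covers the
    most still-uncovered bases, until no primer adds coverage.
    """
    target = target.upper()
    footprint = {}
    for primer in candidates:
        bases = set()
        for site in binding_sites(target, primer):
            bases.update(range(site, min(site + reach, len(target))))
        footprint[primer] = bases

    covered, chosen = set(), []
    while footprint and len(chosen) < max_primers:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(footprint), key=lambda p: len(footprint[p] - covered))
        if not footprint[best] - covered:
            break
        chosen.append(best)
        covered |= footprint.pop(best)
    return chosen, len(covered) / len(target)
```

In this sketch, `greedy_tiling` returns the chosen primers together with the fraction of target bases that fall inside at least one interval, which serves as a rough proxy for coverage uniformity.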
Results
COATswga-designed primers demonstrated superior enrichment based on both qPCR and long-read sequencing. In P. falciparum-spiked human DNA samples at parasitemia levels of 400, 100, and 10 parasites/μL, COATswga primers achieved higher amplification efficiency and specificity than alternatives. At 100 parasites/μL, COATswga yielded ∼99% of reads mapping to the P. falciparum genome, with 82.5% of the genome having 5x coverage. Even at 10 parasites/μL, COATswga primers enabled effective amplification of key drug resistance genes (pfcrt and pfmdr1). Additionally, integration with molecular inversion probe (MIP) genotyping showed that COATswga greatly improved sequencing depth and target recovery in low-density infections compared to established sWGA primers.
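For readers unfamiliar with the two sequencing metrics reported above, the following sketch shows how they are commonly computed. It assumes that "5x coverage" means a per-base depth of at least 5, which the abstract does not state explicitly. The function names and the toy numbers are illustrative only and are not data from the study.

```python
def on_target_fraction(mapped_reads, total_reads):
    """Fraction of all reads that map to the target genome."""
    if total_reads == 0:
        return 0.0
    return mapped_reads / total_reads


def breadth_at_depth(depths, min_depth=5):
    """Fraction of genome positions whose depth is at least min_depth.

    `depths` is a per-base depth list, one integer per genome position.
    """
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)


# Toy example (made-up values): two of four positions reach depth 5.
toy_depths = [0, 5, 10, 4]
toy_breadth = breadth_at_depth(toy_depths, min_depth=5)
```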
Conclusion
COATswga provides a robust and flexible solution for designing sWGA primers to reliably recover targeted genomes from complex samples. The new P. falciparum primer set performs well in low-parasitemia samples, allowing more extensive application of sWGA in falciparum malaria genomics.