Assemblies of bacterial genomes from long-read data (generated by Oxford Nanopore or Pacific Biosciences platforms) can often be complete: a single contig for each chromosome or plasmid in the genome. However, even complete bacterial genome assemblies constructed solely from long reads still contain a variety of errors, and different assemblies of the same genome often contain different errors. Here, we present Trycycler, a tool that produces a consensus assembly from multiple input assemblies of the same genome. Benchmarking with both simulated and real sequencing reads showed that Trycycler consensus assemblies contained fewer errors than any assembly constructed with a single long-read assembler. Post-assembly polishing with Medaka and Pilon further reduced errors and yielded the most accurate genome assemblies in our study. Because Trycycler can require human judgement and manual intervention, its output is not deterministic, and different users can produce different Trycycler assemblies from the same input data. However, we demonstrated that multiple users with minimal training converged on similar assemblies that were consistently more accurate than those produced by automated assembly tools. We therefore recommend Trycycler+Medaka+Pilon as an ideal approach for generating high-quality bacterial reference genomes.
Supplementary figures, tables and code can be found at: github.com/rrwick/Trycycler-paper
Reads, assemblies and reference sequences can be found at: bridges.monash.edu/articles/dataset/Trycycler_paper_dataset/14890734