Phylogenomic analysis of Wolbachia genomes from the Darwin Tree of Life biodiversity genomics project

This article has been Reviewed by the following groups

Read the full article See related articles

Listed in

Log in to save this article

Abstract

The Darwin Tree of Life (DToL) project aims to sequence all described terrestrial and aquatic eukaryotic species found in Britain and Ireland. Reference genome sequences are generated from single individuals for each target species. In addition to the target genome, sequenced samples often contain genetic material from microbiomes, endosymbionts, parasites, and other cobionts. Wolbachia endosymbiotic bacteria are found in a diversity of terrestrial arthropods and nematodes, with supergroups A and B the most common in insects. We identified and assembled 110 complete Wolbachia genomes from 93 host species spanning 92 families by filtering data from 368 insect species generated by the DToL project. From 15 infected species, we assembled more than one Wolbachia genome, including cases where individuals carried simultaneous supergroup A and B infections. Different insect orders had distinct patterns of infection, with Lepidopteran hosts mostly infected with supergroup B, while infections in Diptera and Hymenoptera were dominated by A-type Wolbachia . Other than these large-scale order-level associations, host and Wolbachia phylogenies revealed no (or very limited) cophylogeny. This points to the occurrence of frequent host switching events, including between insect orders, in the evolutionary history of the Wolbachia pandemic. While supergroup A and B genomes had distinct GC% and GC skew, and B genomes had a larger core gene set and tended to be longer, it was the abundance of copies of bacteriophage WO who was a strong determinant of Wolbachia genome size. Mining raw genome data generated for reference genome assemblies is a robust way of identifying and analysing cobiont genomes and giving greater ecological context for their hosts.

Article activity feed

  1. The genomes were uniformly of high completeness (Figure S2). Due to the high intrinsic base quality of HiFi reads (Q30 to Q40; from one error in 1000 to one error in 10,000) we were able to distinguish insertions of Wolbachia DNA into the host genome from true components of the Wolbachia genome, and to independently assemble even closely related strains with confidence

    This whole study is really interesting. This is one part that is not entirely clear to me. How do you use this information to distinguish actual Wolbachia DNA from Wolbachia from insertions of Wolbachia DNA into the host genome?