Rethinking large scale phylogenomics with PhyloToL 6, a flexible toolkit to enable phylogeny-informed data curation and analysis

Abstract

The bulk of eukaryotic diversity is microbial, with macroscopic lineages such as plants, animals and fungi nesting among diverse lineages that include amoebae, flagellates, ciliates, and many types of algae. Our understanding of the evolutionary relationships and genome properties of microbial eukaryotes is rapidly advancing through analyses of transcriptomic and genomic data. However, phylogenomic analyses are challenging for microeukaryotes, and particularly for uncultivable lineages, as single-cell approaches generate a mixture of sequence data from hosts, associated microbiomes, and contaminants. Current practices include resampling of hand-curated gene sets that can be difficult for other researchers to replicate. To address these challenges, we present PhyloToL version 6.0, a modular, accessible pipeline that enables effective data curation, including a novel method of phylogeny-informed contamination removal, estimation of homologous gene families, and generation of multiple sequence alignments (MSAs) and gene trees. We provide several databases that will be of use for those interested in eukaryotic evolution, including: a Hook Database of curated reference sequences for 15,000 gene families; a database of transcriptomes and genomes from 1,000 taxa with gene families (GFs) assigned; and a highly curated set of MSAs and gene trees for 500 GFs in these taxa. We also demonstrate the power of a suite of stand-alone utilities that provide basic statistics on sequences, analyze compositional and codon patterns, and enable exploration of trees. We exemplify the power of PhyloToL 6.0 in estimating eukaryotic phylogeny using the 500 conserved GFs, and set standards for curation of omics data for future research in the field.
