Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes
Abstract
Background
De novo transcriptome assemblies are required prior to analyzing RNA sequencing data from a species without an existing reference genome or transcriptome. Despite the prevalence of transcriptomic studies, the effects of using different workflows, or “pipelines,” on the resulting assemblies are poorly understood. Here, a pipeline was programmatically automated and used to assemble and annotate raw transcriptomic short-read data collected as part of the Marine Microbial Eukaryotic Transcriptome Sequencing Project. The resulting transcriptome assemblies were evaluated and compared against assemblies that were previously generated with a different pipeline developed by the National Center for Genome Research.
Results
New transcriptome assemblies contained the majority of previous contigs as well as new content. On average, 7.8% of the annotated contigs in the new assemblies carried gene names not found in the previous assemblies. Taxonomic trends were observed in the assembly metrics. Assemblies from the Dinoflagellata showed a higher number of contigs and unique k-mers than transcriptomes from other phyla, while assemblies from Ciliophora had a lower percentage of open reading frames compared to other phyla.
Conclusions
Given current bioinformatics approaches, there is no single “best” reference transcriptome for a particular set of raw data. As the optimum transcriptome is a moving target, improving (or not) with new tools and approaches, automated and programmable pipelines are invaluable for managing the computationally intensive tasks required for re-processing large sets of samples with revised pipelines and ensuring a common evaluation workflow is applied to all samples. Thus, re-assembling existing data with new tools using automated and programmable pipelines may yield more accurate identification of taxon-specific trends across samples in addition to novel and useful products for the community.
Article activity feed
Now published in GigaScience doi: 10.1093/gigascience/giy158
Lisa K. Johnson (1, 2), Harriet Alexander (1), C. Titus Brown (1, 2, 3)
1 Department of Population Health & Reproduction, School of Veterinary Medicine, University of California Davis
2 Molecular, Cellular, and Integrative Physiology Graduate Group, University of California Davis
3 Genome Center, University of California Davis
A version of this preprint has been published in the Open Access journal GigaScience (see paper https://doi.org/10.1093/gigascience/giy158 ), where the paper and peer reviews are published openly under a CC-BY 4.0 license.
These peer reviews were as follows:
- Reviewer 1: http://dx.doi.org/10.5524/REVIEW.101581
- Reviewer 2: http://dx.doi.org/10.5524/REVIEW.101582
- Reviewer 3: http://dx.doi.org/10.5524/REVIEW.101583
- Reviewer 4: http://dx.doi.org/10.5524/REVIEW.101584