Comparison of de novo and reference genome-based transcriptome assembly pipelines for differential expression analysis of RNA sequencing data

Abstract

Objective

As sequencing technologies become more accessible and bioinformatic tools improve, genomic resources are increasingly available for non-model species. Using a draft genome to guide transcriptome assembly from RNA sequencing data, rather than performing assembly de novo, affects downstream analyses. Yet, direct comparisons of these approaches are rare. Here, we compare the results of the standard de novo assembly pipeline (‘Trinity’) and two reference genome-based pipelines (‘Tuxedo’ and the ‘new Tuxedo’) for differential expression and gene ontology enrichment analysis of a companion study on Atlantic cod (Gadus morhua).

Results

The new Tuxedo pipeline produced a higher quality assembly than the Tuxedo suite. However, greater enrichment of Trinity-identified differentially expressed genes suggests that a higher proportion of them represent biologically meaningful differences in transcription, as opposed to transcriptional noise or false positives. Coupled with the ability to annotate novel loci, the increased sensitivity of the Trinity pipeline might make it preferable over the reference genome-based approaches for studies aimed at broadly characterizing variation in the magnitude of expression differences and biological processes. However, the ‘new Tuxedo’ pipeline might be appropriate when a more conservative approach is warranted, such as for the identification of candidate genes.

Article activity feed

  1. Coupled with the ability to annotate novel loci, the increased sensitivity of the de novo pipeline for detecting likely biologically meaningful differential expression might make it preferable over the reference genome-based approaches for studies aimed at broadly characterizing variation in the magnitudes of expression differences and biological processes. However, the reference-based ‘new Tuxedo’ pipeline might be more appropriate when a more conservative approach is warranted, such as for the detection and identification of candidate genes.

    Did you explore the option of merging the StringTie and Trinity transcriptomes? Tools like OrthoFuser or EvidentialGene could be used for this. I would be curious how well this might work. Can you think of any reason this would be a bad idea?

  2. Assembly

    It would be super helpful in this section if you could compare the transcriptome assembly sequences themselves, in terms of similarity and containment. I think this would help build intuition for the difference in transcriptome content between the different methods. A tool like sourmash can be used to estimate these metrics (sourmash sketch, then sourmash compare, followed by sourmash plot to visualize). I'm curious whether the Trinity assembly has lower mapping because it has fragmented transcripts that are resolved with the use of the genome.
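
    To make the two metrics concrete, here is a minimal Python sketch of Jaccard similarity and containment computed on exact k-mer sets. It is only an illustration of the definitions, not how sourmash works internally (sourmash estimates these values from subsampled sketches rather than full k-mer sets). The function names, the toy sequences, and the choice of k=5 are all hypothetical; a real comparison would use full assemblies and a larger k.

```python
def canonical_kmers(sequences, k):
    """Collect canonical k-mers (the lexicographic minimum of a k-mer and
    its reverse complement) from an iterable of nucleotide strings."""
    complement = str.maketrans("ACGT", "TGCA")
    kmers = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip k-mers containing N or other ambiguity codes
            revcomp = kmer.translate(complement)[::-1]
            kmers.add(min(kmer, revcomp))
    return kmers


def jaccard(a, b):
    """Shared k-mers divided by all k-mers seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of the k-mers in `a` that are also present in `b`."""
    return len(a & b) / len(a) if a else 0.0


# Toy data (hypothetical): one full-length transcript, and the same
# transcript split into two non-overlapping fragments.
full_length = ["ACGTTGCAAGGCTTACGGAT"]
fragmented = ["ACGTTGCAAG", "GCTTACGGAT"]

full_kmers = canonical_kmers(full_length, k=5)
frag_kmers = canonical_kmers(fragmented, k=5)

print("Jaccard similarity:", jaccard(full_kmers, frag_kmers))
print("Fragments contained in full:", containment(frag_kmers, full_kmers))
print("Full contained in fragments:", containment(full_kmers, frag_kmers))
```

    In this toy case every fragment k-mer occurs in the full-length transcript, while the k-mers spanning the break point are missing from the fragmented set. That asymmetry in containment is the kind of signal that could help distinguish a fragmented assembly from one with different content.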