Comparison of de novo and reference genome-based transcriptome assembly pipelines for differential expression analysis of RNA sequencing data

Rebekah A. Oomen
Halvor Knutsen
Esben M. Olsen
Sissel Jentoft
Nils Chr. Stenseth
Jeffrey A. Hutchings

This article has been Reviewed by the following groups


Listed in

Evaluated articles (Arcadia Science)

Abstract

Objective

As sequencing technologies become more accessible and bioinformatic tools improve, genomic resources are increasingly available for non-model species. Using a draft genome to guide transcriptome assembly from RNA sequencing data, rather than performing assembly de novo, affects downstream analyses. Yet, direct comparisons of these approaches are rare. Here, we compare the results of the standard de novo assembly pipeline (‘Trinity’) and two reference genome-based pipelines (‘Tuxedo’ and the ‘new Tuxedo’) for differential expression and gene ontology enrichment analysis of a companion study on Atlantic cod (Gadus morhua).

Results

The new Tuxedo pipeline produced a higher quality assembly than the Tuxedo suite. However, greater enrichment of Trinity-identified differentially expressed genes suggests that a higher proportion of them represent biologically meaningful differences in transcription, as opposed to transcriptional noise or false positives. Coupled with the ability to annotate novel loci, the increased sensitivity of the Trinity pipeline might make it preferable over the reference genome-based approaches for studies aimed at broadly characterizing variation in the magnitude of expression differences and biological processes. However, the ‘new Tuxedo’ pipeline might be appropriate when a more conservative approach is warranted, such as for the identification of candidate genes.

Arcadia Science
Aug 29, 2023

Coupled with the ability to annotate novel loci, the increased sensitivity of the de novo pipeline for detecting likely biologically meaningful differential expression might make it preferable over the reference genome-based approaches for studies aimed at broadly characterizing variation in the magnitudes of expression differences and biological processes. However, the reference-based ‘new Tuxedo’ pipeline might be more appropriate when a more conservative approach is warranted, such as for the detection and identification of candidate genes.

Did you explore the option of merging the StringTie and Trinity transcriptomes? Tools like OrthoFuser or EvidentialGene could be used for this. I would be curious how well this might work. Can you think of any reason this would be a bad idea?

Arcadia Science
Aug 29, 2023

Assembly

It would be super helpful in this section if you could compare the transcriptome assembly sequences themselves -- similarity and containment. I think this would help build intuition for the difference in transcriptome content between the different methods. A tool like sourmash can be used to estimate these metrics (sourmash sketch, then sourmash compare, then sourmash plot to visualize). I'm curious if the Trinity assembly has lower mapping because it has fragmented transcripts which are resolved with the use of the genome.
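
To make the two metrics concrete, here is a minimal Python sketch of k-mer similarity (Jaccard) and containment between two assemblies. It is an illustration under stated assumptions, not how sourmash computes them: sourmash works on subsampled sketches, whereas this compares exact k-mer sets, ignores reverse complements, and uses made-up toy sequences. The function names and the sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of one sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assembly_kmers(transcripts, k):
    """Pool the k-mers of every transcript in an assembly into one set."""
    pooled = set()
    for transcript in transcripts:
        pooled |= kmers(transcript, k)
    return pooled


def jaccard(a, b):
    """Shared k-mers divided by all k-mers seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Fraction of the k-mers in `a` that are also present in `b`."""
    return len(a & b) / len(a) if a else 0.0


# Toy data (assumption): one full-length transcript versus the same
# transcript split into two fragments, mimicking a fragmented assembly.
full_length = "ATGGCGTACCTGAAGTTCGA"
guided = assembly_kmers([full_length], k=5)
fragmented = assembly_kmers([full_length[:10], full_length[10:]], k=5)

print("Jaccard similarity:          ", jaccard(guided, fragmented))
print("fragmented contained in full:", containment(fragmented, guided))
print("full contained in fragmented:", containment(guided, fragmented))
```

In this toy case every k-mer of the fragmented assembly is found in the full-length one, but the k-mers spanning the break point are missing in the other direction. An asymmetry of that kind between two real assemblies would be consistent with fragmentation, though it could also reflect genuinely different transcript content.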

Arcadia Science
Aug 29, 2023

.

small typo -- period should be a comma :)

Version published to 10.1101/2022.08.20.504634 on bioRxiv
Aug 22, 2022

