The case for using mapped exonic non-duplicate reads when reporting RNA-sequencing depth: examples from pediatric cancer datasets



Abstract

Background

The reproducibility of gene expression measured by RNA sequencing (RNA-Seq) depends on sequencing depth. Unmapped and non-exonic reads do not contribute to gene expression quantification, while duplicate reads contribute to quantification but are not informative for reproducibility. We show that mapped, exonic, non-duplicate (MEND) reads are a useful measure of the reproducibility of RNA-Seq datasets used for gene expression analysis.
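The MEND definition amounts to a chain of filters applied to each read. The following is a minimal sketch, not the authors' script: it assigns each read to the first filter it fails, using the unmapped (0x4), secondary (0x100), duplicate (0x400), and supplementary (0x800) bits of the SAM FLAG field, and it assumes the duplicate bit has already been set by a duplicate marker such as samblaster. The `Read` record, the `(chrom, start, end)` exon list, and the rule that any overlap between a read's aligned span and an exon counts as exonic are hypothetical simplifications; a real pipeline works from a gene annotation and accounts for spliced alignments.

```python
from collections import Counter
from dataclasses import dataclass

# SAM FLAG bits, as defined in the SAM format specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400  # assumed to be set upstream by a duplicate marker
FLAG_SUPPLEMENTARY = 0x800


@dataclass(frozen=True)
class Read:
    """Hypothetical, simplified alignment record used only for illustration."""
    flag: int
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps_exon(read, exons):
    """Return True if the read's aligned span overlaps any (chrom, start, end) exon."""
    return any(
        read.chrom == chrom and read.start < exon_end and exon_start < read.end
        for chrom, exon_start, exon_end in exons
    )


def classify(read, exons):
    """Assign a read to the first filter it fails, in MEND order."""
    if read.flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return "non-primary"  # extra records for a read that is counted elsewhere
    if read.flag & FLAG_UNMAPPED:
        return "unmapped"
    if read.flag & FLAG_DUPLICATE:
        return "duplicate"
    if not overlaps_exon(read, exons):
        return "non-exonic"
    return "MEND"


if __name__ == "__main__":
    exons = [("chr1", 100, 200)]
    reads = [
        Read(0, "chr1", 150, 250),      # mapped, not duplicate, overlaps the exon
        Read(0x400, "chr1", 150, 250),  # same position, marked as a duplicate
        Read(0x4, "*", 0, 0),           # unmapped
        Read(0, "chr1", 300, 400),      # mapped, not duplicate, outside the exon
    ]
    counts = Counter(classify(r, exons) for r in reads)
    print(counts["MEND"], counts["duplicate"], counts["unmapped"], counts["non-exonic"])
    # prints: 1 1 1 1
```

Ordering the checks this way mirrors the nested denominators used in the Findings: duplicates are counted among mapped reads, and non-exonic reads among mapped, non-duplicate reads.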

Findings

In bulk RNA-Seq datasets from 2,179 tumors in 48 cohorts, the fraction of reads that contribute to the reproducibility of gene expression analysis varies greatly. Unmapped reads constitute 1–77% of all reads (median [IQR], 3% [3–6%]); duplicate reads constitute 3–100% of mapped reads (median [IQR], 27% [13–43%]); and non-exonic reads constitute 4–97% of mapped, non-duplicate reads (median [IQR], 25% [16–37%]). MEND reads constitute 0–79% of total reads (median [IQR], 50% [30–61%]).
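Each percentage above is measured against a different denominator (all reads, then mapped reads, then mapped non-duplicate reads), so the stage-wise fractions combine multiplicatively: if u is the unmapped fraction of all reads, d the duplicate fraction of mapped reads, and n the non-exonic fraction of mapped non-duplicate reads, then MEND/total = (1 − u)(1 − d)(1 − n). The sketch below simply encodes that identity; the function names and the example sample are hypothetical.

```python
def mend_fraction(unmapped_of_total, duplicate_of_mapped, nonexonic_of_mapped_nondup):
    """Fraction of total reads that are MEND, given the three stage-wise fractions.

    Each argument is a proportion in [0, 1] measured against the reads that
    survived the previous filter, matching the denominators in the Findings.
    """
    for p in (unmapped_of_total, duplicate_of_mapped, nonexonic_of_mapped_nondup):
        if not 0.0 <= p <= 1.0:
            raise ValueError("fractions must lie in [0, 1]")
    return (
        (1 - unmapped_of_total)
        * (1 - duplicate_of_mapped)
        * (1 - nonexonic_of_mapped_nondup)
    )


def mend_reads(total_reads, unmapped_of_total, duplicate_of_mapped, nonexonic_of_mapped_nondup):
    """MEND read count implied by a total read count and the stage-wise fractions."""
    return total_reads * mend_fraction(
        unmapped_of_total, duplicate_of_mapped, nonexonic_of_mapped_nondup
    )


if __name__ == "__main__":
    # Hypothetical sample: 40 million total reads, 5% unmapped,
    # 30% of mapped reads duplicated, 25% of mapped non-duplicate reads non-exonic.
    print(round(mend_reads(40_000_000, 0.05, 0.30, 0.25)))
    # prints 19950000
```

In this hypothetical sample, fewer than half of the 40 million sequenced reads are MEND reads. Because medians do not compose multiplicatively across samples, plugging the reported medians into this formula need not reproduce the reported median MEND fraction.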

Conclusions

Because not all reads in an RNA-Seq dataset are informative for the reproducibility of gene expression measurements, and because the informative fraction varies between datasets, we propose reporting a dataset's sequencing depth in MEND reads, which definitively inform the reproducibility of gene expression, rather than in total, mapped, or exonic reads. We provide a Docker image containing (i) the existing required tools (RSeQC, sambamba, and samblaster) and (ii) a custom script to calculate MEND reads from RNA-Seq data files. We recommend that all RNA-Seq gene expression experiments, sensitivity studies, and depth recommendations use MEND units for sequencing depth.


    Now published in GigaScience, doi: 10.1093/gigascience/giab011

    Holly C. Beale (1), Jacquelyn M. Roger (2), Matthew A. Cattle (2), Liam T. McKay (2), Drew K. A. Thomson (2), Katrina Learned (3), A. Geoffrey Lyle (1), Ellen T. Kephart (3), Rob Currie (3), Du Linh Lam (3), Lauren Sanders (1), Jacob Pfeil (3), John Vivian (3), Isabel Bjork (3), Sofie R. Salama (4), David Haussler (4), Olena M. Vaske (1)

    (1) UC Santa Cruz Molecular, Cell and Developmental Biology; UC Santa Cruz Genomics Institute
    (2) UC Santa Cruz School of Engineering
    (3) UC Santa Cruz Genomics Institute
    (4) Dept. of Biomolecular Engineering, UC Santa Cruz Genomics Institute, Howard Hughes Medical Institute

    For correspondence: Holly C. Beale, hcbeale@ucsc.edu

    A version of this preprint has been published in the open-access journal GigaScience (https://doi.org/10.1093/gigascience/giab011), where the paper and peer reviews are published openly under a CC-BY 4.0 license.

    The peer reviews are available at:

    Reviewer 1: http://dx.doi.org/10.5524/REVIEW.102677
    Reviewer 2: http://dx.doi.org/10.5524/REVIEW.102678