Correcting Scale Distortion in RNA Sequencing Data

Abstract

RNA sequencing (RNA-seq) is the conventional genome-scale approach for capturing the expression levels of all detectable genes in a biological sample. It is now regularly used in clinical diagnostics for cancer patients. While the information gained is intended to inform treatment decisions, numerous technical and quality issues remain, including inaccuracies in how gene-gene relationships are characterized. For such reasons, clinical decisions are still mostly driven by DNA biomarkers, such as gene mutations or fusions. In this study, we aimed to correct for systematic bias arising from RNA-sequencing platforms in order to improve our understanding of gene-gene relationships. To do so, we examined standard pre-processed RNA-seq datasets obtained from three studies conducted by two consortium efforts: The Cancer Genome Atlas (TCGA) and Stand Up 2 Cancer (SU2C). Specifically, we examined the TCGA Bladder Cancer (n = 408) and Prostate Cancer (n = 498) studies as well as the SU2C Prostate Cancer study (n = 208). Using various statistical tests, we detected in all datasets expression-level-dependent biases that differ from sample to sample. Using simulations, we show that these biases corrupt gene-gene correlation estimates and t-tests between subpopulations. To mitigate these biases, we introduce two nonlinear transforms, based on statistical considerations, that correct them. We demonstrate that these transforms effectively remove the observed per-sample biases, reduce sample-to-sample variance, and improve the characteristics of gene-gene correlation distributions. Using a novel simulation methodology that creates controlled differences between subpopulations, we show that these transforms reduce variability and slightly increase the sensitivity of two-population tests.
Altogether, these results improve our capacity to understand gene-gene relationships, and may lead to novel ways to utilize the information derived from clinical tests.
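
To illustrate why a per-sample, expression-level-dependent bias can corrupt gene-gene correlation estimates, the following is a minimal sketch under stated assumptions and is not the method used in the study. It assumes a hypothetical distortion model in which each sample applies its own shift and scale to log-scale expression, and it uses plain per-sample standardization as a stand-in correction rather than the two nonlinear transforms introduced above. All function names and parameter values are illustrative.

```python
import numpy as np


def simulate(n_genes=200, n_samples=300, scale_sd=0.15, seed=0):
    """Simulate independent genes, then apply an assumed per-sample distortion."""
    rng = np.random.default_rng(seed)
    # Assumed log-scale baseline expression level for each gene.
    gene_means = rng.uniform(2.0, 12.0, size=n_genes)
    # Independent noise per gene and sample: true gene-gene correlation is zero.
    true = gene_means[:, None] + rng.normal(0.0, 0.5, size=(n_genes, n_samples))
    # Hypothetical distortion: each sample has its own shift and scale, so the
    # bias on a gene grows with its expression level and differs by sample.
    scale = 1.0 + rng.normal(0.0, scale_sd, size=n_samples)
    shift = rng.normal(0.0, 0.5, size=n_samples)
    observed = shift[None, :] + scale[None, :] * true
    return true, observed


def mean_offdiag_corr(x):
    """Mean Pearson correlation over all distinct gene pairs (rows of x)."""
    c = np.corrcoef(x)
    upper = np.triu_indices_from(c, k=1)
    return c[upper].mean()


def standardize_per_sample(x):
    """Stand-in correction: center and scale each sample (column) across genes."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


true, observed = simulate()
corrected = standardize_per_sample(observed)

print("mean gene-gene correlation, true data:     ", mean_offdiag_corr(true))
print("mean gene-gene correlation, distorted data:", mean_offdiag_corr(observed))
print("mean gene-gene correlation, standardized:  ", mean_offdiag_corr(corrected))
```

In this toy setting the genes are independent by construction, so any sizeable average correlation in the distorted data is an artifact of the per-sample scale. A linear per-sample correction suffices here only because the assumed distortion is itself linear; real biases of the kind described in the abstract may require nonlinear transforms.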
