Not every gene is special: one simple rule to control the false discovery rate when analysing high-throughput sequencing data


Abstract

Differential expression and differential abundance analyses are commonplace in studies employing high-throughput sequencing approaches; however, different tools often fail to return comparable results when applied to the same dataset. Most tools apply some form of normalisation to correct for technical variation in the count data. Previously, we demonstrated that these normalisations are often inappropriate because they make incorrect assumptions about the overall scale (i.e. size) of the biological system in question. In this study, we conducted 100 permutation analyses of 10 RNA-seq datasets to show that scale misspecification results in poor control of the false discovery rate by several commonly used analysis tools. Moreover, we demonstrate that this can be ameliorated by using a scale model in ALDEx2 or ALDEx3 to account for uncertainty in the scale of the system. We show that there is an inherent trade-off between satisfactory control of the false discovery rate and sensitivity, and that no tool offers both. We also quantify how increasing scale uncertainty affects the difference between groups required for a feature to be reported as differentially expressed. Finally, we provide universal guidance on choosing an appropriate amount of scale uncertainty for any type of analysis. Overall, our work highlights the strengths and pitfalls of commonly used tools for differential expression analysis and makes explicit the choice between sensitivity and false discovery rate control that all researchers make when analysing sequencing data.
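The permutation idea described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not implement ALDEx2, ALDEx3, or any scale model. It assumes a matrix of already-normalised values (rows are features, columns are samples), two groups coded 0 and 1, and a deliberately simple stand-in test (a Welch statistic with a normal approximation, followed by Benjamini-Hochberg adjustment). All function names are hypothetical. Because shuffling the group labels breaks any real association, every feature called significant in a permuted dataset is a false positive.

```python
import random
from statistics import NormalDist, mean, variance


def welch_p_value(a, b):
    """Two-sided p-value for a Welch statistic (normal approximation, an assumption)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    if se == 0:
        return 1.0  # no variation at all: treat as no evidence of a difference
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_calls(matrix, labels, alpha=0.05):
    """Count features whose adjusted p-value falls below alpha."""
    p_values = []
    for row in matrix:
        group0 = [x for x, g in zip(row, labels) if g == 0]
        group1 = [x for x, g in zip(row, labels) if g == 1]
        p_values.append(welch_p_value(group0, group1))
    return sum(q < alpha for q in benjamini_hochberg(p_values))


def permutation_false_calls(matrix, labels, n_perm=100, alpha=0.05, seed=0):
    """Shuffle group labels n_perm times; return the number of calls per shuffle."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # group sizes are preserved, assignments are not
        counts.append(count_calls(matrix, shuffled, alpha))
    return counts
```

A tool with good false discovery control should return few or no calls across the permuted datasets. Swapping `count_calls` for a wrapper around a real differential expression tool would be the natural extension, and the normal approximation used here is itself anti-conservative for small group sizes.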