Cutting through the clutter: minimizing redundancy in GO enrichment analysis with evoGO

Abstract

Gene Ontology (GO) enrichment analysis is a powerful tool for elucidating underlying biological processes in high-throughput transcriptomic and proteomic studies. However, the redundancy of identified enriched GO terms, caused by the hierarchical nature of GO, significantly complicates the prioritization of relevant terms and makes it difficult to draw concise conclusions. To address this problem, we developed evoGO, a novel method that aims to improve the specificity and relevance of enrichment results by considering the GO hierarchy during the analysis. Built on the foundation of conventional overrepresentation analysis (ORA), evoGO reduces the impact of differentially expressed genes on the significance of a given GO term if those genes already contribute to a higher significance of any descendant GO term. The effectiveness of the algorithm was evaluated against other advanced ORA-based GO enrichment analysis methods (topGO, clusterProfiler, and SetRank) using synthetic and real-life datasets. In the synthetic benchmarks, evoGO reduced the number of enriched terms identified by ORA by 30% on average while recovering the highest fraction (96%) of the true positives. When applied to real-life data, evoGO most efficiently prioritized tissue-specific GO terms in analyses aimed at capturing the biological processes inherent to various human tissues. Furthermore, evoGO eliminated the fewest enriched GO terms that had no significantly enriched relatives and thus should not be considered redundant. Finally, evoGO was among the fastest of the tested methods and, unlike its competitors, maintained consistent execution times across both benchmarks. In conclusion, our findings demonstrate that, among the tested ORA-based approaches, evoGO stands out as an effective and fast method for minimizing redundancy in GO enrichment analysis results while reliably preserving biologically relevant information.
We believe that evoGO could be a valuable tool for conducting downstream analyses, particularly in the context of high-throughput transcriptomic and proteomic screening studies. The evoGO method has been implemented as an R package, which is available on GitHub.
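
To make the idea of hierarchy-aware ORA concrete, the following is a minimal Python sketch and not the evoGO implementation, which is an R package whose weighting scheme is not described in this abstract. The function `ora_pvalue` is the conventional ORA test (upper tail of the hypergeometric distribution). The function `toy_hierarchy_pvalues` is a deliberately simplified, hypothetical stand-in for the hierarchy step: it fully removes from a term any genes annotated to a descendant that is more significant, whereas the abstract only states that evoGO reduces the impact of such genes. All names and the toy data are assumptions made for illustration.

```python
from math import comb


def ora_pvalue(n_universe, n_term, n_de, n_overlap):
    """Conventional ORA: P(X >= n_overlap) under the hypergeometric model.

    n_universe: genes in the background, n_term: genes annotated to the term,
    n_de: differentially expressed (DE) genes, n_overlap: DE genes in the term.
    """
    upper = min(n_term, n_de)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_de)


def toy_hierarchy_pvalues(universe, de_genes, term_genes, children):
    """Toy hierarchy-aware ORA (an assumption, NOT evoGO's actual weighting).

    Terms are scored children-first. If a descendant is more significant than
    the term's raw ORA p-value, that descendant's genes are dropped from the
    term before it is rescored.
    """
    pvals = {}
    descendants = {}

    def score(genes):
        return ora_pvalue(
            len(universe), len(genes), len(de_genes), len(genes & de_genes)
        )

    def visit(term):
        if term in pvals:
            return
        desc = set()
        for child in children.get(term, ()):
            visit(child)
            desc |= {child} | descendants[child]
        descendants[term] = desc
        raw = score(term_genes[term])
        # Genes already "explained" by a more significant descendant.
        claimed = set()
        for d in desc:
            if pvals[d] < raw:
                claimed |= term_genes[d]
        pvals[term] = score(term_genes[term] - claimed) if claimed else raw

    for term in term_genes:
        visit(term)
    return pvals


# Toy data: 20 background genes, 5 of them differentially expressed.
universe = {f"g{i}" for i in range(20)}
de_genes = {"g0", "g1", "g2", "g3", "g4"}
term_genes = {
    "child": {"g0", "g1", "g2", "g3"},
    "parent": {"g0", "g1", "g2", "g3"} | {f"g{i}" for i in range(10, 16)},
}
children = {"parent": ["child"]}

pvals = toy_hierarchy_pvalues(universe, de_genes, term_genes, children)
for term, p in sorted(pvals.items(), key=lambda kv: kv[1]):
    print(term, p)
```

In this toy case the parent term looks moderately enriched under plain ORA only because it inherits the child's genes. Once those genes are attributed to the more significant child, the parent has no remaining DE genes and loses its significance, which is the kind of redundancy the abstract describes.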
