Consensus Clustering for Robust Bioinformatics Analysis

Read the full article See related articles

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Clustering plays an important role in a multitude of bioinformatics applications, including protein function prediction, population genetics, and gene expression analysis. The results of most clustering algorithms are sensitive to variations of the input data, the clustering algorithm and its parameters, and individual datasets. Consensus clustering (CC) is an extension to clustering algorithms that aims to construct a robust result from those clustering features that are invariant under the above sources of variation. As part of CC, stability scores can provide an idea of the degree of reliability of the resulting clustering. This review structures the CC approaches in the literature into three principal types, introduces and illustrates the concept of stability scores, and illustrates the use of CC in applications to simulated and real-world gene expression datasets. Open-source R implementations for each of these CC algorithms are available in the GitHub repository: https://github.com/behnam-yousefi/ConsensusClustering

Article activity feed