Extensive Benchmarking of Community Detection Algorithms

R Sapna
Harikeshav Karthik
Karthik Raman

Abstract

The detection of clusters, or communities, in networks is an important problem in network science. We systematically evaluate many widely used community detection algorithms and their variants for identifying clusters in complex networks. As the ground truth for assessing accuracy, we use artificial networks modeled on power-law distributions and real-world social networks. In addition, we perform gene enrichment analysis on human and yeast protein–protein interaction networks to evaluate the algorithms' ability to uncover enriched communities. We implement and adapt an extensive suite of classical algorithms and their modern variants, classified into five types: stochastic, kernel-based, modularity-based, hierarchical, and local search-based. The algorithms are benchmarked primarily using the Normalized Mutual Information metric, with additional analyses of granularity (measured by cluster ratio) and of computation time. We find that decreasing the modularity of networks leads to a consistent decline in performance that follows a sigmoidal trajectory as communities become less well defined. Algorithms with greater granularity remain stable when community structures are less distinct, and computation time is independent of network modularity. Additionally, algorithms tend to perform poorly on smaller networks, and for specific high-performing methods, higher accuracy often comes at the cost of greater time complexity. This trade-off becomes more pronounced as the analysis extends to larger networks, highlighting the need for efficient scalability. Based on our benchmark and gene enrichment analysis results, we also present recommendations to practitioners. Our Python package, which includes a command-line interface, lets users apply these algorithms to their own datasets.
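To make the primary benchmark metric concrete, the following is a minimal sketch of Normalized Mutual Information between a ground-truth partition and a detected partition, each given as one community label per node. It is not taken from the authors' package. The function names are hypothetical, and the normalization by the arithmetic mean of the two entropies is an assumption, since the abstract does not state which NMI variant is used.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(truth, predicted):
    """NMI = 2 * I(truth; predicted) / (H(truth) + H(predicted)).

    Assumption: arithmetic-mean normalization; other variants divide
    by the minimum, maximum, or geometric mean of the entropies.
    """
    if len(truth) != len(predicted):
        raise ValueError("both partitions must label the same nodes")
    n = len(truth)
    joint = Counter(zip(truth, predicted))  # co-occurrence counts
    count_t = Counter(truth)
    count_p = Counter(predicted)

    # Mutual information from the joint and marginal label counts.
    mi = 0.0
    for (t, p), c in joint.items():
        mi += (c / n) * log(c * n / (count_t[t] * count_p[p]))

    h_t, h_p = entropy(truth), entropy(predicted)
    if h_t + h_p == 0:
        # Both partitions place every node in a single community.
        return 1.0
    return 2 * mi / (h_t + h_p)


# Hypothetical toy partitions of four nodes.
ground_truth = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]    # same grouping, different label names
unrelated = [0, 1, 0, 1]     # grouping independent of the ground truth

score_same = normalized_mutual_information(ground_truth, relabelled)
score_unrelated = normalized_mutual_information(ground_truth, unrelated)
```

In this sketch the score depends only on how nodes are grouped, not on the label names, so a relabelled copy of the ground truth scores at the top of the range while a grouping that is statistically independent of it scores at the bottom.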

Author summary

In the era of big data, clustering has become an essential tool for processing and analyzing vast amounts of information. By dividing large data sets into smaller, meaningful clusters, we can simplify complex data structures, parallelize tasks, and reveal hidden patterns. Clustering is now a preprocessing step that is widely used in various computational domains. Although traditional clustering algorithms remain popular, our work implements a range of hybrid techniques that combine classical methods with some novel approaches. We also benchmark the efficiency and efficacy of these approaches in handling different types of networks.

Version published to 10.1101/2025.05.07.652778v1 on bioRxiv
May 12, 2025
