CoD-CAST+: A Hierarchically Accelerated and Distributed Clustering Framework for Large-Scale Data

Abstract

This study presents CoD-CAST+, a distributed clustering algorithm designed for large-scale, high-dimensional data analysis. Built on the CoD-CAST-GPU architecture, CoD-CAST+ preserves the core idea of on-demand affinity computation and adds a hierarchical acceleration strategy that combines multi-threading, GPU parallelism, and inter-node coordination. An embedded visualization module improves interpretability, and the system design is guided by a systematic review of clustering algorithms developed over the past decade. In experiments on a dataset of 2,048,000 gene records, CoD-CAST-GPU achieves a 12.44× speedup over the original CAST algorithm. Building on this, CoD-CAST+ (Local) and the fully distributed CoD-CAST+ improve performance further, with the final system achieving a 110.33× total speedup. CoD-CAST+ is also the first member of the CAST family to support result visualization, which makes it a scalable option for clustering analysis in ultra-large-scale data environments.
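To make the idea of on-demand affinity computation concrete, the following is a minimal single-machine sketch in Python. It is not the authors' implementation. It assumes a simplified, add-only variant of CAST (the original algorithm also removes low-affinity elements from the open cluster) and uses cosine similarity, which the abstract does not specify. The names `cast_on_demand`, `sim_row`, and `threshold` are hypothetical. The point it illustrates is that each similarity row is computed only when an element joins a cluster, so the full n × n similarity matrix is never stored.

```python
import numpy as np


def cast_on_demand(data, threshold):
    """Simplified, add-only CAST sketch with lazily computed similarities.

    Assumptions (not taken from the article): cosine similarity, no
    removal step, and the first unassigned element seeds each cluster.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]

    # Normalise rows once so that a dot product equals cosine similarity.
    norms = np.linalg.norm(data, axis=1, keepdims=True)
    unit = data / np.where(norms == 0, 1.0, norms)

    def sim_row(i):
        # On demand: one row of the similarity matrix, computed when needed.
        return unit @ unit[i]

    free = np.ones(n, dtype=bool)  # True while an element is unassigned
    clusters = []
    while free.any():
        seed = int(np.argmax(free))  # index of the first unassigned element
        free[seed] = False
        cluster = [seed]
        # affinity[x] = sum of similarities between x and current members
        affinity = sim_row(seed)
        while free.any():
            masked = np.where(free, affinity, -np.inf)
            best = int(np.argmax(masked))
            # Admit the candidate only if its mean similarity to the
            # cluster members reaches the threshold.
            if masked[best] < threshold * len(cluster):
                break
            cluster.append(best)
            free[best] = False
            affinity = affinity + sim_row(best)
        clusters.append(sorted(cluster))
    return clusters


points = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(cast_on_demand(points, threshold=0.8))
```

In this sketch the number of similarity rows computed equals the number of elements, one row per element as it joins a cluster, and each row is discarded after it is added to the running affinity vector. A GPU or multi-node version would presumably parallelise `sim_row` and the affinity update, but how CoD-CAST+ distributes that work is not described in the abstract.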
