CoD-CAST+: A Hierarchically Accelerated and Distributed Clustering Framework for Large-Scale Data
Abstract
This study presents CoD-CAST+, a distributed clustering algorithm for large-scale, high-dimensional data analysis. Built on the CoD-CAST-GPU architecture, CoD-CAST+ keeps its core idea of on-demand affinity computation and adds a hierarchical acceleration strategy that combines multi-threading, GPU parallelism, and inter-node coordination. An embedded visualization module improves interpretability, and the system design is informed by a systematic review of clustering algorithms developed over the past decade. Experimental results on a dataset of 2,048,000 gene records show that CoD-CAST-GPU achieves a 12.44× speedup over the original CAST algorithm. CoD-CAST+ (Local) and the fully distributed CoD-CAST+ improve performance further, and the final system reaches a total speedup of 110.33× over the original CAST algorithm. In addition, CoD-CAST+ is the first member of the CAST family to support result visualization, which makes it an innovative and scalable solution for clustering analysis on ultra-large-scale data.
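The following minimal, single-process Python sketch illustrates "on-demand affinity computation." It implements the classic CAST loop (Ben-Dor et al.) but computes each similarity row only when an element enters or leaves the open cluster, so the full n × n similarity matrix is never stored. This is an illustration under stated assumptions, not the authors' CoD-CAST+ implementation. The following are hypothetical choices made here: Pearson correlation as the similarity measure, the names `cast_on_demand`, `sim_row`, and `max_rounds`, the cap on add/remove rounds, and the singleton fallback. The multi-threaded, GPU, and inter-node layers of CoD-CAST+ are not modeled. `sim_row` is simply the step such layers would plausibly parallelize.

```python
import numpy as np


def zscore_rows(x):
    """Standardize each row so Pearson correlation becomes a scaled dot product."""
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant rows become all-zero instead of dividing by zero
    return (x - mu) / sd


def cast_on_demand(x, t, max_rounds=100):
    """Sketch of CAST with similarity rows computed on demand (assumes 0 < t <= 1).

    Returns an integer cluster label per row of x.
    """
    z = zscore_rows(x)
    n, d = z.shape

    def sim_row(u):
        # On demand: one row of the Pearson similarity matrix, O(n) memory.
        return z @ z[u] / d

    label = np.full(n, -1)  # -1 means "not yet in a closed cluster"
    cluster_id = 0
    while (label == -1).any():
        active = label == -1             # unassigned elements plus the open cluster
        in_c = np.zeros(n, dtype=bool)   # membership of the open cluster C
        a = np.zeros(n)                  # affinity a(x) = sum of S(x, y) over y in C
        for _ in range(max_rounds):      # round cap (assumption) guards against oscillation
            changed = False
            # ADD: take the highest-affinity outsider while a(u) >= t * |C|.
            while True:
                cand = active & ~in_c
                if not cand.any():
                    break
                u = np.flatnonzero(cand)[np.argmax(a[cand])]
                if a[u] < t * in_c.sum():
                    break
                in_c[u] = True
                a += sim_row(u)
                changed = True
            # REMOVE: drop the lowest-affinity member while a(v) < t * |C|.
            while in_c.any():
                v = np.flatnonzero(in_c)[np.argmin(a[in_c])]
                if a[v] >= t * in_c.sum():
                    break
                in_c[v] = False
                a -= sim_row(v)
                changed = True
            if not changed:
                break
        if not in_c.any():
            # Fallback (assumption): close a singleton so the outer loop always progresses.
            in_c[np.flatnonzero(active)[0]] = True
        label[in_c] = cluster_id
        cluster_id += 1
    return label


if __name__ == "__main__":
    # Two groups of expression profiles with opposite trends.
    x = np.array([[1, 2, 3, 4], [1, 2, 3, 5], [2, 3, 4, 5],
                  [4, 3, 2, 1], [5, 3, 2, 1], [5, 4, 3, 2]], dtype=float)
    print(cast_on_demand(x, t=0.5))
```

In this sketch, affinities live in a single length-n vector, and each add or remove triggers exactly one row computation. That trades extra arithmetic for memory, which is the motivation the abstract gives for computing affinities on demand rather than precomputing them.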