Scaling k-Means for Multi-Million Frames: A Stratified NANI Approach for Large-Scale MD Simulations

Abstract

We present improved k-means clustering initialization strategies for molecular dynamics (MD) simulations, implemented as part of the N-ary Natural Initiation (NANI) method. Two new deterministic seeding strategies, strat_all and strat_reduced, extend the original NANI approaches and dramatically reduce clustering runtime while preserving the quality of the clustering results. These methods also retain NANI's reproducible partitioning into well-separated, compact clusters while avoiding the costly iterative seed-selection procedures of previous implementations. Tests on the β-heptapeptide and HP35 systems show that the new variants achieve Calinski–Harabasz (CH) and Davies–Bouldin (DB) scores comparable to those of the previous NANI variant, indicating that the efficiency gains do not come at the cost of clustering quality. We also show how the stratified NANI variant can be used to greatly speed up our previously proposed Hierarchical Extended Linkage Method (HELM). These enhancements extend NANI to accelerate large-scale MD analysis, both in stand-alone k-means clustering and as a component of hybrid workflows, removing a key barrier to routine, scalable, and reproducible exploration of complex conformational ensembles. The improved NANI implementation is available in our MDANCE package: https://github.com/mqcomplab/MDANCE.
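The abstract does not spell out how strat_all and strat_reduced pick their seeds, so the sketch below is only a hypothetical illustration of the general idea of deterministic, stratified k-means seeding followed by CH/DB evaluation. It is not the NANI algorithm and does not use the MDANCE API. The helper names (stratified_seeds, lloyd_kmeans, calinski_harabasz, davies_bouldin) are invented for this example. The assumed seeding rule ranks frames by distance to the global mean, splits that ranking into k equal strata, and takes the most central frame of each stratum. Because no random numbers are involved, repeated runs yield identical seeds and partitions, which is the reproducibility property the abstract emphasizes. The two scores follow their standard definitions: higher CH and lower DB indicate more compact, better-separated clusters.

```python
import numpy as np


def stratified_seeds(X, k):
    """Hypothetical deterministic stratified seeding (illustrative only,
    NOT the NANI strat_all/strat_reduced procedure).

    Frames are ranked by squared distance to the global mean, the ranking is
    split into k near-equal strata, and the frame closest to each stratum's
    mean becomes that stratum's seed.
    """
    d2 = ((X - X.mean(axis=0)) ** 2).sum(axis=1)
    order = np.argsort(d2, kind="stable")  # stable sort: deterministic ties
    seeds = []
    for stratum in np.array_split(order, k):
        members = X[stratum]
        local = np.argmin(((members - members.mean(axis=0)) ** 2).sum(axis=1))
        seeds.append(X[stratum[local]])
    return np.array(seeds)


def assign(X, centers):
    """Label each frame with the index of its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def lloyd_kmeans(X, seeds, max_iter=100, tol=1e-10):
    """Plain Lloyd iterations from fixed seeds (single run, no restarts)."""
    centers = np.array(seeds, dtype=float)
    for _ in range(max_iter):
        labels = assign(X, centers)
        # Keep a center in place if its cluster becomes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        shift = float(((new_centers - centers) ** 2).sum())
        centers = new_centers
        if shift <= tol:
            break
    return assign(X, centers), centers


def calinski_harabasz(X, labels):
    """CH = [B / (k - 1)] / [W / (n - k)] (between / within dispersion)."""
    ks = np.unique(labels)
    n, k = len(X), len(ks)
    overall = X.mean(axis=0)
    between = within = 0.0
    for j in ks:
        pts = X[labels == j]
        c = pts.mean(axis=0)
        between += len(pts) * ((c - overall) ** 2).sum()
        within += ((pts - c) ** 2).sum()
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(X, labels):
    """DB = mean over clusters of max_j (s_i + s_j) / ||c_i - c_j||."""
    ks = np.unique(labels)
    cents = np.array([X[labels == j].mean(axis=0) for j in ks])
    # s_i: mean Euclidean distance of cluster members to their centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == j] - cents[i], axis=1).mean()
        for i, j in enumerate(ks)
    ])
    worst = [
        max((scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
            for j in range(len(ks)) if j != i)
        for i in range(len(ks))
    ]
    return float(np.mean(worst))


# Usage on synthetic 2-D data standing in for, e.g., projected MD frames.
rng = np.random.default_rng(0)
means = [(0.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
X = np.vstack([rng.normal(m, 0.2, size=(100, 2)) for m in means])
seeds = stratified_seeds(X, k=3)
labels, centers = lloyd_kmeans(X, seeds)
print("CH:", calinski_harabasz(X, labels), "DB:", davies_bouldin(X, labels))
```

Running k-means once from fixed seeds replaces the usual strategy of many random restarts. This trade-off is where deterministic seeding saves time, provided the seeds already land in distinct dense regions.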

Article activity feed