BitBIRCH Clustering Refinement Strategies
Abstract
Chemical libraries are not only growing larger, they are doing so at an accelerating pace. Keeping up with this explosion in chemical data demands more than hardware upgrades; we also need dramatically more efficient algorithms. We have been working in this direction with the introduction of the iSIM framework, which uses n-ary similarity to speed up the processing of very large sets. Recently, we showed how to use this technique to cluster billions of molecules with unprecedented efficiency through the BitBIRCH algorithm. In this Application Note we present a package fully dedicated to expanding the BitBIRCH method, including multiple options that give the user appreciable control over the tree structure while dramatically improving the quality of the final partitions. Remarkably, this is achieved without compromising the efficiency of the original method. We also present new post-processing tools that help dissect the clustering information, as well as ample examples showcasing the new functionalities. BitBIRCH is publicly available at https://github.com/mqcomplab/bitbirch.