BitBIRCH-Lean: chemical space in the palm of your workstation

Abstract

We present BitBIRCH-Lean, a fast, memory-efficient implementation of the BitBIRCH algorithm, designed for high-throughput clustering of very large molecular libraries (up to billions of drug-like molecules) on typical workstations. BitBIRCH-Lean considerably improves on the original BitBIRCH implementation by incorporating dynamic types and bit-packed fingerprints inside the clustering tree. Most operations in BitBIRCH-Lean are performed directly on compressed data, and optional C++ extensions accelerate the bottleneck calculations, providing up to a 2× speedup. Benchmark tests against GPU-accelerated methods highlight BitBIRCH-Lean as an efficient alternative for processing vast numbers of molecules. We further demonstrate the versatility of this new package by showcasing a parallel, multi-round variant of the BitBIRCH algorithm that exploits these efficiency gains to cluster hundreds of millions of molecules in minutes, with no loss in cluster quality. The code is freely available at: https://github.com/mqcomplab/bblean.
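
To illustrate what "bit-packed fingerprints" and "operations on compressed data" can mean in practice, the following is a minimal NumPy sketch. It is not the bblean API: the helper names `pack_fingerprints` and `tanimoto_packed` are hypothetical, the 2048-bit fingerprint length is an assumed example value, and returning 1.0 for two empty fingerprints is a convention chosen here. The sketch stores each fingerprint at one bit per position and computes a Tanimoto similarity without unpacking.

```python
import numpy as np

# Lookup table: number of set bits for every possible byte value (0..255).
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)


def pack_fingerprints(fps):
    """Pack a (n_mols, n_bits) array of 0/1 values into bytes, 8 bits per byte.

    Hypothetical helper for illustration; not part of bblean.
    """
    return np.packbits(np.asarray(fps, dtype=np.uint8), axis=1)


def tanimoto_packed(a, b):
    """Tanimoto similarity of two packed fingerprints, computed on the bytes."""
    common = int(POPCOUNT[a & b].sum())  # bits set in both fingerprints
    union = int(POPCOUNT[a].sum()) + int(POPCOUNT[b].sum()) - common
    # Assumed convention: two all-zero fingerprints are treated as identical.
    return common / union if union else 1.0


# Assumed example data: 4 random fingerprints of 2048 bits each.
rng = np.random.default_rng(0)
fps = rng.integers(0, 2, size=(4, 2048), dtype=np.uint8)
packed = pack_fingerprints(fps)

print(fps.nbytes, packed.nbytes)  # one byte per bit vs. one bit per bit
print(tanimoto_packed(packed[0], packed[1]))
```

Storing one bit per position instead of one byte cuts fingerprint memory by a factor of eight, which is the kind of saving that makes very large libraries fit in workstation RAM. The bitwise AND plus popcount step is also the sort of inner loop that a compiled extension could plausibly accelerate.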
