Revisiting CPUs for Protein Folding: Xeon-Based Acceleration of AlphaFold2
Abstract
Protein structure prediction with AlphaFold2 has revolutionized drug discovery, yet its end-to-end execution remains computationally intensive. Although GPUs are the usual choice for deep learning, the AlphaFold2 pipeline consists of heterogeneous phases (preprocessing dominated by sparse database searches, and model inference dominated by low-arithmetic-intensity attention modules) that pose distinct architectural challenges. In this work, we address these bottlenecks with Open-Omics-AlphaFold2, a highly optimized implementation for Intel® Xeon® CPUs. By exploiting the CPU's ability to handle both sparse preprocessing algorithms and dense matrix operations (the latter via Intel Advanced Matrix Extensions, AMX), we accelerate the entire pipeline end to end. Our optimization strategy combines multi-level parallelism (multiprocessing, multithreading, and vectorization) with cache-aware tiling and operator fusion. On a Xeon CPU, Open-Omics-AlphaFold2 achieves a 2×–7.58× speedup for preprocessing and a 19.8×–29.2× speedup for model inference over the baseline DeepMind-AlphaFold2. Moreover, on a proteome of 391 proteins, Open-Omics-AlphaFold2 running on a dual-socket Intel Xeon 6980P system achieves 76% higher throughput than the state-of-the-art GPU-accelerated solution, FastFold, running on a single-socket Intel Xeon 6980P CPU with offload to an NVIDIA H100.
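The abstract names cache-aware tiling and operator fusion without describing them. As a rough illustration of the general idea only (not the Open-Omics-AlphaFold2 kernels, which use AMX and are not described here), the pure-Python sketch below blocks a matrix multiply into small tiles and fuses a bias-add and ReLU into each output tile as soon as that tile is finished, instead of making separate passes over the whole output. The function names, the tile size, and the choice of bias + ReLU as the fused epilogue are hypothetical.

```python
import random


def matmul_bias_relu_naive(a, b, bias):
    """Unfused reference: full GEMM, then a separate bias pass, then a separate ReLU pass."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)] for i in range(n)]
    c = [[c[i][j] + bias[j] for j in range(m)] for i in range(n)]
    return [[max(0.0, c[i][j]) for j in range(m)] for i in range(n)]


def matmul_bias_relu_tiled(a, b, bias, tile=4):
    """Tiled GEMM with a fused bias + ReLU epilogue applied per output tile.

    The i, j and p loops are blocked by `tile` so that each block of A, B and C
    is small; in a compiled kernel such blocks fit in cache or registers. The
    epilogue runs on each C tile right after its accumulation over all of K
    completes, so the tile is touched again while it is still hot.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            i_end, j_end = min(i0 + tile, n), min(j0 + tile, m)
            # Accumulate the C[i0:i_end, j0:j_end] tile over every K block.
            for p0 in range(0, k, tile):
                p_end = min(p0 + tile, k)
                for i in range(i0, i_end):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, p_end):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, j_end):
                            row_c[j] += a_ip * row_b[j]
            # Fused epilogue: bias + ReLU on the finished tile only.
            for i in range(i0, i_end):
                row_c = c[i]
                for j in range(j0, j_end):
                    row_c[j] = max(0.0, row_c[j] + bias[j])
    return c


if __name__ == "__main__":
    random.seed(0)
    n, k, m = 64, 48, 32
    a = [[random.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(n)]
    b = [[random.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(k)]
    bias = [random.uniform(-0.5, 0.5) for _ in range(m)]
    ref = matmul_bias_relu_naive(a, b, bias)
    out = matmul_bias_relu_tiled(a, b, bias, tile=16)
    diff = max(abs(x - y) for r1, r2 in zip(ref, out) for x, y in zip(r1, r2))
    print(f"max |naive - tiled| = {diff:.2e}")
```

In pure Python this only demonstrates the loop structure and that the fused, tiled version computes the same result as the unfused one; the memory-traffic savings that motivate tiling and fusion appear in compiled kernels, where a C tile stays in registers or L1/L2 cache across the K loop and the epilogue.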
Code availability
Bare metal: https://github.com/IntelLabs/open-omics-alphafold
Containerized: https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/tree/main/pipelines/alphafold2-based-protein-folding