Revisiting CPUs for Protein Folding: Xeon-Based Acceleration of AlphaFold2

Narendra Chaudhary
Wei Yang
Dhiraj Kalamkar
Jianqian Zhou
Soumyadip Ghosh
Lei Xia
Manasi Tiwari
Alexander Heinecke
Bharat Kaul
Sanchit Misra

Abstract

Protein structure prediction via AlphaFold2 has revolutionized drug discovery, yet its end-to-end execution remains computationally intensive. While GPUs are traditionally favored for deep learning, the AlphaFold2 algorithm consists of heterogeneous phases (preprocessing with sparse database searches, and model inference with low-arithmetic-intensity attention modules) that present unique architectural challenges. In this work, we address these bottlenecks by introducing Open-Omics-AlphaFold2, a highly optimized implementation for Intel® Xeon® CPUs. By leveraging the CPU’s versatility in handling both sparse preprocessing algorithms and dense matrix operations via Intel Advanced Matrix Extensions (AMX), we accelerate the entire pipeline end-to-end. Our optimization strategy employs multi-level parallelism (spanning multiprocessing, multi-threading, and vectorization) alongside cache-aware tiling and operator fusion. Our results demonstrate that, on a Xeon CPU, Open-Omics-AlphaFold2 achieves a 2× to 7.58× speedup for preprocessing and a 19.8× to 29.2× speedup for model inference over the baseline Deepmind-AlphaFold2. Moreover, for a proteome of 391 proteins, Open-Omics-AlphaFold2 running on a dual-socket Intel Xeon 6980P system achieves a remarkable 76% higher throughput than the state-of-the-art GPU-accelerated solution, FastFold, running on a single-socket Intel Xeon 6980P CPU with NVIDIA H100 offload.

Code availability

Baremetal: https://github.com/IntelLabs/open-omics-alphafold
Containerized: https://github.com/IntelLabs/Open-Omics-Acceleration-Framework/tree/main/pipelines/alphafold2-based-protein-folding

Version published to 10.64898/2026.05.27.728222 on bioRxiv
May 29, 2026

Related articles

Rapid-PFP: Accelerating Prefix-Free Parsing with GPU Parallelism

This article has 5 authors:
1. Eddie Ferro
2. Tyler Pencinger
3. Oded Green
4. Mahsa Lotfollahi
5. Christina Boucher
This article has no evaluations. Latest version: May 1, 2026

Machine learning-based prediction of memory requirements for metagenomic assembly in high-performance computing environments

This article has 7 authors:
1. Santiago Sanchez
2. Santino Faack
3. Martin Beracochea
4. Robert D. Finn
5. Björn Grüning
6. Bérénice Batut
7. Paul Zierep
This article has no evaluations. Latest version: May 13, 2026

Just Add Structure: Protein Language Models Combined with Structural Equivariance Excel at Protein Tasks

This article has 5 authors:
1. Qurat-ul-ain
2. Carlos Outeiral
3. Matteo Cagiada
4. Yee Whye Teh
5. Charlotte M. Deane
This article has no evaluations. Latest version: May 29, 2026
