Reliable Identification of Homodimers Using AlphaFold
Abstract
Motivation: Protein-protein interactions are central to understanding biological processes. The ability to predict interaction partners is extremely valuable for avoiding costly, time-consuming experiments. It has been shown that AlphaFold has an unsurpassed ability to accurately evaluate interacting protein pairs. However, a protein can also form homomeric interactions, i.e., interact with itself.

Results: We found that AlphaFold yielded a significantly higher false-positive rate when identifying homodimers than when identifying heterodimers. The true positive rate (TPR) at a 1% false positive rate (FPR) drops from 63% for heterodimers to 18% for homodimers. When we investigated the high-scoring false positives, i.e., non-homodimers that receive high AlphaFold scores when predicted as homodimers, we found that their homologs were enriched for homomultimeric proteins. Using a simple logistic regression model that combines AlphaFold scores with structural and homology information, we increased the TPR (at 1% FPR) from 19% to 42 ± 8% (5-fold cross-validation). When we excluded the homology information, we achieved a TPR of 28 ± 7%, which is still better than using AlphaFold metrics alone.

Availability and implementation: All data are available from Zenodo (DOI: 10.5281/zenodo.17738668) and all code from https://github.com/SarahND97/alphafold-homodimers
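The headline metric above, TPR at a fixed FPR, can be made concrete with a short sketch. This is only an illustration of how such a number is typically computed from per-protein scores and binary labels; the function name `tpr_at_fpr` and the toy data are assumptions made here and are not taken from the authors' repository.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """Highest TPR reachable by a score threshold whose FPR stays <= max_fpr.

    scores: higher means "more likely a homodimer" (hypothetical input).
    labels: 1 for true homodimers, 0 for non-homodimers.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Sweep the threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        # Examples with identical scores are accepted together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / n_neg > max_fpr:
            break  # lowering the threshold further would exceed the FPR budget
        best_tpr = tp / n_pos
        i = j
    return best_tpr


# Toy data (made up for illustration only).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.0))
```

With a strict FPR budget, only predictions scoring above the best-scoring negatives count as hits, which is why a handful of high-scoring false positives can lower this metric sharply.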