cpiVAE: Robust and Interpretable Cross-Platform Proteomics Imputation

Abstract

Large-scale plasma proteomic studies often use different high-throughput affinity platforms, and measurements of the same protein are often discordant across platforms. This discordance hinders cross-study integration. Better integration of proteomics data would enable more powerful meta-analyses, improve statistical power for biomarker discovery, and provide a better understanding of proteome–phenotype relationships. Here we present a cross-platform proteomics imputation variational autoencoder (cpiVAE), a deep generative model for bidirectional imputation of protein abundances between two widely used platforms: Olink and SomaScan. Using a training cohort of paired measurements from the China Kadoorie Biobank (CKB), cpiVAE learns a joint latent representation that enables cross-platform imputation. cpiVAE outperforms two established benchmark methods, k-nearest neighbors (KNN) and Weighted Nearest Neighbors (WNN, from Seurat v4), achieving up to 30% higher correlation between imputed and true values. It also generalizes well to an independent cohort from the Atherosclerosis Risk in Communities Study (ARIC): without retraining, cpiVAE maintains high performance relative to the benchmarks. Associations of imputed protein levels with clinical phenotypes closely mirror those obtained with the actual measurements, and including imputed values increases power in a meta-analysis scenario. A post-hoc feature importance matrix enables interpretation of the model. Protein pair features extracted from cpiVAE overlap significantly with known associations in the Search Tool for the Retrieval of Interacting Genes (STRING) database. In summary, cpiVAE offers an accurate, generalizable, and interpretable solution for cross-platform proteomic imputation, enabling integrated analyses across studies with proteomics measured on different platforms.
The framework and pre-trained model weights are available under a BSD 2-Clause open source license at https://github.com/joelbaderlab/cpiVAE_v1.
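To make the benchmark comparison concrete, the following is a minimal sketch of a KNN cross-platform imputation baseline scored by per-protein Pearson correlation between imputed and true values. It is not the authors' implementation and not the cpiVAE model: the function names `knn_impute` and `per_protein_pearson`, the choice of Euclidean distance, `k=5`, and the synthetic two-platform data are all assumptions made for illustration.

```python
import numpy as np

def knn_impute(train_a, train_b, query_a, k=5):
    """Impute platform-B values for samples measured only on platform A.

    Hypothetical baseline: average the platform-B profiles of the k training
    samples that are closest to each query sample in platform-A space.
    """
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dist = np.linalg.norm(query_a[:, None, :] - train_a[None, :, :], axis=2)
    # Indices of the k nearest training samples for each query sample
    nearest = np.argsort(dist, axis=1)[:, :k]
    # Mean platform-B profile over the neighbors, shape (n_query, n_proteins_b)
    return train_b[nearest].mean(axis=1)

def per_protein_pearson(true, pred):
    """Pearson correlation between true and imputed values, one per protein (column)."""
    t = true - true.mean(axis=0)
    p = pred - pred.mean(axis=0)
    denom = np.sqrt((t ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    return (t * p).sum(axis=0) / denom

# Synthetic paired data (an assumption for this sketch): both platforms are
# noisy linear readouts of the same low-dimensional sample state.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 4))
platform_a = latent @ rng.normal(size=(4, 20)) + 0.1 * rng.normal(size=(300, 20))
platform_b = latent @ rng.normal(size=(4, 15)) + 0.1 * rng.normal(size=(300, 15))

train, test = slice(0, 250), slice(250, 300)
imputed_b = knn_impute(platform_a[train], platform_b[train], platform_a[test], k=5)
r = per_protein_pearson(platform_b[test], imputed_b)
print("median per-protein correlation:", float(np.median(r)))
```

The same scoring function could be applied to the output of any imputation model, which is what makes a per-protein correlation a convenient common yardstick across methods.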
