Proteomizer: Leveraging the Transcriptome-Proteome Mismatch to Infer Novel Gene Regulatory Relations


Abstract

The correlation between transcriptomic (Tx) and proteomic (Px) profiles remains modest, typically around r = 0.5 across genes and r = 0.3 across samples, limiting the utility of transcriptomic data as a proxy for protein abundance. To address this, we introduce Proteomizer, a deep learning platform designed to infer a sample’s Px landscape from its Tx and miRNomic (Mx) profiles. Trained on 8,613 matched Tx-Mx-Px samples from TCGA and CPTAC, Proteomizer achieved a Tx-Px correlation of r = 0.68, representing the highest performance reported to date for this task. We further developed a Monte Carlo simulation framework to evaluate the impact of proteomization on differential expression analysis. Proteomizer substantially improved the accuracy of differential gene expression detection, with p-value precision increasing by up to 62-fold, and by as much as six orders of magnitude for a subset of genes enriched in mitochondrial and ribosomal functions. However, performance gains did not generalize to unseen tissue types or datasets generated using different protocols. Finally, we applied explainable AI (XAI) techniques to identify regulatory relations contributing to Tx-Px discrepancies. Our predictions for 100 highly annotated genes were compared against a literature-based biological knowledge graph of 322 million annotations: our explainers achieved a ROC-AUC of 0.74 in predicting miRNA-gene downregulation interactions. To our knowledge, this is the first study to systematically evaluate the biological relevance, limitations, and interpretability of proteomization models, establishing Proteomizer as a state-of-the-art tool for multiomic integration and hypothesis generation.
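The two baseline figures quoted above (r across genes and r across samples) come from correlating the same pair of matrices along different axes. The sketch below illustrates that distinction only; it is not the authors' evaluation code. It assumes one common reading of the terms: "across genes" yields one Pearson r per sample, computed over all genes, and "across samples" yields one r per gene, computed over all samples. The function names and the toy matrices are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def across_genes(tx, px):
    """One r per sample: correlate each sample's Tx and Px rows over genes."""
    return [pearson(t_row, p_row) for t_row, p_row in zip(tx, px)]


def across_samples(tx, px):
    """One r per gene: correlate each gene's Tx and Px columns over samples."""
    return [pearson(t_col, p_col) for t_col, p_col in zip(zip(*tx), zip(*px))]


# Made-up toy data: rows are samples, columns are genes.
tx = [[1.0, 5.0, 9.0],
      [2.0, 6.0, 8.0],
      [3.0, 4.0, 10.0]]
px = [[1.5, 4.0, 9.5],
      [1.0, 6.5, 9.0],
      [2.0, 5.0, 8.5]]

per_sample_r = across_genes(tx, px)    # length = number of samples
per_gene_r = across_samples(tx, px)    # length = number of genes

print("mean r across genes:  ", sum(per_sample_r) / len(per_sample_r))
print("mean r across samples:", sum(per_gene_r) / len(per_gene_r))
```

In the toy data, genes differ widely in their overall abundance, so the per-sample correlations are high even though individual genes track poorly between samples. This is one reason the across-genes figure tends to exceed the across-samples figure.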

Graphical Abstract

miRNA: microRNA; Mx: miRNome; MS: mass spectrometry; PTM: post-translational modifications; Px: proteome; Px’: predicted proteome; r: Pearson correlation coefficient; RBP: RNA binding proteins; ROC-AUC: area under the receiver operating characteristic curve; RPPA: reverse-phase protein array; TF: transcription factors; Tx: transcriptome; XAI: explainable artificial intelligence.
