PRISM: A Proteomics Robust Imputation framework for Structure-aware Modeling of missingness


Abstract

Missing values in mass spectrometry (MS)-based proteomics, particularly within label-free quantification (LFQ) workflows, hinder downstream data analysis. These missing values are predominantly categorized as Missing Not At Random (MNAR), as they often arise when the signals of low-abundance proteins or peptides fall below the instrument’s limit of detection. To address this, we introduce PRISM, a novel imputation framework comprising two complementary deep learning models: a Denoising Convolutional Autoencoder (DCAE) and a Deep Matrix Factorization (DMF) model. The core innovation of the framework is its explicit modeling of the intrinsic MNAR missingness mechanism through a gradient downsampling strategy and a dual-task learning objective. We conducted a comprehensive evaluation on multiple real-world and synthetic proteomics datasets, assessing numerical accuracy (RMSE), downstream classification performance, and preservation of clustering structure. The results demonstrate that PRISM not only surpasses existing methods in imputation accuracy but, more importantly, also excels at preserving the inherent biological structure of the data, thereby supporting more accurate downstream prediction and exploratory analysis. PRISM therefore provides the proteomics research community with a more accurate toolkit that better preserves the original structure of the data, enhancing the credibility of biological discoveries obtained from incomplete data.
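To make the MNAR setting and the RMSE metric concrete, the following is a minimal sketch, not the PRISM implementation. It assumes a toy matrix of log2 intensities, a hypothetical limit of detection `lod`, and a logistic missingness probability that rises as intensity falls. It then scores a simple per-protein mean imputation by RMSE on the masked entries. All function names, parameter values, and the baseline imputer are illustrative assumptions. The abstract does not specify the gradient downsampling strategy or the dual-task objective, so neither is reproduced here.

```python
import math
import random


def simulate_log_intensities(n_proteins, n_samples, rng):
    """Toy log2-intensity matrix (rows: proteins, columns: samples)."""
    matrix = []
    for _ in range(n_proteins):
        protein_mean = rng.gauss(25.0, 2.0)  # assumed abundance scale
        matrix.append([rng.gauss(protein_mean, 0.5) for _ in range(n_samples)])
    return matrix


def mnar_mask(matrix, lod, steepness, rng):
    """Mask with True = missing; P(missing) = sigmoid(steepness * (lod - x)).

    Values far below the assumed limit of detection are almost always
    dropped, values far above it are almost always kept.
    """
    mask = []
    for row in matrix:
        mask_row = []
        for x in row:
            p_missing = 1.0 / (1.0 + math.exp(-steepness * (lod - x)))
            mask_row.append(rng.random() < p_missing)
        mask.append(mask_row)
    return mask


def impute_protein_mean(matrix, mask):
    """Baseline imputer: fill each missing entry with the protein's observed mean.

    Proteins with no observed values fall back to the global observed mean.
    """
    observed_all = [x for row, m_row in zip(matrix, mask)
                    for x, m in zip(row, m_row) if not m]
    global_mean = sum(observed_all) / len(observed_all)
    imputed = []
    for row, m_row in zip(matrix, mask):
        observed = [x for x, m in zip(row, m_row) if not m]
        fill = sum(observed) / len(observed) if observed else global_mean
        imputed.append([fill if m else x for x, m in zip(row, m_row)])
    return imputed


def rmse_on_missing(truth, imputed, mask):
    """Root mean squared error restricted to the masked (missing) entries."""
    squared = [(t - p) ** 2
               for t_row, p_row, m_row in zip(truth, imputed, mask)
               for t, p, m in zip(t_row, p_row, m_row) if m]
    if not squared:
        raise ValueError("mask contains no missing entries")
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)
truth = simulate_log_intensities(n_proteins=200, n_samples=10, rng=rng)
mask = mnar_mask(truth, lod=23.0, steepness=2.0, rng=rng)
imputed = impute_protein_mean(truth, mask)

missing_vals = [x for row, m_row in zip(truth, mask)
                for x, m in zip(row, m_row) if m]
observed_vals = [x for row, m_row in zip(truth, mask)
                 for x, m in zip(row, m_row) if not m]
mean_missing = sum(missing_vals) / len(missing_vals)
mean_observed = sum(observed_vals) / len(observed_vals)
score = rmse_on_missing(truth, imputed, mask)

print("fraction missing:", len(missing_vals) / (len(missing_vals) + len(observed_vals)))
print("mean true value of missing entries:", mean_missing)
print("mean true value of observed entries:", mean_observed)
print("RMSE of protein-mean imputation on missing entries:", score)
```

In this sketch the true values behind the missing entries are systematically lower than the observed ones, which is the defining property of MNAR data. An imputer built only from observed values therefore tends to overestimate them, which is the kind of bias an explicit model of the missingness mechanism is meant to reduce.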
