msBayesImpute: A Versatile Framework for Addressing Missing Values in Biomedical Mass Spectrometry Proteomics Data

Abstract

Advances in mass spectrometry (MS) technologies have significantly improved the ability to quantify proteins and analyse their modifications. However, MS-based proteomics datasets frequently contain missing values that arise from a complex interplay of missing at random (MAR) and missing not at random (MNAR) mechanisms. If unaddressed, such missing data can cause information loss and bias data pre-processing steps such as normalisation, as well as subsequent analyses and interpretations. Few approaches effectively address both MAR and MNAR, and those that do often require manual tuning of the mixture percentage between the two mechanisms or rely on two-group experimental designs. To improve the handling of missing values, we developed msBayesImpute, a computational method that integrates Bayesian factorisation with probabilistic dropout models. We evaluated msBayesImpute against several popular imputation methods using both simulated missing values and missing values generated through a dilution series experiment on samples from lung cancer patients. In this benchmark, msBayesImpute outperformed the compared methods in reconstructing missing values, estimating normalisation factors, and identifying differentially expressed proteins across varying levels of missingness. Notably, msBayesImpute does not require a predefined experimental design and scales to large studies. This versatility makes msBayesImpute an effective and robust tool for enhancing the utility of MS datasets in biological research.
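
The distinction between MAR and MNAR can be illustrated with a small simulation. The sketch below is not the msBayesImpute model; it only assumes, for illustration, that MAR removes values with a fixed probability and that MNAR follows a sigmoid dropout curve in which low log-intensities are more likely to go missing. The function names, the sigmoid form, and all parameter values (`mar_rate`, `midpoint`, `scale`) are hypothetical.

```python
import math
import random


def dropout_probability(log_intensity, midpoint, scale):
    """Assumed sigmoid dropout curve: lower intensities are more likely to go missing."""
    return 1.0 / (1.0 + math.exp((log_intensity - midpoint) / scale))


def simulate_missingness(values, mar_rate, midpoint, scale, rng):
    """Return a copy of values with None wherever MAR or MNAR removes a value."""
    observed = []
    for v in values:
        # MAR: dropped with a fixed probability, independent of the intensity.
        is_mar = rng.random() < mar_rate
        # MNAR: dropped with a probability that depends on the intensity itself.
        is_mnar = rng.random() < dropout_probability(v, midpoint, scale)
        observed.append(None if (is_mar or is_mnar) else v)
    return observed


rng = random.Random(0)
# Hypothetical log-intensities for one protein across many samples.
true_values = [rng.gauss(20.0, 2.0) for _ in range(5000)]
observed = simulate_missingness(
    true_values, mar_rate=0.05, midpoint=18.0, scale=0.5, rng=rng
)

kept = [v for v in observed if v is not None]
true_mean = sum(true_values) / len(true_values)
observed_mean = sum(kept) / len(kept)
missing_fraction = 1 - len(kept) / len(observed)

print(f"missing fraction: {missing_fraction:.2f}")
print(f"true mean: {true_mean:.2f}, mean of observed values: {observed_mean:.2f}")
```

Because the MNAR component preferentially removes low intensities, the mean of the remaining values sits above the true mean. This is the kind of bias in downstream steps such as normalisation that the abstract refers to.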
