msBayesImpute: A Versatile Framework for Addressing Missing Values in Biomedical Mass Spectrometry Proteomics Data

Jiaojiao He
Barbara Helm
Franziska Gödtel
Katharina Büchner
Marcel Schilling
Marc A. Schneider
Laura V. Klotz
Hauke Winter
Britta Velten
Ursula Klingmüller
Junyan Lu

Abstract

Advances in mass spectrometry (MS) technologies have significantly improved the ability to quantify proteins and analyse their modifications. However, MS-based proteomics datasets frequently contain missing values that arise from a complex interplay of missing at random (MAR) and missing not at random (MNAR) mechanisms. If unaddressed, such missing data can result in information loss and biased outcomes in data pre-processing, such as normalisation, as well as in subsequent analyses and interpretations. Few approaches effectively address both MAR and MNAR, and those that do often require manual tuning of the mixture percentages between them or rely on two-group experimental designs. To improve the handling of missing values, we developed msBayesImpute, a computational method that integrates Bayesian factorization with probabilistic dropout models. We evaluated msBayesImpute against several popular imputation methods using both simulated missing values and those generated through a dilution series experiment on samples from lung cancer patients. In this benchmark, msBayesImpute showed superior performance in reconstructing missing values, estimating normalisation factors, and identifying differentially expressed proteins across varying levels of missingness. Notably, msBayesImpute does not require predefined experimental designs and scales to large studies. This versatility positions msBayesImpute as an effective and robust tool for enhancing the utility of MS datasets in biological research.
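
To make the distinction between the two mechanisms concrete, the following is a minimal sketch of how MAR and MNAR missingness might be simulated on log-scale intensities. It is an illustration only and not the msBayesImpute implementation: the function names, the logistic form of the dropout curve, and the parameters `midpoint`, `slope`, and `mar_rate` are all assumptions introduced here.

```python
import math
import random


def dropout_probability(log_intensity, midpoint, slope):
    """Hypothetical logistic dropout curve: lower intensities are more likely to be lost."""
    return 1.0 / (1.0 + math.exp(slope * (log_intensity - midpoint)))


def simulate_missingness(matrix, mar_rate=0.05, midpoint=20.0, slope=1.5, seed=0):
    """Return a copy of a samples-by-proteins matrix with None marking missing values.

    MNAR: loss depends on the value itself through dropout_probability.
    MAR: loss occurs at a fixed rate regardless of intensity (this is how the
    term is commonly used in proteomics; strictly speaking it is MCAR).
    """
    rng = random.Random(seed)
    masked = []
    for row in matrix:
        new_row = []
        for value in row:
            lost_mnar = rng.random() < dropout_probability(value, midpoint, slope)
            lost_mar = rng.random() < mar_rate
            new_row.append(None if (lost_mnar or lost_mar) else value)
        masked.append(new_row)
    return masked


# Usage: 10 simulated samples with 200 log2 intensities each (synthetic data).
data_rng = random.Random(1)
data = [[data_rng.gauss(24.0, 2.0) for _ in range(200)] for _ in range(10)]
masked = simulate_missingness(data)
n_missing = sum(value is None for row in masked for value in row)
print(f"missing fraction: {n_missing / 2000:.3f}")
```

In this sketch the ratio of MNAR to MAR loss is set by hand through `midpoint`, `slope`, and `mar_rate`, which is the kind of manual tuning the abstract describes as a limitation of existing approaches.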

Version published to 10.1101/2025.10.02.679746 on bioRxiv
Oct 4, 2025

