Regularized Deep Neural Networks for Combining Heterogeneous Features of Peptides in Data Independent Acquisition Mass Spectrometry

Namgil Lee
Hojin Yoo
Dohyun Han
Heejung Yang

Abstract

Data-independent acquisition (DIA) has gained much attention in mass spectrometry (MS)-based proteomics for its improved reproducibility and unbiased data acquisition. In DIA-MS, the spectral library is crucial for peptide identification. However, library-based identification is limited to peptides previously identified in data-dependent acquisition (DDA) MS experiments. This study proposes a deep learning approach for generating spectral libraries, even for previously unseen peptides. While most deep learning-based methods rely on one-hot encoded representations of peptides, the proposed method also incorporates physicochemical features, including atomic composition, hydrophobicity, flexibility, fractional surface probability, and aromaticity. We introduce sparsity-regularized neural network layers to facilitate the selection and combination of important high-dimensional physicochemical features and to improve prediction performance. Furthermore, we suggest a transfer learning strategy for training the proposed deep neural networks, which have multiple heterogeneous input channels. Numerical experiments using benchmark DDA-MS data demonstrated that the proposed deep learning model outperformed existing benchmark models, such as Prosit and DeepDIA, particularly in predicting retention times. In addition, the proposed models with sparsity regularization identified more peptides from HeLa cell DIA data than the other deep learning models.
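
To make the contrast between one-hot encoding and physicochemical features concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the Kyte-Doolittle scale for hydrophobicity and the fraction of F, W, and Y residues for aromaticity; the abstract does not say which scales the paper uses. The soft-thresholding function is a generic L1-style sparsity step and is likewise an assumption about how a sparsity regularizer could drive feature weights to zero. All function names are hypothetical.

```python
import math

# The 20 standard amino acids, in a fixed (arbitrary) order for one-hot indexing.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values (assumed scale; the paper may use another).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

AROMATIC = set("FWY")


def one_hot(peptide):
    """Encode each residue as a 20-dimensional indicator vector."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in peptide:
        row = [0] * len(AMINO_ACIDS)
        row[index[aa]] = 1
        rows.append(row)
    return rows


def physicochemical_features(peptide):
    """Summarize a peptide by two example physicochemical descriptors."""
    n = len(peptide)
    mean_hydrophobicity = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / n
    aromaticity = sum(aa in AROMATIC for aa in peptide) / n
    return {
        "mean_hydrophobicity": mean_hydrophobicity,
        "aromaticity": aromaticity,
    }


def soft_threshold(weights, lam):
    """Shrink each weight toward zero by lam; weights within lam become 0.

    This is the proximal operator of the L1 penalty, shown here only as a
    generic example of how sparsity can switch off unimportant features.
    """
    shrunk = []
    for w in weights:
        magnitude = max(abs(w) - lam, 0.0)
        shrunk.append(0.0 if magnitude == 0.0 else math.copysign(magnitude, w))
    return shrunk


if __name__ == "__main__":
    peptide = "AFWK"  # hypothetical example sequence
    print(len(one_hot(peptide)), "residues x", len(AMINO_ACIDS), "channels")
    print(physicochemical_features(peptide))
    print(soft_threshold([0.5, -0.2, 0.05], 0.1))
```

In this sketch the one-hot matrix carries only residue identity, while the descriptor dictionary carries chemical properties shared across residues; a sparsity step such as soft-thresholding then keeps only the feature weights large enough to survive shrinkage.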

Version published to 10.1101/2025.06.09.658564 on bioRxiv
Jun 12, 2025

