A Task-Specific Transfer Learning Approach to Enhancing Small Molecule Retention Time Prediction with Limited Data
Abstract
Liquid chromatography (LC) is an essential technique for separating and identifying compounds in complex mixtures across various scientific fields. In LC, retention time (RT) is a crucial property for identifying small molecules, and its prediction has been extensively researched over recent decades. The wide array of columns and experimental conditions necessary for effectively separating diverse compounds presents a challenge. Consequently, advanced deep learning for retention time prediction in real-world scenarios is often hampered by limited training data spanning these varied experimental setups. While transfer learning (TL) can leverage knowledge from upstream datasets, it may not always provide an optimal initial point for specific downstream tasks. We consider six challenging benchmark datasets from different LC systems and experimental conditions (100-300 compounds each) where TL from RT datasets acquired under standard conditions fails to achieve satisfactory accuracy (R² ≥ 0.8), highlighting the need for more sophisticated TL strategies that can effectively adapt to the unique characteristics of target chromatographic systems under specific experimental conditions. We present a task-specific transfer learning (TSTL) strategy that pre-trains multiple models on distinct large-scale datasets, optimizes each for fine-tuned performance on the specific target task, and then integrates them into a single model. Evaluated on five deep neural network architectures across these six datasets through 5-fold cross-validation, TSTL demonstrated significant performance improvements, with the average R² increasing from 0.587 to 0.825. Furthermore, TSTL consistently outperformed conventional TL across various training dataset sizes, demonstrating superior data efficiency for RT prediction under diverse experimental conditions with limited training data.
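The pre-train / fine-tune / integrate workflow described above can be sketched in miniature. This is a hypothetical illustration only: the paper's actual network architectures, RT datasets, and model-integration step are not given in this abstract, so the sketch substitutes synthetic descriptor data, small scikit-learn MLPs, and simple prediction averaging as stand-ins for the integration step.

```python
# Hedged sketch of task-specific transfer learning (TSTL) for RT prediction.
# Synthetic data, sklearn MLPs, and prediction averaging are assumptions;
# they are not the paper's actual models or integration method.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

def make_rt_data(n, shift, scale, n_feat=16):
    """Synthetic molecular descriptors -> retention times (stand-in data)."""
    X = rng.normal(size=(n, n_feat))
    w = np.linspace(-1.0, 1.0, n_feat)  # fixed weights shared across systems
    y = scale * (X @ w) + shift + rng.normal(scale=0.1, size=n)
    return X, y

# Step 1: pre-train one model per large upstream RT dataset
# (each upstream "system" differs by a shift/scale, mimicking
# different LC columns and experimental conditions).
upstream = [make_rt_data(2000, shift=s, scale=sc)
            for s, sc in [(0.0, 1.0), (2.0, 0.8), (-1.0, 1.2)]]
models = []
for X, y in upstream:
    m = MLPRegressor(hidden_layer_sizes=(32,), max_iter=300,
                     warm_start=True, random_state=0)
    m.fit(X, y)
    models.append(m)

# Step 2: fine-tune each pre-trained model on the small target dataset
# (the benchmarks above have only 100-300 compounds).
X_tgt, y_tgt = make_rt_data(200, shift=1.0, scale=0.9)
X_tr, y_tr = X_tgt[:150], y_tgt[:150]
X_te, y_te = X_tgt[150:], y_tgt[150:]
for m in models:
    m.max_iter = 100
    m.fit(X_tr, y_tr)  # warm_start=True resumes from pre-trained weights

# Step 3: integrate the fine-tuned models into one predictor;
# averaging is used here purely for illustration.
pred = np.mean([m.predict(X_te) for m in models], axis=0)
print(f"ensemble R^2 on held-out target data: {r2_score(y_te, pred):.3f}")
```

The key point the sketch mirrors is that each pre-trained model is adapted to the target task before integration, rather than transferring from a single upstream dataset as in conventional TL.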