Ontology pre-training improves machine learning-based predictions for metabolites

Charlotte Tumescheit
Martin Glauer
Simon Flügel
Martin Larralde
Fabian Neuhaus
Till Mossakowski
Janna Hastings

Abstract

Recent advances in machine learning have shown that integrating expert knowledge improves performance, particularly in complex domains such as biology. Bio-ontologies offer a rich source of curated biological knowledge that can be harnessed to this end. Here, we describe an intuitive and generalisable approach to embedding the knowledge contained in a classification hierarchy derived from a bio-ontology into a machine learning model. The knowledge is introduced as an intermediate training step between general-purpose pre-training and task-specific fine-tuning, a process that we call ‘ontology pre-training’. Using a range of datasets derived from MoleculeNet, we show that this approach improves predictive performance and reduces training time for a broad range of prediction tasks relevant to understanding metabolite functions in living systems. We see the largest improvement for regression tasks, e.g. prediction of the lipophilicity and aqueous solubility of molecules, and a robust improvement for most classification tasks. Our approach can be adapted to a wide range of knowledge sources, models and prediction tasks.
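The intermediate ontology pre-training step needs training targets derived from the ontology. The sketch below shows one way to obtain them, assuming (this is an assumption, not something stated in the abstract) that the classification hierarchy is turned into multi-label targets. Each molecule is labelled with its asserted classes plus every transitive superclass, and a classification head is trained against those targets with binary cross-entropy before being replaced by a task-specific head for fine-tuning. The toy hierarchy, the class names and the helpers `ancestors`, `multi_hot` and `multilabel_bce` are hypothetical and do not come from any real bio-ontology or from the authors' code.

```python
import math
from typing import Dict, Iterable, List, Set

# Hypothetical toy hierarchy: class -> list of its direct superclasses.
# Illustrative only; NOT taken from any real bio-ontology.
HIERARCHY: Dict[str, List[str]] = {
    "organic_compound": [],
    "organic_acid": ["organic_compound"],
    "carboxylic_acid": ["organic_acid"],
    "lipid": ["organic_compound"],
    "fatty_acid": ["carboxylic_acid", "lipid"],
}

# Fixed ordering of classes, so every label vector has the same layout.
CLASS_ORDER: List[str] = sorted(HIERARCHY)


def ancestors(cls: str, hierarchy: Dict[str, List[str]]) -> Set[str]:
    """Return cls together with all of its transitive superclasses."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(hierarchy[current])
    return seen


def multi_hot(direct_classes: Iterable[str],
              hierarchy: Dict[str, List[str]],
              class_order: List[str]) -> List[int]:
    """Multi-hot target: 1 for each asserted class and each of its ancestors."""
    closed: Set[str] = set()
    for cls in direct_classes:
        closed |= ancestors(cls, hierarchy)
    return [1 if cls in closed else 0 for cls in class_order]


def multilabel_bce(probs: List[float], targets: List[int],
                   eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over all classes (one independent sigmoid per class)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)


if __name__ == "__main__":
    # A molecule asserted only as a fatty acid inherits all of its superclasses.
    print(CLASS_ORDER)
    print(multi_hot(["fatty_acid"], HIERARCHY, CLASS_ORDER))
    print(multi_hot(["lipid"], HIERARCHY, CLASS_ORDER))
```

In a real pipeline the hierarchy would be read from the ontology itself and the probabilities would come from the model's classification head. This snippet only illustrates how subclass structure can be turned into a training signal for the intermediate step.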

Version published to 10.1101/2025.09.30.679573 on bioRxiv
Oct 2, 2025