Deep Learning Foundation Models from Classical Molecular Descriptors

William Green
Jackson Burns
Akshat Shirish Zalte
Charlles Abreu
Jochen Sieg
Christian Feldmann
Miriam Mathea

Abstract

Fast and accurate data-driven prediction of molecular properties is pivotal to scientific advancements across myriad chemical domains. Deep learning methods have recently garnered much attention, despite their inability to outperform classical machine learning methods when tested on practical, real-world benchmarks with limited training data. This study seeks to bridge this gap with CheMeleon, an O(10M)-parameter foundation model that enables directed message-passing neural networks to finally exceed the performance of classical methods. Evaluated on 58 benchmark datasets from Polaris and MoleculeACE, CheMeleon achieves a win rate of 75% on Polaris tasks, outperforming baselines like Random Forest (68%), fastprop (36%), and Chemprop (32%), and a 97% win rate on MoleculeACE assays, surpassing Random Forest (50%) and other foundation models. Unlike conventional pre-training approaches that rely on noisy experimental data or biased quantum mechanical simulations, CheMeleon utilizes low-noise molecular descriptors to learn rich and highly transferable molecular representations, suggesting a new avenue for foundation model pre-training.
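
The reported win rates sum to more than 100% across models, which suggests that several models can be credited with a win on the same task. The abstract does not define the metric, so the following is only a minimal sketch under one assumed reading: a model "wins" a task if its score is the best or within a tolerance of the best. The function name `win_rates`, the `tol` parameter, and the toy scores are all hypothetical and are not taken from the preprint, which may use a statistical test rather than a fixed tolerance.

```python
from typing import Dict, List


def win_rates(
    scores: Dict[str, List[float]],
    tol: float = 0.0,
    higher_is_better: bool = True,
) -> Dict[str, float]:
    """Fraction of tasks on which each model is best or within `tol` of the best.

    ASSUMPTION: this tie-tolerant definition is illustrative only; the
    preprint's actual win-rate criterion is not stated in the abstract.
    """
    models = list(scores)
    n_tasks = len(scores[models[0]])
    wins = {m: 0 for m in models}
    for t in range(n_tasks):
        task_scores = {m: scores[m][t] for m in models}
        values = task_scores.values()
        best = max(values) if higher_is_better else min(values)
        # Every model close enough to the best score is credited with a win.
        for m, s in task_scores.items():
            if abs(s - best) <= tol:
                wins[m] += 1
    return {m: wins[m] / n_tasks for m in models}


# Made-up scores for two hypothetical models on four hypothetical tasks.
toy_scores = {
    "model_a": [0.80, 0.70, 0.90, 0.60],
    "model_b": [0.79, 0.75, 0.85, 0.60],
}
rates = win_rates(toy_scores, tol=0.02)
print(rates)
```

With these toy numbers both models are credited on the first and last tasks, so their win rates add up to more than 1, mirroring the pattern in the figures quoted above.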

Version published to 10.21203/rs.3.rs-8834086/v1 on Research Square
Mar 16, 2026

Related articles

Benchmarking Molecular Representations for Aqueous Solubility Prediction: The Impact of Inductive Bias and Scaffold Splitting in Low-Data Regimes

This article has 1 author:
1. Mudassir Ur Rahman
This article has no evaluations
Latest version Mar 23, 2026

Multimodal Feature Fusion for Molecular Property Classification

This article has 8 authors:
1. Jing Liu
2. Yin Wang
3. Li Xue
4. Qiaorong Wu
5. Wenwei Tao
6. Yiwei Wang
7. Jianming Wu
8. Jiesi Luo
This article has no evaluations
Latest version Apr 10, 2026

ArcMol Enables Task-Adaptive Spherical Representation Learning for Molecular Property Prediction

This article has 7 authors:
1. Lijuan Chen
2. Yurong Zou
3. Zhongning Guo
4. Zihan Zou
5. Duanyang Qin
6. Dingguo Xu
7. Taijin Wang
This article has no evaluations
Latest version Apr 9, 2026
