Challenges in Transferable Prediction of Solvation Free Energy: A Comparative Analysis of Molecular Representations and Machine Learning Methods

Dibyendu Maity
Suman Chakrabarty

Abstract

In silico prediction of physicochemical properties such as solvation free energy is crucial for efficient drug discovery. However, accurate prediction remains challenging because of complexities inherent in molecular representations and model transferability. This study systematically evaluates the influence of three classes of molecular representation (descriptor-based, fingerprint-based, and graph-based) on the predictive performance and transferability of supervised machine learning (ML) models. Using three diverse datasets (MNSol, FreeSolv, and CombiSolv), we compared classical regression techniques (XGBoost, Random Forest, Support Vector Regression, Kernel Ridge Regression) against deep learning models, specifically the Chemically Interpretable Graph Interaction Network (CIGIN). Our findings indicate that while traditional models with interpretable descriptors provide insight into the important features, their transferability is limited by dataset size and chemical diversity. Molecular fingerprints show improved performance, and a Multilayer Perceptron (MLP) Regressor demonstrates better regularization with high-dimensional fingerprints than traditional models do. The graph-based CIGIN model exhibits strong performance and chemical interpretability but struggles to generalize to novel chemical entities absent from the training data, showing increased errors for molecules with long hydrocarbon chains or polyol moieties. This research highlights the critical interplay between data quality, molecular representation, and model choice in achieving accurate and transferable predictions of molecular properties, underscoring the need for further refinement in handling novel chemical space and incorporating physics-informed features.

Version published to 10.21203/rs.3.rs-6727155/v1 on Research Square
Jul 23, 2025
