Multimodal Feature Fusion for Molecular Property Classification

Abstract

Accurate molecular property prediction is a cornerstone of modern chemical science, driving progress in drug discovery, materials design, and environmental research. Yet, most existing models remain unimodal, while multimodal approaches often rely on simple aggregation, leaving much of the complementary chemical information underexploited. In this work, we present a multimodal feature fusion framework that unites the strengths of deep chemical language processing (CLP) models and molecular fingerprints, integrating sequential and structural representations for more comprehensive molecular characterization. Unlike previous heuristic combinations, our framework systematically investigates the principles of effective cross-modal fusion. We benchmark ten CLP architectures and eight fingerprint types through exhaustive combinatorial search to identify the most synergistic configurations. This exploration shows that aggregating multiple models does not necessarily improve performance; instead, successful fusion requires data-aware design guided by feature integration and complementarity. The proposed strategy effectively couples sequential features learned from SMILES with structural information captured by molecular fingerprints, resulting in a coherent and chemically interpretable molecular representation. Evaluated across 60 datasets from MoleculeNet and TOXRIC, our fusion models deliver consistent and substantial gains over state-of-the-art baselines. Beyond outperforming existing architectures, this work provides conceptual insights and practical guidelines for multimodal fusion in molecular property prediction, highlighting the importance of efficient fusion strategies in building robust and generalizable molecular models.
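
The combinatorial search over feature sources described above can be illustrated with a minimal sketch. The abstract does not specify the fusion operator, the search bounds, or the scoring procedure, so everything here is an assumption made for illustration: fusion is plain concatenation of L2-normalized feature blocks, and `l2_normalize`, `fuse`, `search_fusions`, `score_fn`, `max_clp`, and `max_fp` are hypothetical names, not the authors' implementation. In practice `score_fn` would train and validate a classifier on the fused vectors.

```python
from itertools import combinations
from math import sqrt


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(feature_blocks):
    """Concatenation fusion (an assumption): normalize each block, then join them."""
    fused = []
    for block in feature_blocks:
        fused.extend(l2_normalize(block))
    return fused


def search_fusions(clp_features, fp_features, score_fn, max_clp=2, max_fp=2):
    """Score every combination of 1..max_clp CLP sources with 1..max_fp fingerprints.

    clp_features, fp_features: dict mapping a source name to a list of
        per-molecule feature vectors (same molecule order in every source).
    score_fn: hypothetical callable that takes the fused per-molecule vectors
        and returns a validation score, where higher is better.
    Returns a list of (score, source_names) sorted from best to worst.
    """
    results = []
    for n_clp in range(1, max_clp + 1):
        for clp_names in combinations(sorted(clp_features), n_clp):
            for n_fp in range(1, max_fp + 1):
                for fp_names in combinations(sorted(fp_features), n_fp):
                    sources = [clp_features[n] for n in clp_names]
                    sources += [fp_features[n] for n in fp_names]
                    # zip(*sources) groups the blocks belonging to one molecule.
                    fused = [fuse(blocks) for blocks in zip(*sources)]
                    results.append((score_fn(fused), clp_names + fp_names))
    results.sort(key=lambda item: item[0], reverse=True)
    return results
```

Because every candidate combination is scored separately, a ranking like this makes it visible when adding a further source lowers the score rather than raising it, which is the behavior the abstract reports for naive aggregation.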
