Multimodal Feature Fusion for Molecular Property Classification
Abstract
Accurate molecular property prediction is a cornerstone of modern chemical science, driving progress in drug discovery, materials design, and environmental research. Yet, most existing models remain unimodal, while multimodal approaches often rely on simple aggregation, leaving much of the complementary chemical information underexploited. In this work, we present a multimodal feature fusion framework that unites the strengths of deep chemical language processing (CLP) models and molecular fingerprints, integrating sequential and structural representations for more comprehensive molecular characterization. Unlike previous heuristic combinations, our framework systematically investigates the principles of effective cross-modal fusion. We benchmark ten CLP architectures and eight fingerprint types through exhaustive combinatorial search to identify the most synergistic configurations. This exploration shows that aggregating multiple models does not necessarily improve performance; instead, successful fusion requires data-aware design guided by feature integration and complementarity. The proposed strategy effectively couples sequential features learned from SMILES with structural information captured by molecular fingerprints, resulting in a coherent and chemically interpretable molecular representation. Evaluated across 60 datasets from MoleculeNet and TOXRIC, our fusion models deliver consistent and substantial gains over state-of-the-art baselines. Beyond outperforming existing architectures, this work provides conceptual insights and practical guidelines for multimodal fusion in molecular property prediction, highlighting the importance of efficient fusion strategies in building robust and generalizable molecular models.
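The abstract does not specify the fusion operator or the exact search space, so the following is only a minimal sketch of what an exhaustive combinatorial search over CLP models and fingerprint types could look like. It assumes that fusion is plain concatenation of per-molecule feature vectors and that each configuration selects a small subset from each modality. The names `candidate_configs`, `fuse`, `best_config`, and the parameter `max_per_modality` are hypothetical and do not come from the paper.

```python
from itertools import combinations, product


def candidate_configs(clp_models, fingerprints, max_per_modality=2):
    """Yield every (clp_subset, fp_subset) pair.

    Each subset holds between 1 and max_per_modality extractors.
    The size cap is an assumption; the abstract only says the search is exhaustive.
    """
    def subsets(items):
        for k in range(1, max_per_modality + 1):
            yield from combinations(items, k)

    yield from product(subsets(clp_models), subsets(fingerprints))


def fuse(features_by_name, config):
    """Concatenate the feature vectors of the selected extractors for one molecule.

    Concatenation is an assumed stand-in for the paper's fusion strategy.
    """
    clp_subset, fp_subset = config
    fused = []
    for name in clp_subset + fp_subset:
        fused.extend(features_by_name[name])
    return fused


def best_config(configs, score):
    """Return the configuration with the highest score.

    `score` is a caller-supplied function, e.g. validation accuracy of a
    classifier trained on the fused features.
    """
    return max(configs, key=score)


# Hypothetical placeholder names for ten CLP models and eight fingerprint types.
clp_names = [f"clp_{i}" for i in range(10)]
fp_names = [f"fp_{i}" for i in range(8)]

pairs_only = list(candidate_configs(clp_names, fp_names, max_per_modality=1))
up_to_two = list(candidate_configs(clp_names, fp_names, max_per_modality=2))
```

With one extractor per modality the search covers 10 × 8 = 80 configurations; allowing up to two per modality raises this to 55 × 36 = 1980. That growth is one reason a data-aware selection, rather than aggregating everything, is attractive.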