AdaptMol: Domain Adaptation for Molecular Image Recognition with Limited Supervision

Abstract

Optical Chemical Structure Recognition (OCSR) aims to convert two-dimensional molecular images into machine-readable formats such as SMILES strings. Despite significant progress in deep-learning-based approaches, current OCSR methods, which are trained predominantly on synthetic data, often fail to generalize to diverse real-world inputs with varying visual styles and acquisition conditions. Hand-drawn molecular diagrams are a particularly challenging domain, exhibiting large variations in geometry and drawing style. In this work, we propose AdaptMol, an image-to-graph model that enables effective transfer from synthetic to real-world data without requiring manual graph annotations in the target domain. AdaptMol is an integrated pipeline that first trains a base model on synthetic data and then refines the model's representations through unsupervised domain adaptation and self-training. Our key insight is that bond features are domain-invariant in nature: they encode structural relationships between atoms that are independent of visual variations across domains. Thus, during domain adaptation, we align bond-level feature distributions via class-conditional Maximum Mean Discrepancy (MMD) to enforce cross-domain consistency. We further design a comprehensive data augmentation strategy to enhance the robustness of the base model, facilitating stable self-training on unlabeled target samples. We demonstrate our approach on hand-drawn molecules, achieving 82.6% accuracy, a 10.7-point improvement over the best prior method, while maintaining state-of-the-art performance on four benchmarks comprising molecular images from scientific literature and patent documents. This establishes a practical pipeline for molecular recognition that generalizes effectively across diverse real-world domains.
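The class-conditional MMD alignment described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, the biased estimator, the function names, and the use of predicted (pseudo) bond classes on the unlabeled target domain are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    return (
        rbf_kernel(x, x, sigma).mean()
        + rbf_kernel(y, y, sigma).mean()
        - 2.0 * rbf_kernel(x, y, sigma).mean()
    )

def class_conditional_mmd2(src_feats, src_labels, tgt_feats, tgt_labels, sigma=1.0):
    """Average squared MMD over bond classes present in both domains.

    tgt_labels are assumed to be pseudo-labels (model predictions),
    since the target domain has no graph annotations.
    """
    shared = np.intersect1d(src_labels, tgt_labels)
    if shared.size == 0:
        return 0.0
    terms = [
        mmd2(src_feats[src_labels == c], tgt_feats[tgt_labels == c], sigma)
        for c in shared
    ]
    return float(np.mean(terms))
```

Computing the discrepancy per bond class, rather than over all bonds at once, keeps the alignment from pulling, for example, single-bond features in one domain toward double-bond features in the other. In a training loop this quantity would be added to the task loss as a differentiable term in the framework of choice.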
Scientific contribution

We propose AdaptMol, an image-to-graph model that predicts molecular structures as graphs of atoms and bonds, achieving effective transfer from synthetic to real-world molecular images without requiring target-domain graph annotations. We combine class-conditional Maximum Mean Discrepancy, which aligns bond features across domains, with comprehensive data augmentation, which increases the variation in the training data. Together, these raise base-model accuracy enough for self-training to be effective, addressing the critical failure mode of prior approaches that begin self-training with insufficient accuracy. We further introduce a dual position representation that supervises atom positions through both discrete coordinate tokens and continuous spatial heatmaps to reduce false positives in atom localization.
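To make the dual position representation concrete, the sketch below builds both supervision targets from the same atom coordinates: discrete coordinate tokens and a continuous spatial heatmap. It is only an illustration under stated assumptions. The bin count, heatmap size, Gaussian width `sigma`, max-combination of blobs, and the function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def coords_to_tokens(coords, n_bins=64):
    """Quantize normalized (x, y) coordinates in [0, 1] into integer bin indices."""
    coords = np.asarray(coords, dtype=float)
    # A coordinate of exactly 1.0 would land in bin n_bins, so clip to the last bin.
    return np.clip((coords * n_bins).astype(int), 0, n_bins - 1)

def coords_to_heatmap(coords, size=64, sigma=1.5):
    """Render atoms as Gaussian blobs on a (size, size) grid, combined by maximum."""
    coords = np.asarray(coords, dtype=float)
    ys, xs = np.mgrid[0:size, 0:size]
    heatmap = np.zeros((size, size))
    for x, y in coords:
        # Map normalized coordinates to pixel centers (column = x, row = y).
        cx, cy = x * (size - 1), y * (size - 1)
        blob = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma**2))
        heatmap = np.maximum(heatmap, blob)
    return heatmap
```

For two atoms at `(0.25, 0.5)` and `(0.75, 0.5)`, `coords_to_tokens` yields the bin pairs `(16, 32)` and `(48, 32)`, while `coords_to_heatmap` yields a 64×64 map with two peaks. One plausible reading of the contribution is that a predicted token position lacking a corresponding heatmap peak can be treated as a likely false positive; that interpretation is an assumption here.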
