Reusability report: A unified pre-trained deep learning framework for cross-task reaction performance prediction and synthesis planning


Abstract

Deep learning has substantially advanced reaction-yield prediction and synthesis planning, yet a unified architecture that transfers across these tasks remains a central challenge in chemical machine learning. RXNGraphormer introduces such a framework by combining a pretrained graph–transformer encoder with a delta-molecular reaction representation designed to support cross-task generalization. In this reusability report, we independently assess the reproducibility and practical applicability of RXNGraphormer using the released implementation, pretrained checkpoint and benchmark datasets. We reproduced all major regression and sequence-generation results reported in the original study, including the relative difficulty patterns in out-of-sample evaluations, which demonstrates the stability and transparency of the published workflow. To evaluate reusability, we examined the model’s transfer to multiple high-throughput datasets generated under standardized experimental conditions. In these settings, the pretrained encoder adapted efficiently and delivered strong predictive performance with minimal fine-tuning. When the model was applied to a heterogeneous literature-derived benchmark, performance decreased, reflecting the variability and structural noise characteristic of uncurated reaction corpora. Overall, our findings indicate that RXNGraphormer is a reproducible and practically reusable chemical foundation model that supports both reaction-performance prediction and synthesis-planning tasks across diverse settings. These results further highlight the importance of harmonized reaction representations, curated experimental data and domain-specific refinement. Looking forward, continued progress in large-scale pretraining, interpretable reaction embeddings and standardized reaction corpora will be essential for extending unified chemical models to broader and more complex reaction spaces.
