Reusability report: A unified pre-trained deep learning framework for cross-task reaction performance prediction and synthesis planning
Abstract
Deep learning has substantially advanced reaction-yield prediction and synthesis-planning methodologies, yet achieving a unified architecture capable of transferring across these tasks remains a central challenge in chemical machine learning. RXNGraphormer introduces such a framework by combining a pretrained graph–transformer encoder with a delta-molecular reaction representation designed to support cross-task generalization. In this reusability report, we independently assess the reproducibility and practical applicability of RXNGraphormer using the released implementation, pretrained checkpoint and benchmark datasets. All major regression and sequence-generation results reported in the original study were consistently reproduced, including the relative difficulty patterns in out-of-sample evaluations, demonstrating the stability and transparency of the published workflow. To evaluate reusability, we examined the model’s transfer to multiple high-throughput datasets generated under standardized experimental conditions. In these settings, the pretrained encoder adapted efficiently and delivered strong predictive performance with minimal fine-tuning. When applied to a heterogeneous literature-derived benchmark, performance decreased, reflecting the inherent variability and structural noise characteristic of uncurated reaction corpora. Overall, our findings indicate that RXNGraphormer constitutes a reproducible and practically reusable chemical foundation model, capable of supporting both reaction-performance prediction and synthesis-planning tasks across diverse settings. These results further highlight the importance of harmonized reaction representations, curated experimental data and domain-specific refinement. Looking forward, continued progress in large-scale pretraining, interpretable reaction embeddings and standardized reaction corpora will be essential for extending the reach of unified chemical models to broader and more complex reaction spaces.