MCC-GCN: An Interpretable Graph Learning Framework for Multicomponent Crystal Classification and Discovery
Abstract
Multicomponent crystals (MCCs), including cocrystals, salts, and solvates, are of interest in organic molecule design, yet their discovery remains largely empirical and inefficient. Existing computational approaches are limited to binary predictions and lack interpretability, restricting their ability to guide discovery and provide mechanistic insight. Here, we present MCC-GCN, an interpretable graph-based learning framework that reformulates MCC prediction as a multi-class problem and enables unified prediction and mechanistic interpretation of MCC formation. Trained on over 34,000 entries and refined through strategic fine-tuning, MCC-GCN demonstrates robust generalization to chemical domains beyond the Cambridge Structural Database. Benchmarking against five classical methods and three machine-learning baselines shows that MCC-GCN outperforms existing approaches in predictive accuracy and interpretability. An experimental validation campaign involving 64 prospective cases identified 47 new MCCs. MCC-GCN provides a scalable, generalizable methodological framework for studying MCCs and offers a foundation for data-driven discovery in pharmacy, chemistry, and materials science.