Octahedral-Motif-Guided Design of Optoelectronic Semiconductors via Interpretable Machine Learning
Abstract
Lead-based halide perovskites represent a prototypical family of high-performance semiconductors, whose remarkable optoelectronic properties stem from their corner-sharing metal-halide octahedra. This connection underscores the promise of systematically extracting the physical rules encoded in octahedral motifs to steer the discovery of new optoelectronic materials. Here we develop an octahedral-motif-centric, data-driven framework that couples interpretable machine learning with high-throughput first-principles calculations to accelerate the discovery of optoelectronic semiconductors. We construct motif-based descriptors that encode chemically resolved elemental identities, local coordination environments, and short-range structural contrasts to train a gradient boosting regression tree model for thermodynamic stability evaluation, achieving a low mean absolute error of 83 meV per atom on datasets comprising ~10⁴ materials. We further elucidate the key motif-level factors and apply symbolic regression to extract compact analytical expressions governing material stability. Leveraging the developed model to accelerate materials discovery, we identify 19 thermodynamically stable semiconductors with favorable optoelectronic properties from a previously unexplored octahedra-containing chemical space comprising thousands of candidates. Among them, Ca₂GaCoO₅ was successfully synthesized and experimentally verified to exhibit a direct band gap and a strong visible-light photoresponse. These results validate the effectiveness of the interpretable, motif-guided machine learning framework for octahedra-containing semiconductors and demonstrate its potential for extension to other motif-based materials families.
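
To illustrate the regression step described above, the following is a minimal sketch of gradient boosting with one-split regression trees (stumps), scored by mean absolute error on a held-out set. It is not the authors' implementation: the abstract does not specify the descriptors, hyperparameters, or dataset, so the three features, the target function, the noise level, and all function names here are hypothetical and the data are synthetic. In practice a library implementation with deeper trees would likely be used.

```python
import random

def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) for the single
    split that minimizes squared error on the current residuals."""
    n, d = len(X), len(X[0])
    total = sum(residuals)
    best = None
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between identical feature values
            right_sum = total - left_sum
            # Maximizing this quantity is equivalent to minimizing the SSE.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or gain > best[0]:
                best = (gain, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best[1:]

def stump_predict(stump, x):
    j, threshold, left_value, right_value = stump
    return left_value if x[j] <= threshold else right_value

def fit_gbrt(X, y, n_rounds=200, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the residuals
    (the negative gradient) of the running prediction."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump_predict(stump, x) for p, x in zip(pred, X)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

# Synthetic stand-in data (assumption): three made-up "motif descriptors"
# in [0, 1] and a made-up stability target with small Gaussian noise.
random.seed(0)

def synthetic_sample():
    x = [random.random() for _ in range(3)]
    y = 0.5 * x[0] - 0.3 * x[1] + 0.2 * x[0] * x[2] + random.gauss(0, 0.02)
    return x, y

data = [synthetic_sample() for _ in range(400)]
X_train, y_train = [x for x, _ in data[:300]], [y for _, y in data[:300]]
X_test, y_test = [x for x, _ in data[300:]], [y for _, y in data[300:]]

model = fit_gbrt(X_train, y_train)
gbrt_mae = mean_absolute_error(y_test, [predict(model, x) for x in X_test])

# Baseline for comparison: always predict the training-set mean.
train_mean = sum(y_train) / len(y_train)
baseline_mae = mean_absolute_error(y_test, [train_mean] * len(y_test))

print(f"GBRT test MAE: {gbrt_mae:.4f}")
print(f"Mean-baseline test MAE: {baseline_mae:.4f}")
```

Comparing the held-out MAE against a trivial mean predictor gives a quick sanity check that the descriptors carry signal. The numbers produced on this toy data have no bearing on the 83 meV per atom reported above.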