A Two-Stage Fragment-Connection Framework for Infrared-Spectrum-Driven Molecular Structure Elucidation

Abstract

Elucidating molecular structure from infrared spectra is difficult because the space of candidate structures is very large, direct end-to-end generation is hard to learn, and training is often unstable. In this work, we present a two-stage approach. It first encodes molecules as Breaking of Retrosynthetically Interesting Chemical Substructures (BRICS) fragments and learns fragment representations from infrared spectra. It then frames structure elucidation as spectrum-based prediction of connections between fragment pairs. We also propose a complexity-controlled evaluation protocol that minimizes the distortion introduced by simple molecules with few fragment nodes. Oracle-fragment and strict-vocabulary analyses further separate fragment-coverage errors from connection errors. The approach produces valid structures and achieves competitive hit rates. Under the strict-vocabulary protocol, fragment coverage remains a significant bottleneck, whereas the oracle-fragment setting improves hit rates without reducing reconstruction errors, confirming a dual bottleneck. Finally, the performance plateau between the 5.8k and 63.0k datasets suggests that, as data size and molecular complexity increase, the primary source of error in molecular connection generation shifts from fragment coverage and local prediction to higher-order discrimination and ranking. In summary, improving infrared-spectrum-guided structure elucidation requires not only state-of-the-art neural models but also suitable structural abstractions, controlled decoding, and evaluation measures that better reveal reconstruction bottlenecks.
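To make the fragment-connection framing and the complexity-controlled evaluation more concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. Because BRICS cleaves only acyclic bonds, the fragments of one molecule can be reassembled as a tree, so the sketch greedily picks the highest-scoring fragment pairs while respecting each fragment's number of free attachment points and never closing a cycle. The pairwise scores are assumed to come from some spectrum-conditioned pair classifier, BRICS attachment-point types and compatibility rules are ignored, and the names `assemble_fragments`, `stratified_hit_rate`, and `min_fragments` (with its default of 3) are hypothetical choices for illustration only.

```python
from collections import defaultdict


def assemble_fragments(scores, capacity):
    """Greedy, degree-constrained tree assembly (hypothetical decoder).

    scores: dict mapping (i, j) fragment pairs to a connection score,
            assumed to come from a spectrum-conditioned pair model.
    capacity: number of free attachment points on each fragment
              (attachment-point types are ignored in this sketch).
    Returns the chosen (i, j) edges, or None if no spanning tree is found.
    """
    n = len(capacity)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    used = [0] * n
    edges = []
    # Visit candidate pairs from the most to the least likely connection.
    for (i, j), _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if used[i] >= capacity[i] or used[j] >= capacity[j]:
            continue  # one of the fragments has no free attachment point
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # fragments already connected; edge would form a cycle
        parent[ri] = rj
        used[i] += 1
        used[j] += 1
        edges.append((i, j))
        if len(edges) == n - 1:
            break  # a tree over n fragments has exactly n - 1 edges
    return edges if len(edges) == n - 1 else None


def stratified_hit_rate(results, min_fragments=3):
    """Hit rate per fragment count, skipping overly simple molecules.

    results: iterable of (num_fragments, hit) pairs, with hit a bool.
    """
    buckets = defaultdict(lambda: [0, 0])  # n_frag -> [hits, total]
    for n_frag, hit in results:
        if n_frag < min_fragments:
            continue  # simple molecules would inflate the aggregate score
        buckets[n_frag][0] += int(hit)
        buckets[n_frag][1] += 1
    return {k: hits / total for k, (hits, total) in sorted(buckets.items())}


if __name__ == "__main__":
    pair_scores = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.7, (2, 3): 0.6, (1, 3): 0.2}
    print(assemble_fragments(pair_scores, [1, 2, 2, 1]))
    # [(0, 1), (1, 2), (2, 3)]
    print(stratified_hit_rate([(2, True), (3, True), (3, False), (4, True)]))
    # {3: 0.5, 4: 1.0}
```

Reporting hit rates per fragment count, rather than as one pooled number, is one way to keep easy two-fragment molecules from masking failures on more complex ones; a real decoder would likely return a ranked list of candidate assemblies rather than a single greedy tree.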