Integrating Coverage-Attention Mechanisms for Improved Chemical Image Recognition


Abstract

Emerging digital and intelligent technologies are revolutionizing text recognition and interpretation, markedly enhancing the ability to process textual content from diverse sources such as paper documents and photographs. These advances have significantly propelled the field of chemical structure image recognition, where portable devices are increasingly crucial in transforming hand-drawn chemical structures into machine-readable formats and translating these complex structures into human-understandable representations while emphasizing their physical and chemical properties. This study introduces a model for chemical structure recognition that bridges the gap between hand-drawn representations and machine-interpretable data, facilitating the electronic documentation of intricate scenarios commonly encountered in educational and professional settings. The research highlights the critical role of high-quality, ample training data in optimizing the performance of deep neural networks, and addresses current challenges such as data scarcity, which hampers model robustness and generalization.

Method: This paper presents a chemical structure recognition model employing an encoder-decoder architecture designed to enhance feature extraction and sequence decoding capabilities. The model uses a feature extraction network based on ResNet-50, integrated with a bidirectional long short-term memory (BLSTM) row encoder to reinforce the spatial feature distribution within feature maps. Additionally, a coverage-attention mechanism is incorporated into the decoder to align input sequence information with output characters, optimizing memory and decoding capacities for extended sequences and thereby enabling precise generation of character representations such as SMILES from given chemical structure images.

Results: The efficacy of the proposed model was validated through extensive training and testing on the Image2Mol and ChemPix models using multiple datasets, including the CASIA-CSDB. The model demonstrated superior recognition accuracy and robust generalization capabilities across diverse datasets, outperforming several benchmark models, including those trained on significantly larger image sets.

Conclusion: The developed method leverages an encoder-decoder framework to generate SMILES strings from chemical structure images, showing enhanced accuracy and generalization across varied datasets. The results underscore the potential of the proposed model in advancing chemical structure recognition, paving the way for its application in real-world scenarios where rapid and reliable interpretation of chemical images is paramount.
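
To make the coverage-attention idea in the Method concrete, the following is a minimal sketch of one decoding step of additive attention with a coverage term, written in plain Python. It is not the authors' implementation: the abstract does not give the scoring function, so the common form e_i = v · tanh(W_h h_i + W_s s + w_c c_i) is assumed here, and every name (`coverage_attention_step`, `W_h`, `W_s`, `w_c`, `v`) is hypothetical. The coverage value c_i is the sum of the attention weights that encoder position i received at earlier steps, which lets the decoder account for regions of the image it has already attended to.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def coverage_attention_step(enc_feats, dec_state, coverage, W_h, W_s, w_c, v):
    """One decoding step of additive attention with a coverage term.

    Assumed scoring function (not taken from the article):
        e_i = v . tanh(W_h h_i + W_s s + w_c * c_i)

    enc_feats: N encoder feature vectors, each of length d_h
    dec_state: decoder hidden state of length d_s
    coverage:  N floats, attention accumulated over earlier steps
    W_h: d_a x d_h matrix, W_s: d_a x d_s matrix
    w_c, v: vectors of length d_a
    Returns (attention weights, context vector, updated coverage).
    """
    s_proj = matvec(W_s, dec_state)
    scores = []
    for h_i, c_i in zip(enc_feats, coverage):
        h_proj = matvec(W_h, h_i)
        hidden = [math.tanh(a + b + w * c_i)
                  for a, b, w in zip(h_proj, s_proj, w_c)]
        scores.append(sum(v_k * z for v_k, z in zip(v, hidden)))

    alpha = softmax(scores)
    dim = len(enc_feats[0])
    # Context vector: attention-weighted sum of the encoder features.
    context = [sum(a * h[k] for a, h in zip(alpha, enc_feats))
               for k in range(dim)]
    # Coverage update: add this step's attention to the running total.
    new_coverage = [c + a for c, a in zip(coverage, alpha)]
    return alpha, context, new_coverage
```

In a decoder loop, `coverage` would start as a list of zeros and the returned `new_coverage` would be passed into the next step. In this sketch, when `v` is positive and `w_c` is negative, a position with high accumulated coverage receives a lower score, so attention shifts toward parts of the input that have not yet been decoded.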
