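Chemical structure recognition method based on attention mechanism and encoder-decoder architecture

Zeng Shuiling; Li Zhaoxian; Zhang Jiaxiong; Ding Longfei; Zhao Cairong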

doi:10.11834/jig.230367

Image Analysis and Recognition


2024, Vol. 29, No. 7, pp. 1960-1969
Print publication date: 2024-07-16

Zeng Shuiling, Li Zhaoxian, Zhang Jiaxiong, Ding Longfei, Zhao Cairong. 2024. Chemical structure recognition method based on attention mechanism and encoder-decoder architecture. Journal of Image and Graphics, 29(07): 1960-1969. DOI: 10.11834/jig.230367

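Summary

Objective

Chemical structure recognition is an important problem at the intersection of chemistry and computer vision. Traditional optical chemical structure recognition techniques tend to lose information or misrecognize symbols on complex structures, and the structural diversity of chemical compounds often leaves them unable to parse the input, so recognition results are poor. Deep-learning-based models, in turn, typically suffer from high network complexity, loss of contextual information, and low recognition rates. To address these problems, a chemical structure recognition method that combines an attention mechanism with an encoder-decoder architecture is proposed.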

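Method

First, an improved ResNet50 (residual network) serves as the feature extractor that captures representation information. Second, a BLSTM (bidirectional long short-term memory) network serves as a row encoder that strengthens the spatial information in the features extracted by ResNet50. Finally, a de-padding module and an LSTM (long short-term memory) network with a coverage-based attention mechanism serve as the decoder, which decodes the encoded chemical structure image into a SMILES (simplified molecular input line entry system) sequence.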
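Because the decoder emits a SMILES sequence one symbol at a time, the target string has to be split into symbols first. The following is a minimal sketch of such a split, assuming a regular-expression tokenizer; the function name `tokenize_smiles` and the token pattern are illustrative assumptions, not the vocabulary used by the authors.

```python
import re
from typing import List

# Hypothetical token pattern (an assumption for illustration only).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"            # bracket atoms such as [NH4+] or [C@@H]
    r"|Br|Cl"                # two-letter atoms of the organic subset
    r"|%\d{2}"               # two-digit ring-closure labels such as %10
    r"|[A-Za-z]"             # single-letter atoms (lower case = aromatic)
    r"|\d"                   # single-digit ring-closure labels
    r"|[=#$:/\\\-+().@~*]"   # bonds, branches, and the dot separator
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into the symbols a decoder would emit."""
    tokens = SMILES_TOKEN.findall(smiles)
    # findall silently skips unmatched characters, so check for losses.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


print(tokenize_smiles("ClCCBr"))  # ['Cl', 'C', 'C', 'Br']
```

Matching `Br` and `Cl` before single letters keeps two-letter atoms intact, which matters because a decoder that emitted `C` followed by `l` would describe a different, invalid string.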

Result

On the Indigo, ChemDraw, CLEF (Conference and Labs of the Evaluation Forum), JPO (Japanese Patent Office), UOB (University of Birmingham), USPTO (United States Patent and Trademark Office), Staker, ACS (American Chemical Society), CASIA-CSDB (Institute of Automation, Chinese Academy of Sciences Chemical Structure Database), and Mini CASIA-CSDB datasets, the proposed method achieves recognition accuracies of 71.1%, 70.21%, 45.8%, 30.3%, 53.02%, 58.21%, 43.39%, 46.3%, 84.42%, and 85.78%, respectively, higher than the scores of the SwinOCSR, Image2Mol, and ChemPix models.
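The text here does not state how recognition accuracy is computed. As a hedged illustration only, the sketch below assumes that a prediction counts as correct when its SMILES string matches the reference string exactly; the function name `exact_match_accuracy` is hypothetical.

```python
from typing import Sequence


def exact_match_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Percentage of predictions identical to their reference string.

    Assumption: exact string match. The evaluation protocol actually
    used for the reported numbers may differ.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one sample is required")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * hits / len(reference)


print(exact_match_accuracy(["CCO", "CC", "C", "N"], ["CCO", "CC", "C", "O"]))  # 75.0
```

Different SMILES strings can denote the same molecule, so string comparison is only meaningful when predictions and references are written in the same canonical form.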

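Conclusion

Compared with other models, the proposed method achieves comparatively high recognition accuracy with only a small training set.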

Abstract

Objective

Emerging digital and intelligent technologies have made it far easier to recognize and interpret text from varied sources, including paper documents and photographs. One notable application is chemical structure image recognition, where portable devices such as mobile phones and tablet PCs play a vital role in converting hand-drawn chemical structure images into machine-readable formats. Recognition models translate these intricate structures into human-readable representations while highlighting relevant physical properties, chemical characteristics, and elemental compositions, thereby bridging hand-drawn representations and machine-interpretable data. This capability makes it feasible to document complex scenarios electronically, such as those encountered in classrooms and academic meetings. Ongoing research on encoder-decoder-based methods for mathematical expression recognition has shown promising results. However, the performance of deep neural networks depends heavily on the quality and quantity of training data, and a comprehensive, high-quality dataset tailored to chemical structure image recognition is still lacking. This deficiency hampers the optimization, generalization, and robustness of models. Furthermore, the computational demands of real-time offline recognition on mobile devices remain a practical limitation.

Method

To address these issues, we developed a chemical structure recognition model based on an encoder-decoder architecture. Given a chemical structure image, the model generates the corresponding character representation, such as a SMILES string. In image-related tasks, recognition performance depends directly on how well the encoder extracts features from the image and how well the decoder decodes the resulting feature sequence. The encoder should model the input image efficiently: it should extract varied features comprehensively, obtain accurate feature distributions, and encode them into feature maps. We therefore designed a ResNet-50-based feature extraction network for the encoder, which captures the two-dimensional structural information of chemical structure images. To make the information in the feature maps more effective, we added a row encoder based on bidirectional long short-term memory (BLSTM), which reinforces the spatial feature distribution weights by encoding the feature maps row by row. The decoder should accurately decode the sequence information in the encoder's output. To align the input sequence information with the output characters and to improve the model's memory and decoding of long sequences, we incorporated a coverage-attention mechanism into the decoder. The resulting model generates the corresponding representation from an input chemical structure image.
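The idea behind coverage attention can be illustrated with a toy decoding step. This is a simplified sketch under stated assumptions, not the authors' formulation: coverage-based attention usually passes the accumulated attention through learned parameters, whereas the sketch subtracts it as a fixed penalty. The names `coverage_attention_step` and `coverage_weight` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coverage_attention_step(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    coverage: Sequence[float],
    coverage_weight: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """One decoding step: attend over keys, then update coverage.

    coverage[i] is the attention already spent on position i in earlier
    steps. Assumption: it lowers the score through a fixed penalty
    instead of a learned transformation.
    """
    scores = [
        sum(q * k for q, k in zip(query, key)) - coverage_weight * c
        for key, c in zip(keys, coverage)
    ]
    weights = softmax(scores)
    new_coverage = [c + w for c, w in zip(coverage, weights)]
    return weights, new_coverage


# Two positions with identical features; position 0 was attended to before.
keys = [[1.0, 0.0], [1.0, 0.0]]
weights, coverage = coverage_attention_step([1.0, 0.0], keys, coverage=[1.0, 0.0])
print(weights[1] > weights[0])  # True: attention shifts to the unread position
```

Tracking what has already been attended to discourages the decoder from re-reading or skipping parts of the image, which is the motivation given above for long output sequences.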

Result

To evaluate our model objectively, we trained the Image2Mol and ChemPix models on the CASIA-CSDB (Institute of Automation, Chinese Academy of Sciences Chemical Structure Database) dataset. We then tested on a range of datasets: Indigo, ChemDraw, Conference and Labs of the Evaluation Forum (CLEF), Japanese Patent Office (JPO), University of Birmingham (UOB), United States Patent and Trademark Office (USPTO), Staker, American Chemical Society (ACS), CASIA-CSDB, and Mini CASIA-CSDB. The results show that our model achieves higher recognition accuracy when trained on small datasets and generalizes robustly. We also compared our model with untrainable models such as SwinOCSR, MSE-DUDL, ChemGrapher, Image2Graph, and MolScribe; our model performs commendably even against models trained on millions of images.

Conclusion

A chemical structure recognition method is introduced based on an encoder-decoder architecture. The method allows for the generation of SMILES strings from given chemical structure images. Experimental results demonstrate that the model achieves higher recognition accuracy when trained on small datasets and exhibits strong generalization capabilities.

Keywords

chemical structure recognition; encoder-decoder; attention mechanism; residual network; SMILES (simplified molecular input line entry system)

