离线手写化学式识别的端到端可训练系统

Xiaoxue Liu, Ting Zhang, Xinguo Yu
{"title":"离线手写化学式识别的端到端可训练系统","authors":"Xiaoxue Liu, Ting Zhang, Xinguo Yu","doi":"10.1109/ICDAR.2019.00098","DOIUrl":null,"url":null,"abstract":"In this paper, we propose an end-to-end trainable system for recognizing handwritten chemical formulae. This system recognize once a time a chemical formula, instead of one chemical symbol or a whole chemical equation, which is in line with people's writing habits, at the same time could help to develop methods for the complicated chemical equations recognition. The proposed system adopts the CNN+RNN+CTC framework, which is one of state of the art methods in imagebased sequence labelling tasks. We extend the capability of the CNN+RNN+CTC framework to interpret 2D spatial relationships (such as 'subscript' existing in chemical formula) by introducing additional labels to represent them. The system evaluated on a self-collected data set of 12,224 samples, achieves the recognition rate of 94.98% at the chemical formula level.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"An End-to-End Trainable System for Offline Handwritten Chemical Formulae Recognition\",\"authors\":\"Xiaoxue Liu, Ting Zhang, Xinguo Yu\",\"doi\":\"10.1109/ICDAR.2019.00098\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we propose an end-to-end trainable system for recognizing handwritten chemical formulae. This system recognize once a time a chemical formula, instead of one chemical symbol or a whole chemical equation, which is in line with people's writing habits, at the same time could help to develop methods for the complicated chemical equations recognition. The proposed system adopts the CNN+RNN+CTC framework, which is one of state of the art methods in imagebased sequence labelling tasks. We extend the capability of the CNN+RNN+CTC framework to interpret 2D spatial relationships (such as 'subscript' existing in chemical formula) by introducing additional labels to represent them. The system evaluated on a self-collected data set of 12,224 samples, achieves the recognition rate of 94.98% at the chemical formula level.\",\"PeriodicalId\":325437,\"journal\":{\"name\":\"2019 International Conference on Document Analysis and Recognition (ICDAR)\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 International Conference on Document Analysis and Recognition (ICDAR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDAR.2019.00098\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Document Analysis and Recognition (ICDAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.2019.00098","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

在本文中,我们提出了一个端到端可训练的系统来识别手写的化学式。该系统一次识别一个化学式,而不是一个化学符号或整个化学方程,既符合人们的书写习惯,同时也有助于开发复杂化学方程识别的方法。该系统采用CNN+RNN+CTC框架,这是基于图像的序列标记任务中最先进的方法之一。我们扩展了CNN+RNN+CTC框架的能力,通过引入额外的标签来表示二维空间关系(如化学式中存在的“下标”)。该系统在自行采集的12224个样本数据集上进行了评估,在化学式水平上的识别率达到了94.98%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
An End-to-End Trainable System for Offline Handwritten Chemical Formulae Recognition
In this paper, we propose an end-to-end trainable system for recognizing handwritten chemical formulae. This system recognize once a time a chemical formula, instead of one chemical symbol or a whole chemical equation, which is in line with people's writing habits, at the same time could help to develop methods for the complicated chemical equations recognition. The proposed system adopts the CNN+RNN+CTC framework, which is one of state of the art methods in imagebased sequence labelling tasks. We extend the capability of the CNN+RNN+CTC framework to interpret 2D spatial relationships (such as 'subscript' existing in chemical formula) by introducing additional labels to represent them. The system evaluated on a self-collected data set of 12,224 samples, achieves the recognition rate of 94.98% at the chemical formula level.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Article Segmentation in Digitised Newspapers with a 2D Markov Model ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard TableNet: Deep Learning Model for End-to-end Table Detection and Tabular Data Extraction from Scanned Document Images DICE: Deep Intelligent Contextual Embedding for Twitter Sentiment Analysis Blind Source Separation Based Framework for Multispectral Document Images Binarization
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1