
Latest publications: 2019 International Conference on Document Analysis and Recognition (ICDAR)

Table Structure Extraction with Bi-Directional Gated Recurrent Unit Networks
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00220
Saqib Ali Khan, Syed Khalid, M. Shahzad, F. Shafait
Tables present summarized and structured information to the reader, which makes table structure extraction an important part of document understanding applications. However, table structure identification is a hard problem, not only because of the large variation in table layouts and styles, but also owing to variations in page layouts and noise contamination levels. A lot of research has been done to identify table structure, most of which is based on applying heuristics with the aid of optical character recognition (OCR) to hand-pick layout features of the tables. These methods fail to generalize well because of the variations in table layouts and the errors generated by OCR. In this paper, we propose a robust deep learning based approach to extract rows and columns from a detected table in document images with high precision. In the proposed solution, the table images are first pre-processed and then fed to a bi-directional Recurrent Neural Network with Gated Recurrent Units (GRU), followed by a fully-connected layer with softmax activation. The network scans the images from top-to-bottom as well as left-to-right and classifies each input as either a row-separator or a column-separator. We have benchmarked our system on the publicly available UNLV and ICDAR 2013 datasets, on which it outperformed the state-of-the-art table structure extraction systems by a significant margin.
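A minimal PyTorch sketch of the scanning idea described above, not the authors' implementation: each row of the pre-processed table image is one time step of a bi-directional GRU, and a fully-connected softmax layer labels it as separator or non-separator. The image width, hidden size and layer counts are illustrative assumptions, and the analogous left-to-right (column) pass and the pre-processing step are omitted.

```python
# Hedged sketch only: a bi-directional GRU that scans a table image top-to-bottom
# and classifies every image row as row-separator / non-separator. The column pass
# would process the transposed image the same way. All sizes are assumptions.
import torch
import torch.nn as nn

class RowSeparatorGRU(nn.Module):
    def __init__(self, image_width=256, hidden_size=128, num_classes=2):
        super().__init__()
        # each image row (a vector of pixel intensities) is one GRU time step
        self.gru = nn.GRU(input_size=image_width, hidden_size=hidden_size,
                          num_layers=2, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, images):
        # images: (batch, height, width) grayscale, scanned top-to-bottom
        features, _ = self.gru(images)       # (batch, height, 2 * hidden_size)
        logits = self.classifier(features)   # (batch, height, num_classes)
        return logits.log_softmax(dim=-1)

model = RowSeparatorGRU()
tables = torch.rand(4, 512, 256)             # 4 table images, 512 rows, width 256
print(model(tables).shape)                   # torch.Size([4, 512, 2])
```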
Cited by: 43
Offline Signature Verification using Structural Dynamic Time Warping
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00181
Michael Stauffer, Paul Maergner, Andreas Fischer, R. Ingold, Kaspar Riesen
In recent years, different approaches for handwriting recognition that are based on graph representations have been proposed (e.g. graph-based keyword spotting or signature verification). This trend is mostly due to the availability of novel fast graph matching algorithms, as well as the inherent flexibility and expressivity of graph data structures when compared to vectorial representations. That is, graphs are able to directly adapt their size and structure to the size and complexity of the respective handwritten entities. However, the vast majority of the proposed approaches match the graphs from a global perspective only. In the present paper, we propose to match the underlying graphs from different local perspectives and combine the resulting assignments by means of Dynamic Time Warping. Moreover, we show that the proposed approach can be readily combined with global matchings. In an experimental evaluation, we employ the novel method in a signature verification scenario on two widely used benchmark datasets. On both datasets, we empirically confirm that the proposed approach outperforms state-of-the-art methods with respect to both accuracy and runtime.
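For readers unfamiliar with Dynamic Time Warping, the following is a minimal, self-contained sketch of a plain DTW distance between two feature-vector sequences (squared Euclidean local cost assumed). The graph-based local matchings that the paper aligns, and the combination of several such alignments, are not reproduced here.

```python
# Hedged sketch only: textbook DTW between two sequences of feature vectors.
import numpy as np

def dtw_distance(seq_a, seq_b):
    # seq_a: (n, d) array, seq_b: (m, d) array of local feature vectors
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.sum((seq_a[i - 1] - seq_b[j - 1]) ** 2)   # squared Euclidean cost
            cost[i, j] = local + min(cost[i - 1, j],              # insertion
                                     cost[i, j - 1],              # deletion
                                     cost[i - 1, j - 1])          # match
    return cost[n, m]

reference = np.random.rand(40, 8)   # e.g. local descriptors of a reference signature
query = np.random.rand(35, 8)       # descriptors of a questioned signature
print(dtw_distance(query, reference))
```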
Cited by: 7
An End-to-End Trainable System for Offline Handwritten Chemical Formulae Recognition
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00098
Xiaoxue Liu, Ting Zhang, Xinguo Yu
In this paper, we propose an end-to-end trainable system for recognizing handwritten chemical formulae. The system recognizes one chemical formula at a time, rather than a single chemical symbol or a whole chemical equation, which is in line with people's writing habits and could also help in developing methods for recognizing complicated chemical equations. The proposed system adopts the CNN+RNN+CTC framework, one of the state-of-the-art approaches for image-based sequence labelling tasks. We extend the capability of the CNN+RNN+CTC framework to interpret 2D spatial relationships (such as the 'subscript' relation found in chemical formulae) by introducing additional labels to represent them. The system, evaluated on a self-collected data set of 12,224 samples, achieves a recognition rate of 94.98% at the chemical formula level.
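A minimal CRNN + CTC sketch in PyTorch, purely to illustrate the framework the abstract names; it is not the authors' architecture, and all layer sizes, the alphabet size (which would include the extra relation labels such as 'subscript') and the input resolution are assumptions.

```python
# Hedged sketch only: CNN feature extractor -> bi-directional LSTM -> CTC loss.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes, img_height=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        feat_height = img_height // 4
        self.rnn = nn.LSTM(128 * feat_height, 256, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(512, num_classes)            # num_classes includes the CTC blank

    def forward(self, x):                                # x: (batch, 1, H, W)
        f = self.cnn(x)                                  # (batch, 128, H/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)   # one time step per image column
        f, _ = self.rnn(f)
        return self.fc(f).log_softmax(-1)                # (batch, W/4, num_classes)

model = CRNN(num_classes=40)                             # assumed alphabet incl. relation labels
images = torch.rand(2, 1, 32, 128)
log_probs = model(images).permute(1, 0, 2)               # CTCLoss expects (T, batch, classes)
targets = torch.randint(1, 40, (2, 10))                  # dummy label sequences
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           input_lengths=torch.full((2,), 32, dtype=torch.long),
                           target_lengths=torch.full((2,), 10, dtype=torch.long))
print(loss.item())
```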
Cited by: 1
Text Line Segmentation in Historical Document Images Using an Adaptive U-Net Architecture
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00066
Olfa Mechi, Maroua Mehri, R. Ingold, N. Amara
In most document image transcription, indexing and retrieval systems, text line segmentation remains one of the most important preliminary tasks. Hence, the research community working in document image analysis is particularly interested in providing reliable text line segmentation methods. Recently, there has been increasing interest in deep learning-based methods for solving the various sub-tasks and issues surrounding document image analysis. Thanks to the evolution of computer hardware and software, several methods based on deep architectures continue to outperform previous approaches to pattern recognition problems, particularly those related to historical document image analysis. Thus, in this paper we present a novel deep learning-based method for text line segmentation of historical documents. The proposed method is based on an adaptive U-Net architecture. Qualitative and numerical experiments are reported on a large number of historical document images collected from the Tunisian national archives and on several recent benchmarking datasets provided in the context of the ICDAR and ICFHR competitions. Moreover, the results achieved are compared with those obtained using state-of-the-art methods.
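Since the abstract centres on a U-Net-style encoder-decoder, here is a minimal generic U-Net sketch in PyTorch that predicts a per-pixel text-line mask; it is not the paper's adaptive variant, and the depth, channel counts and two-class output are assumptions.

```python
# Hedged sketch only: a tiny U-Net (encoder-decoder with skip connections)
# producing per-pixel text-line logits.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class MiniUNet(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.enc1, self.enc2 = conv_block(1, 32), conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(64, 128)
        self.up2, self.dec2 = nn.ConvTranspose2d(128, 64, 2, stride=2), conv_block(128, 64)
        self.up1, self.dec1 = nn.ConvTranspose2d(64, 32, 2, stride=2), conv_block(64, 32)
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):                       # x: (batch, 1, H, W), H and W divisible by 4
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)                    # (batch, num_classes, H, W) mask logits

print(MiniUNet()(torch.rand(1, 1, 128, 128)).shape)           # torch.Size([1, 2, 128, 128])
```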
Cited by: 32
Multi-label Connectionist Temporal Classification
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00161
Curtis Wigington, Brian L. Price, Scott D. Cohen
The Connectionist Temporal Classification (CTC) loss function [1] enables end-to-end training of a neural network for sequence-to-sequence tasks without the need for prior alignments between the input and output. CTC is traditionally used for training sequential, single-label problems; each element in the sequence has only one class. In this work, we show that CTC is not suitable for multi-label tasks and we present a novel Multi-label Connectionist Temporal Classification (MCTC) loss function for multi-label, sequence-to-sequence classification. Multi-label classes can represent meaningful attributes of a single element; for example, in Optical Music Recognition (OMR), a music note can have separate duration and pitch attributes. Our approach achieves state-of-the-art results on Joint Handwritten Text Recognition and Named Entity Recognition, Asian Character Recognition, and OMR.
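The MCTC loss itself is the paper's contribution and is not reproduced here. Purely as an illustration of the multi-label sequence setting, the sketch below attaches one ordinary CTC head per attribute stream (pitch and duration, echoing the OMR example) to a shared encoder; this is the kind of naive baseline the proposed loss is meant to improve on, and all dimensions and class counts are assumptions.

```python
# Hedged baseline sketch only (NOT the MCTC loss): one standard CTC loss per attribute.
import torch
import torch.nn as nn

class TwoHeadSequenceModel(nn.Module):
    def __init__(self, feat_dim=64, pitch_classes=50, duration_classes=10):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, 128, batch_first=True, bidirectional=True)
        self.pitch_head = nn.Linear(256, pitch_classes)        # class 0 is the CTC blank
        self.duration_head = nn.Linear(256, duration_classes)  # class 0 is the CTC blank

    def forward(self, x):                       # x: (batch, T, feat_dim)
        h, _ = self.encoder(x)
        return self.pitch_head(h).log_softmax(-1), self.duration_head(h).log_softmax(-1)

model = TwoHeadSequenceModel()
frames = torch.rand(2, 100, 64)                 # dummy feature frames for 2 staves
pitch_lp, dur_lp = model(frames)
ctc = nn.CTCLoss(blank=0)
in_len = torch.full((2,), 100, dtype=torch.long)
tgt_len = torch.full((2,), 12, dtype=torch.long)
loss = (ctc(pitch_lp.permute(1, 0, 2), torch.randint(1, 50, (2, 12)), in_len, tgt_len) +
        ctc(dur_lp.permute(1, 0, 2), torch.randint(1, 10, (2, 12)), in_len, tgt_len))
print(loss.item())
```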
Cited by: 11
Chemical Structure Recognition (CSR) System: Automatic Analysis of 2D Chemical Structures in Document Images
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00-41
S. S. Bukhari, Zaryab Iftikhar, A. Dengel
In this era of advanced technology and automation, information extraction has become a very common practice for the analysis of data. A technique known as Optical Character Recognition (OCR) is used for the recognition of text. The purpose is to extract textual data for automatic information analysis or natural language processing of document images. However, in the field of cheminformatics, where it is required to recognize 2D molecular structures as they are published in research journals or patent documents, OCR is not adequate, as chemical compounds can be represented in both textual and graphical formats. The digital representation of an image-based chemical structure allows not only patent analysis teams to provide customized insights but also cheminformatics research groups to enhance their molecular structure databases, which can further be used for querying structures as well as sub-structural patterns. Some tools have been built for the extraction and processing of image-based molecular structures. The Optical Structure Recognition Application (OSRA) is one of the tools that partially fulfills the task of converting chemical structures in document images into chemical formats (SMILES, SDF, or MOL). However, it has a few problems, such as poor character recognition, false structure extraction, and slow processing. In this paper, we have developed a prototype Chemical Structure Recognition (CSR) system using modern and advanced open-source image processing libraries, which allows us to extract the structural information of a chemical structure embedded in the form of a digital raster image. The CSR system is capable of processing the chemical information contained in a chemical structure image and generates the SMILES or MOL representation. For performance evaluation, we have used two different data sets to measure the potential of the CSR system. It yields better results than OSRA in terms of recognition accuracy, extraction speed, and correctness.
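The CSR system itself is not described in enough detail here to reproduce, so no attempt is made; the small sketch below only shows how a recognized SMILES string could be sanity-checked and converted to a MOL block downstream, using the open-source RDKit toolkit (assumed to be installed; this is adjacent tooling, not part of the described pipeline).

```python
# Hedged sketch only: validate a recognized SMILES string and emit a MOL block with RDKit.
from rdkit import Chem

def check_recognized_smiles(smiles):
    mol = Chem.MolFromSmiles(smiles)   # returns None when the SMILES cannot be parsed
    if mol is None:
        return None
    return Chem.MolToMolBlock(mol)     # MOL-format text block for the parsed molecule

print(check_recognized_smiles("c1ccccc1O") is not None)   # phenol: a valid recognition -> True
print(check_recognized_smiles("c1ccccc1(") is None)       # broken output -> True (rejected)
```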
Cited by: 6
Multiple Comparative Attention Network for Offline Handwritten Chinese Character Recognition
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00101
Qingquan Xu, X. Bai, Wenyu Liu
Recent advances in deep learning have led to great progress in offline Handwritten Chinese Character Recognition (HCCR). However, most existing CNN-based methods only utilize global image features as contextual guidance to classify characters, while neglecting the local discriminative features that are very important for HCCR. To overcome this limitation, in this paper we present a convolutional neural network with multiple comparative attention (MCANet) in order to produce separable local attention regions with discriminative features across different categories. Concretely, our MCANet takes the last convolutional feature map as input and outputs multiple attention maps; a contrastive loss is used to force the different attention maps to selectively focus on different sub-regions. Moreover, we apply a region-level center loss to pull features learned from the same class but different regions closer together, further obtaining robust features invariant to large intra-class variance. Combined with the classification loss, our method can learn which parts of an image are relevant for recognizing characters and adaptively integrates information from different regions to make the final prediction. We conduct experiments on the ICDAR 2013 offline HCCR competition dataset with our proposed approach and achieve an accuracy of 97.66%, outperforming all single-network methods trained only on handwritten data.
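The region-level variant described above is specific to the paper; as background, the sketch below implements only the generic center-loss term (learned per-class centers that same-class features are pulled towards), with the class count and feature size as illustrative assumptions.

```python
# Hedged sketch only: a plain center loss, not the paper's region-level formulation.
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    def __init__(self, num_classes, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))  # learned class centers

    def forward(self, features, labels):
        # features: (batch, feat_dim), labels: (batch,) integer class ids
        return ((features - self.centers[labels]) ** 2).sum(dim=1).mean()

features = torch.rand(8, 64)           # dummy pooled features from attention regions
labels = torch.randint(0, 3755, (8,))  # 3,755 character classes in the ICDAR 2013 HCCR task
print(CenterLoss(num_classes=3755, feat_dim=64)(features, labels).item())
```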
Cited by: 7
TextEdge: Multi-oriented Scene Text Detection via Region Segmentation and Edge Classification
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00067
Chen Du, Chunheng Wang, Yanna Wang, Zipeng Feng, Jiyuan Zhang
Semantic-segmentation-based scene text detection algorithms usually use the bounding-box regions, or their shrunk versions, to represent text pixels. However, the non-text pixel information in these regions easily degrades text detection performance, because these semantic segmentation methods need accurately annotated pixel-level training data to achieve satisfactory performance and are sensitive to noise and interference. In this work, we propose a fully convolutional network (FCN) based method termed TextEdge for multi-oriented scene text detection. Compared with previous methods that simply use bounding-box regions as a segmentation mask, TextEdge introduces the text-region edge map as a new segmentation mask. Edge information is more representative of text areas and is proven effective in improving detection performance. TextEdge is optimized in an end-to-end way with multi-task outputs: text and non-text classification, text-edge prediction and text boundary regression. Experiments on standard datasets demonstrate that the proposed method achieves state-of-the-art performance in both accuracy and efficiency. Specifically, it achieves an F-score of 0.88 on the ICDAR 2013 dataset and 0.86 on the ICDAR 2015 dataset.
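A minimal sketch of what a multi-task prediction head of this kind can look like: one shared feature map feeds a text/non-text mask, a text-edge mask, and a per-pixel boundary regression. This is an assumed structure for illustration, not the released TextEdge model; the channel counts and the 4-value boundary parameterization are guesses.

```python
# Hedged sketch only: three 1x1 convolution heads over a shared FCN feature map.
import torch
import torch.nn as nn

class MultiTaskTextHead(nn.Module):
    def __init__(self, in_channels=256):
        super().__init__()
        self.text_mask = nn.Conv2d(in_channels, 2, 1)      # text / non-text logits
        self.edge_mask = nn.Conv2d(in_channels, 2, 1)      # text-region edge / background logits
        self.boundary_reg = nn.Conv2d(in_channels, 4, 1)   # assumed: distances to 4 text boundaries

    def forward(self, features):                            # features: (batch, C, H, W)
        return self.text_mask(features), self.edge_mask(features), self.boundary_reg(features)

features = torch.rand(1, 256, 64, 64)                       # e.g. an FCN feature map
text_logits, edge_logits, boundaries = MultiTaskTextHead()(features)
print(text_logits.shape, edge_logits.shape, boundaries.shape)
```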
Cited by: 5
KeyWord Spotting using Siamese Triplet Deep Neural Networks
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00187
Yasmine Serdouk, V. Eglin, S. Bres, Mylène Pardoen
Deep neural networks have shown great success in computer vision by achieving considerable state-of-the-art results and are beginning to attract strong interest in the document analysis community. In this paper, we present a novel siamese deep network with three inputs that retrieves the words most similar to a given query. The proposed system follows a query-by-example approach based on a segmentation-based technique and aims to learn suitable representations of handwritten word images, on which a simple Euclidean distance can perform the matching. The results obtained on the George Washington dataset show the potential and effectiveness of the proposed keyword spotting system.
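A minimal sketch of the triplet-embedding idea, not the authors' network: a small CNN maps word images to vectors, a triplet margin loss keeps a query closer to a matching word than to a non-matching one, and retrieval then ranks candidates by Euclidean distance to the query embedding. All architecture details are assumptions.

```python
# Hedged sketch only: a toy word-image embedder trained with a triplet margin loss.
import torch
import torch.nn as nn

class WordEmbedder(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(64, embed_dim))

    def forward(self, x):                          # x: (batch, 1, H, W) word images
        return self.net(x)

embedder = WordEmbedder()
anchor = embedder(torch.rand(4, 1, 64, 192))       # query word images
positive = embedder(torch.rand(4, 1, 64, 192))     # images with the same transcription
negative = embedder(torch.rand(4, 1, 64, 192))     # images with a different transcription
loss = nn.TripletMarginLoss(margin=1.0)(anchor, positive, negative)
print(loss.item())
```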
Cited by: 5
A Relation Network Based Approach to Curved Text Detection
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00118
Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo
In this paper, a new relation network based approach to curved text detection is proposed by formulating it as a visual relationship detection problem. The key idea is to decompose curved text detection into two subproblems, namely detection of text primitives and prediction of the link relationship for each nearby text primitive pair. Specifically, an anchor-free region proposal network based text detector is first used to detect text primitives of different scales from different feature maps of a feature pyramid network, from which a manageable number of text primitive pairs are selected. Then, a relation network is used to predict whether each text primitive pair belongs to the same text instance. Finally, isolated text primitives are grouped into curved text instances based on the link relationships of text primitive pairs. Because pairwise link prediction uses features extracted from the bounding boxes of each text primitive and their union, the relation network can effectively leverage wider context information to improve link prediction accuracy. Furthermore, since the link relationships of relatively distant text primitives can be predicted robustly, our relation network based text detector is capable of detecting text instances with large inter-character spaces. Consequently, our proposed approach achieves superior performance not only on two public curved text detection datasets, namely Total-Text and SCUT-CTW1500, but also on a multi-oriented text detection dataset, namely MSRA-TD500.
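A minimal sketch of the pairwise link-prediction step as the abstract describes it: features pooled from two text primitives and from their union box are concatenated and classified as belonging to the same text instance or not. The feature dimension and classifier shape are assumptions, and the text-primitive detector and grouping stage are omitted.

```python
# Hedged sketch only: a small MLP that predicts link / no-link for a text-primitive pair.
import torch
import torch.nn as nn

class LinkPredictor(nn.Module):
    def __init__(self, roi_feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * roi_feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 2))                    # link / no-link logits

    def forward(self, feat_a, feat_b, feat_union):
        return self.mlp(torch.cat([feat_a, feat_b, feat_union], dim=1))

pairs = 16
feat_a = torch.rand(pairs, 256)        # RoI features of text primitive A
feat_b = torch.rand(pairs, 256)        # RoI features of text primitive B
feat_union = torch.rand(pairs, 256)    # RoI features of the union bounding box
print(LinkPredictor()(feat_a, feat_b, feat_union).shape)   # torch.Size([16, 2])
```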
Cited by: 8