
Latest publications from the 2019 International Conference on Document Analysis and Recognition (ICDAR)

A Meaningful Information Extraction System for Interactive Analysis of Documents
Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00024
Julien Maître, M. Ménard, Guillaume Chiron, A. Bouju, Nicolas Sidère
This paper is related to a project aiming at discovering weak signals, possibly sent by whistleblowers, from different streams of information. The study presented in this paper tackles the particular problem of clustering topics at multiple levels across multiple documents, and then extracting meaningful descriptors, such as weighted lists of words, for document representations in a multi-dimensional space. In this context, we present a novel idea which combines Latent Dirichlet Allocation and Word2vec (providing a consistency metric regarding the partitioned topics) as a potential method for limiting the a priori number of clusters K usually needed in classical partitioning approaches. We propose two implementations of this idea, able respectively to (1) find the best K for LDA in terms of topic consistency and (2) gather the optimal clusters from different levels of clustering. We also propose a non-traditional visualization approach based on a multi-agent system which combines dimension reduction and interactivity.
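The LDA-plus-Word2vec coherence idea can be illustrated with a minimal sketch (toy vectors and topic lists are hypothetical, not the authors' implementation): score each candidate K by the mean pairwise embedding similarity of each topic's top words, and keep the K with the highest mean coherence.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def topic_coherence(top_words, vectors):
    """Mean pairwise cosine similarity of a topic's top words."""
    sims = [cosine(vectors[w1], vectors[w2])
            for i, w1 in enumerate(top_words)
            for w2 in top_words[i + 1:]]
    return sum(sims) / len(sims)

def best_k(topics_by_k, vectors):
    """Pick the K whose topics are most coherent on average.

    topics_by_k: {K: [top-word list per topic]}, e.g. from LDA runs;
    vectors: word -> embedding, e.g. from a Word2vec model.
    """
    mean_coherence = {
        k: sum(topic_coherence(t, vectors) for t in topics) / len(topics)
        for k, topics in topics_by_k.items()
    }
    return max(mean_coherence, key=mean_coherence.get)
```

With toy 2-D vectors in which animal and vehicle words cluster separately, `best_k` prefers K = 2 over a single mixed topic.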
Citations: 5
Table-of-Contents Generation on Contemporary Documents
Pub Date: 2019-09-01 DOI: 10.1109/icdar.2019.00025
Najah-Imane Bentabet, Rémi Juge, Sira Ferradans
The generation of a precise and detailed Table of Contents (TOC) from a document is a problem of major importance for document understanding and information extraction. Despite its importance, it remains a challenging task, especially for non-standardized documents with rich layout information, such as commercial documents. In this paper, we present a new neural-based pipeline for TOC generation applicable to any searchable document. Unlike previous methods, we neither use semantic labeling nor assume the presence of parsable TOC pages in the document. Moreover, we analyze the influence of using external knowledge encoded as a template, and show empirically that this approach is only useful in a very low-resource environment. Finally, we propose a new domain-specific data set that sheds some light on the difficulties of TOC generation in real-world documents. The proposed method shows better performance than the state of the art on a public data set and on the newly released data set.
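Once heading candidates and their depths have been detected, assembling them into a TOC tree is a simple stack walk; a minimal sketch (the heading detector itself, the hard part of the pipeline, is assumed to exist upstream):

```python
def build_toc(headings):
    """Fold a flat list of detected (level, title) headings, level 1 being
    the topmost, into a nested TOC tree."""
    root = {"title": None, "children": []}
    stack = [(0, root)]  # (level, node) of currently open sections
    for level, title in headings:
        node = {"title": title, "children": []}
        # Close any open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]
```

For example, `build_toc([(1, "A"), (2, "A.1"), (2, "A.2"), (1, "B")])` nests A.1 and A.2 under A, with B as a second top-level entry.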
Citations: 13
A Genetic-Based Search for Adaptive Table Recognition in Spreadsheets
Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00206
Elvis Koci, Maik Thiele, Oscar Romero, Wolfgang Lehner
Spreadsheets are very successful content generation tools, used in almost every enterprise to create a wealth of information. However, this information is often intermingled with various formatting, layout, and textual metadata, making it hard to identify and interpret the tabular payload. Previous works proposed to solve this problem mainly by using heuristics. Although fast to implement, these approaches fail to capture the high variability of user-generated spreadsheet tables. Therefore, in this paper, we propose a supervised approach that is able to adapt to arbitrary spreadsheet datasets. We use a graph model, carrying layout and spatial features, to represent the contents of a sheet. Subsequently, we apply genetic-based approaches for graph partitioning to recognize the parts of the graph corresponding to tables in the sheet. The search for tables is guided by an objective function, which is tuned to match the specific characteristics of a given dataset. We demonstrate the feasibility of this approach with an experimental evaluation on a large, real-world spreadsheet corpus.
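A genetic search of the kind the abstract describes can be sketched minimally over bitstrings with a pluggable objective function (all parameter values are illustrative, not the paper's):

```python
import random

def genetic_search(fitness, genome_len, pop_size=30, generations=60,
                   mutation_rate=0.05, seed=0):
    """Minimal genetic search over bitstrings: truncation selection,
    single-point crossover, bit-flip mutation, elitist survivors."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]           # elitist top half survives
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

In the paper's setting the genome would encode a partition of the sheet graph and `fitness` would be the dataset-tuned objective function; here any objective works, e.g. `sum` for the one-max toy problem.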
Citations: 13
Learning Free Line Detection in Manuscripts using Distance Transform Graph
Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00044
M. Kassis, Jihad El-Sana
We present a fully automated, learning-free method for line detection in manuscripts. We begin by separating components that span multiple lines; then we remove noise and small connected components such as diacritics. We apply a distance transform to the image to create the image skeleton. The skeleton is pruned and its vertices and edges are detected in order to generate the initial document graph. We calculate each vertex's v-score from its t-score and l-score, quantifying its distance from being an absolute link in a line. In a greedy manner, we classify each edge in the graph as either a link, a bridge, or a conflict edge. We first merge every pair of edges classified as links, then merge the conflict edges. Finally, we remove the bridge edges from the graph, generating its final form; each remaining edge corresponds to one extracted line. We applied the method to the DIVA-hisDB dataset, on both its public and private sections. The public section was used in the recently conducted Layout Analysis for Challenging Medieval Manuscripts competition, and we achieved results surpassing the vast majority of the participating systems.
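The distance transform at the core of the skeletonization step can be sketched as a multi-source BFS (a 4-connected approximation; the abstract does not specify which distance metric the authors use):

```python
from collections import deque

def distance_transform(grid):
    """Multi-source BFS distance of each foreground cell (1) to the
    nearest background cell (0), using 4-connectivity."""
    h, w = len(grid), len(grid[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    # Seed the BFS with every background cell at distance 0.
    for y in range(h):
        for x in range(w):
            if grid[y][x] == 0:
                dist[y][x] = 0
                queue.append((y, x))
    # Expand outward one layer at a time.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist
```

Ridge cells of the resulting map (local maxima along a stroke) then form the skeleton from which the document graph is built.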
Citations: 2
GARN: A Novel Generative Adversarial Recognition Network for End-to-End Scene Character Recognition
Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00115
Hao Kong, Dongqi Tang, Xi Meng, Tong Lu
Deep neural networks have shown their powerful ability in scene character recognition tasks; however, in real-life applications it is often hard to find a large amount of high-quality scene character images for training these networks. In this paper, we propose a novel end-to-end network named the Generative Adversarial Recognition Network (GARN) for accurate natural scene character recognition. The proposed GARN consists of a generation part and a classification part. The purpose of the generation part is to produce diverse, realistic samples that help the classifier overcome the overfitting problem. In the classification part, a multinomial classifier is trained along with the generator in the form of a game to achieve better character recognition performance. That is, the proposed GARN can augment scene character data with its generation part and recognize scene characters with its classification part, and it is trained in an adversarial way to improve recognition performance. Experimental results on benchmark datasets and comparisons with state-of-the-art methods show the effectiveness of the proposed GARN in scene character recognition.
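The abstract does not state GARN's exact objectives; assuming the standard adversarial game plus a multinomial classification loss (an assumed formulation, not taken from the paper), the joint training would look like:

```latex
% Standard GAN objective for the generation part (assumed form):
\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]

% Multinomial classification loss for the recognition part,
% where C_y(x) is the classifier's probability of character class y:
\mathcal{L}_{\mathrm{cls}} = -\,\mathbb{E}_{(x,y)}\!\left[\log C_y(x)\right]
```

where the classifier is trained on both real and generated samples, so the generator's diversity directly regularizes recognition.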
Citations: 4
ICDAR2019 Competition on Scanned Receipt OCR and Information Extraction
Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00244
Zheng Huang, Kai Chen, Jianhua He, X. Bai, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar
The ICDAR 2019 Challenge on "Scanned receipts OCR and key information extraction" (SROIE) covers important aspects related to the automated analysis of scanned receipts. The SROIE tasks play a key role in many document analysis systems and hold significant commercial potential. Although a lot of work has been published over the years on administrative document analysis, the community has advanced relatively slowly, as most datasets have been kept private. One of the key contributions of SROIE to the document analysis community is to offer a first standardized dataset of 1000 whole scanned receipt images with annotations, as well as an evaluation procedure for such tasks. The Challenge is structured around three tasks, namely Scanned Receipt Text Localization (Task 1), Scanned Receipt OCR (Task 2), and Key Information Extraction from Scanned Receipts (Task 3). The competition opened on 10th February 2019 and closed on 5th May 2019. We received 29, 24, and 18 valid submissions for the three competition tasks, respectively. This report presents the competition datasets, defines the tasks and the evaluation protocols, offers detailed submission statistics, and analyzes the performance of the submitted systems. While the tasks of text localization and recognition seem to be relatively easy to tackle, it is interesting to observe the variety of ideas and approaches proposed for the information extraction task. Judging by the submissions' performance, we believe there is still margin for improving information extraction, although the current dataset would have to grow substantially in following editions. Given the success of the SROIE competition, evidenced by the wide interest generated and the healthy number of submissions from academia, research institutes, and industry across different countries, we consider that the SROIE competition can evolve into a useful resource for the community, drawing further attention and promoting research and development efforts in this field.
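For a key-information-extraction task of this kind, scoring typically reduces to micro-averaged F1 over extracted fields with exact value matching (a sketch of that common protocol, not the competition's official evaluation code):

```python
def extraction_f1(predictions, ground_truths):
    """Micro-averaged F1 over extracted key/value fields, one dict per
    receipt; a predicted field counts only on an exact value match."""
    tp = fp = fn = 0
    for pred, gt in zip(predictions, ground_truths):
        for field, value in pred.items():
            if gt.get(field) == value:
                tp += 1   # correct field
            else:
                fp += 1   # spurious or wrong-valued field
        # Ground-truth fields not correctly recovered are misses.
        fn += sum(1 for f, v in gt.items() if pred.get(f) != v)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For one receipt with a correct total, a wrong date, and a missing company field, this yields precision 1/2, recall 1/3, and F1 = 0.4.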
Citations: 152
A Multi-oriented Chinese Keyword Spotter Guided by Text Line Detection
Pub Date: 2019-09-01 DOI: 10.1109/icdar.2019.00112
Pei Xu, Shan Huang, Hongzhen Wang, Hao Song, Shen Huang, Qi Ju
Chinese keyword spotting is a challenging task, as there are no visual blanks between Chinese words. Unlike English words, which are separated naturally by visual blanks, Chinese words are generally delimited only by semantic information. In this paper, we propose a new Chinese keyword spotter for natural images, inspired by Mask R-CNN, which predicts keyword masks guided by text line detection. First, text line proposals are generated by Faster R-CNN; then, text line masks and keyword masks are predicted by segmentation within the proposals. In this way, text lines and keywords are predicted in parallel. We create two Chinese keyword datasets based on RCTW-17 and ICPR MTWI2018 to verify the effectiveness of our method.
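The coupling between keyword predictions and text line proposals can be illustrated with a box-level assignment sketch (hypothetical helpers, not the paper's mask-level implementation): each keyword is attached to the line proposal that contains most of its area.

```python
def overlap_ratio(keyword, line):
    """Fraction of the keyword box's area inside the line box;
    boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(keyword[0], line[0]), max(keyword[1], line[1])
    ix2, iy2 = min(keyword[2], line[2]), min(keyword[3], line[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    kw_area = (keyword[2] - keyword[0]) * (keyword[3] - keyword[1])
    return inter / kw_area if kw_area else 0.0

def assign_keywords_to_lines(keyword_boxes, line_boxes, threshold=0.5):
    """Map keyword index -> line index for keywords mostly contained
    in some line proposal; keywords matching no line are dropped."""
    assignment = {}
    for i, kw in enumerate(keyword_boxes):
        scores = [overlap_ratio(kw, line) for line in line_boxes]
        best = max(range(len(line_boxes)), key=lambda j: scores[j])
        if scores[best] >= threshold:
            assignment[i] = best
    return assignment
```

A keyword box fully inside a line proposal gets ratio 1.0; a stray false-positive box overlapping no line falls below the threshold and is discarded.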
Citations: 0
Fast Text/non-Text Image Classification with Knowledge Distillation
Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00234
Miao Zhao, Rui-Qi Wang, Fei Yin, Xu-Yao Zhang, Lin-Lin Huang, J. Ogier
How to efficiently judge whether a natural image contains text is an important problem: text detection and recognition algorithms are usually time-consuming, and it is unnecessary to run them on images that do not contain any text. In this paper, we investigate this problem from two perspectives: speed and accuracy. First, to achieve the high speed needed for efficiently filtering large numbers of images, especially on CPU, we propose using a small and shallow convolutional neural network, where the features from different layers are adaptively pooled into fixed sizes to overcome the difficulties caused by multiple scales and various locations. Although this achieves high speed, its accuracy is not satisfactory due to the limited capacity of the small network. Therefore, our second contribution is to use knowledge distillation to improve the accuracy of the small network, by constructing a larger and deeper neural network as a teacher network to instruct the learning process of the small network. With these two strategies, we achieve both high speed and high accuracy for filtering scene text images. Experimental results on a benchmark dataset show the effectiveness of our method: the teacher network yields state-of-the-art performance, and the distilled small network achieves high performance while maintaining high speed, 176 times faster on CPU and 3.8 times faster on GPU than a compared benchmark method.
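The distillation step can be sketched with the classic Hinton-style loss (an assumed formulation; the paper's exact loss weighting is not given in the abstract):

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over raw logits, optionally softened by a temperature."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.7):
    """Blend of cross-entropy with the one-hot label and cross-entropy
    against the teacher's softened outputs; the T^2 factor keeps the
    soft-term gradients on a scale comparable to the hard term."""
    hard = -math.log(softmax(student_logits)[true_label])
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
    return (1 - alpha) * hard + alpha * temperature ** 2 * soft
```

The loss is smallest when the student's softened distribution matches the teacher's, so minimizing it pulls the small network toward the teacher's behavior while still fitting the true labels.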
Citations: 4
HITHCD-2018: Handwritten Chinese Character Database of 21K-Category
Pub Date: 2019-09-01 DOI: 10.1109/ICDAR.2019.00222
Tonghua Su, Wei Pan, Lijuan Yu
The current state of handwritten Chinese character recognition (HCCR) research, conducted on well-confined character sets, is far from meeting industrial requirements. This paper describes the creation of a large-scale handwritten Chinese character database. Constructing the database is an effort to scale up the Chinese handwritten character classification task to cover the full character list of the GBK character set specification. It consists of 21 thousand Chinese character categories and 20 million character images, larger than previous databases in both scale and diversity. We present solutions to the challenges of collecting and annotating such large-scale handwritten character samples: we elaborately design the sampling strategy, extract salient signals in a systematic way, and annotate the tremendous character set through three distinct stages. Experiments on generalization to other handwritten character databases are conducted, and our database demonstrates great value. Its scale opens unprecedented opportunities both in the evaluation of character recognition algorithms and in the development of new techniques.
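The GBK coverage target can be illustrated by counting which CJK Unified Ideographs survive a round trip through Python's `gbk` codec (a rough proxy for the Han-character portion of the category list; the database's actual categories are defined by the authors):

```python
def gbk_han_category_count():
    """Count CJK Unified Ideographs (U+4E00..U+9FA5) that the gbk
    codec can encode, approximating the Han-character coverage of
    the GBK specification."""
    count = 0
    for codepoint in range(0x4E00, 0x9FA6):
        try:
            chr(codepoint).encode("gbk")
            count += 1
        except UnicodeEncodeError:
            pass  # character outside GBK
    return count
```

The count lands in the low twenty-thousands, consistent with the "21K-category" scale the database targets once non-ideograph GBK characters are added.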
Citations: 1
A Handwritten Chinese Text Recognizer Applying Multi-level Multimodal Fusion Network
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00235
Yuhuan Xiu, Qingqing Wang, Hongjian Zhan, Man Lan, Yue Lu
Handwritten Chinese text recognition (HCTR) has received extensive attention from the pattern recognition community over the past decades. Most existing deep learning methods consist of two stages: training a text recognition network on the basis of visual information, then incorporating language constraints via various language models. As a result, inherent linguistic semantic information is often neglected when designing the recognition network. To tackle this problem, we propose a novel multi-level multimodal fusion network and embed it into an attention-based LSTM, so that both visual information and linguistic semantic information can be fully leveraged when predicting sequential outputs from the feature vectors. Experimental results on the ICDAR-2013 competition dataset are comparable with state-of-the-art approaches.
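The authors do not publish code, but the mechanism the abstract names — attending over visual features with the decoder state, then fusing the attended context with a linguistic-semantic embedding before prediction — can be sketched for one decoding step in plain numpy. Dot-product attention and concatenation fusion are simplifying assumptions here; the paper's actual network fuses at multiple levels.

```python
import numpy as np

def attend(query, feats):
    """Dot-product attention: weight visual feature vectors by a decoder state."""
    scores = feats @ query                       # (T,) similarity scores
    scores = scores - scores.max()               # numerical stability
    w = np.exp(scores)
    w = w / w.sum()                              # attention weights, sum to 1
    return w, w @ feats                          # weights and context vector

def fuse_step(hidden, visual_feats, semantic_vec):
    """One decoding step: attend to visual features, then concatenate the
    visual context with a semantic (language) embedding for prediction."""
    w, ctx = attend(hidden, visual_feats)
    fused = np.concatenate([ctx, semantic_vec])  # multimodal fusion by concat
    return w, fused
```

The fused vector would then feed the LSTM cell and output classifier, letting gradients shape both the visual attention and the use of the semantic channel jointly.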
Citations: 7
Journal: 2019 International Conference on Document Analysis and Recognition (ICDAR)