Latest publications from the 2019 International Conference on Document Analysis and Recognition (ICDAR)

A Synthetic Recipe for OCR
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00143
David Etter, Stephen Rawls, Cameron Carpenter, Gregory Sell
Synthetic data generation for optical character recognition (OCR) promises unlimited training data at zero annotation cost. With enough fonts and seed text, we should be able to generate data to train a model that approaches or exceeds the performance achieved with real annotated data. Unfortunately, this is not always the reality. Unconstrained image settings, such as internet memes, scanned web pages, or newspapers, present diverse scripts, fonts, layouts, and complex backgrounds, which cause models trained with synthetic data to break down. In this work, we investigate the synthetic image generation problem on a large multilingual set of unconstrained document images. Our work presents a comprehensive evaluation of the impact of synthetic data attributes on model performance. The results provide a recipe for synthetic data generation that will help guide future research.
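To make the synthetic-data idea concrete, the following is a minimal Python sketch of rendering seed text with random fonts onto a paper-like background and adding mild degradation. The font paths, seed corpus file, padding, and noise level are illustrative assumptions, not settings from the paper.

```python
# A minimal sketch of synthetic text-image generation for OCR training,
# assuming a local directory of .ttf fonts and a plain-text seed corpus.
import random
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_line(text, font_path, font_size=32, pad=8):
    """Render one seed-text line with a given font onto a light background."""
    font = ImageFont.truetype(font_path, font_size)
    left, top, right, bottom = font.getbbox(text)      # measure text so the canvas fits
    w, h = right - left + 2 * pad, bottom - top + 2 * pad
    img = Image.new("L", (w, h), color=random.randint(200, 255))  # paper-like background
    draw = ImageDraw.Draw(img)
    draw.text((pad - left, pad - top), text, font=font, fill=random.randint(0, 60))
    return img

def degrade(img, noise_std=8.0):
    """Add mild Gaussian noise to mimic scanning artifacts."""
    arr = np.asarray(img, dtype=np.float32)
    arr += np.random.normal(0.0, noise_std, arr.shape)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

# Usage (hypothetical paths): one labeled sample per seed line and random font.
# fonts = ["fonts/NotoSans-Regular.ttf", "fonts/DejaVuSerif.ttf"]
# for line in open("seed_corpus.txt", encoding="utf-8"):
#     sample = degrade(render_line(line.strip(), random.choice(fonts)))
```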
Citations: 16
ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text - RRC-ArT
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00252
Chee-Kheng Chng, Yuliang Liu, Yipeng Sun, Chun Chet Ng, Canjie Luo, Zihan Ni, Chuanming Fang, Shuaitao Zhang, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin
This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text - RRC-ArT that consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting. A total of 78 submissions from 46 unique teams/individuals were received for this competition. The top performing score of each challenge is as follows: i) T1 - 82.65%, ii) T2.1 - 74.3%, iii) T2.2 - 85.32%, iv) T3.1 - 53.86%, and v) T3.2 - 54.91%. Apart from the results, this paper also details the ArT dataset, tasks description, evaluation metrics and participants' methods. The dataset, the evaluation kit as well as the results are publicly available at the challenge website.
Citations: 124
A Comparative Study of Attention-Based Encoder-Decoder Approaches to Natural Scene Text Recognition
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00151
Fu'ze Cong, Wenping Hu, Qiang Huo, Li Guo
Attention-based encoder-decoder approaches have shown promising results in scene text recognition. In the literature, models with different encoders, decoders and attention mechanisms have been proposed and compared on isolated word recognition tasks, where the models are trained on either synthetic word images or a small set of real-world images. In this paper, we investigate different components of the attention-based framework and compare its performance with a CNN-DBLSTM-CTC based approach on large-scale real-world scene text sentence recognition tasks. We train character models by using more than 1.6M real-world text lines and compare their performance on test sets collected from a variety of real-world scenarios. Our results show that (1) attention on a two-dimensional feature map can yield better performance than attention on a one-dimensional one, and an RNN based decoder performs better than a CNN based one; (2) attention-based approaches can achieve higher recognition accuracy than CNN-DBLSTM-CTC based approaches on isolated word recognition tasks, but perform worse on sentence recognition tasks; (3) it is more effective and efficient for CNN-DBLSTM-CTC based approaches to leverage an explicit language model to boost recognition accuracy.
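As an illustration of the two-dimensional attention variant compared in the paper, here is a minimal numpy sketch of scaled dot-product attention over a flattened CNN feature map; the shapes and scoring function are illustrative assumptions, not the authors' exact model.

```python
# Dot-product attention over a 2-D feature map: every spatial position competes
# in one softmax, and the attention map can be reshaped back to (H, W).
import numpy as np

def attend_2d(feature_map, query):
    """feature_map: (H, W, C) encoder output; query: (C,) decoder state."""
    H, W, C = feature_map.shape
    keys = feature_map.reshape(H * W, C)          # flatten spatial grid to H*W positions
    scores = keys @ query / np.sqrt(C)            # scaled dot-product score per position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over all H*W positions
    context = weights @ keys                      # (C,) weighted sum = glimpse vector
    return context, weights.reshape(H, W)         # 2-D attention map for inspection

# A 1-D variant would first pool the map over height and attend only over the W columns.
```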
Citations: 11
Deep Splitting and Merging for Table Structure Decomposition
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00027
Chris Tensmeyer, Vlad I. Morariu, Brian L. Price, Scott D. Cohen, Tony R. Martinez
Given the large variety and complexity of tables, table structure extraction is a challenging task in automated document analysis systems. We present a pair of novel deep learning models (Split and Merge models) that, given an input image, 1) predict the basic table grid pattern and 2) predict which grid elements should be merged to recover cells that span multiple rows or columns. We propose projection pooling as a novel component of the Split model and grid pooling as a novel part of the Merge model. While most Fully Convolutional Networks rely on local evidence, these unique pooling regions allow our models to take advantage of the global table structure. We achieve state-of-the-art performance on the public ICDAR 2013 Table Competition dataset of PDF documents. On a much larger private dataset, which we used to train the models, we significantly outperform both a state-of-the-art deep model and a major commercial software system.
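A minimal numpy sketch of the projection-pooling idea follows: each position is replaced by the average over its entire row (or column), so a separator prediction can use evidence spanning the whole table. This is an illustrative reading of the concept, not the authors' released implementation.

```python
# Projection pooling: collapse one spatial axis by averaging, then broadcast the
# result back so every pixel "sees" its full row or column context.
import numpy as np

def projection_pool(features, axis):
    """features: (H, W, C); axis=1 averages along each row, axis=0 along each column."""
    pooled = features.mean(axis=axis, keepdims=True)   # collapse one spatial axis
    return np.broadcast_to(pooled, features.shape)     # broadcast back to full size

# Example: row-wise pooling for predicting horizontal (row) separators.
# feats = np.random.rand(64, 128, 32)
# row_context = projection_pool(feats, axis=1)   # every pixel sees its whole row
```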
Citations: 55
Improving Text Recognition using Optical and Language Model Writer Adaptation
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00190
Yann Soullard, Wassim Swaileh, Pierrick Tranouez, T. Paquet, Clément Chatelain
State-of-the-art methods for handwriting text recognition are based on deep learning approaches and language modeling that require large data sets during training. In practice, there are some applications where the system processes mono-writer documents, and would thus benefit from being trained on examples from that writer. However, it is not common to have numerous examples coming from just one writer. In this paper, we propose an approach to adapt both the optical model and the language model to a particular writer, from a generic system trained on large data sets with a variety of examples. We show the benefits of the optical and language model writer adaptation. Our approach reaches competitive results on the READ 2018 data set, which is dedicated to model adaptation to particular writers.
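To illustrate the language-model side of writer adaptation, here is a minimal Python sketch that interpolates a generic character bigram model with one estimated from a writer's few transcribed lines. The bigram order, smoothing constant, and mixing weight are illustrative assumptions, not the paper's configuration.

```python
# Linear interpolation of a generic character LM with a writer-specific one:
# P_adapted = (1 - lam) * P_generic + lam * P_writer.
from collections import Counter

def bigram_probs(lines):
    counts, totals = Counter(), Counter()
    for line in lines:
        for a, b in zip(line, line[1:]):
            counts[(a, b)] += 1
            totals[a] += 1
    return {k: v / totals[k[0]] for k, v in counts.items()}

def adapted_prob(bigram, generic, writer, lam=0.3):
    """Mix the two estimates; 1e-6 is a crude floor for unseen bigrams."""
    return (1 - lam) * generic.get(bigram, 1e-6) + lam * writer.get(bigram, 1e-6)

# generic = bigram_probs(large_corpus_lines)     # trained once on the big data set
# writer  = bigram_probs(writer_specific_lines)  # estimated from the target writer
```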
Citations: 13
Learning Free Line Detection in Manuscripts using Distance Transform Graph
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00044
M. Kassis, Jihad El-Sana
We present a fully automated, learning-free method for line detection in manuscripts. We begin by separating components that span multiple lines, then remove noise and small connected components such as diacritics. We apply a distance transform to the image to create the image skeleton. The skeleton is pruned and its vertices and edges are detected in order to generate the initial document graph. We calculate each vertex's v-score using its t-score and l-score, quantifying its distance from being an absolute link in a line. In a greedy manner, we classify each edge in the graph as either a link, a bridge, or a conflict edge. We first merge every pair of edges classified as links, then merge the conflict edges. Finally, we remove the bridge edges from the graph, generating its final form. Each edge in the resulting graph corresponds to one extracted line. We applied the method to both the public and private sections of the DIVA-hisDB dataset. The public section was used in the recently conducted Layout Analysis for Challenging Medieval Manuscripts competition, and we achieve results surpassing the vast majority of the participating systems.
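A minimal Python sketch of the first stages of such a learning-free pipeline is shown below: binarize, take a distance transform, and skeletonize before graph construction. The threshold is an illustrative assumption, and the paper's graph pruning and link/bridge/conflict classification are not reproduced.

```python
# Distance transform + skeleton as the basis for the document graph.
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

def page_skeleton(gray_page, ink_threshold=128):
    """gray_page: 2-D uint8 image; returns a 1-pixel-wide skeleton of the ink."""
    ink = gray_page < ink_threshold        # foreground mask (dark ink)
    dist = distance_transform_edt(ink)     # distance of each ink pixel to the background
    skeleton = skeletonize(ink)            # medial axis used to build the graph
    return skeleton, dist

# Vertices of the document graph would then be skeleton branch/end points, with
# edges classified greedily as link / bridge / conflict as the abstract describes.
```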
Citations: 2
A Multi-oriented Chinese Keyword Spotter Guided by Text Line Detection
Pub Date : 2019-09-01 DOI: 10.1109/icdar.2019.00112
Pei Xu, Shan Huang, Hongzhen Wang, Hao Song, Shen Huang, Qi Ju
Chinese keyword spotting is a challenging task because there are no visual blanks between Chinese words. Unlike English words, which are separated naturally by visual blanks, Chinese words are generally delimited only by semantic information. In this paper, we propose a new Chinese keyword spotter for natural images, inspired by Mask R-CNN, which predicts keyword masks guided by text line detection. First, proposals of text lines are generated by Faster R-CNN; then, text line masks and keyword masks are predicted by segmentation within the proposals. In this way, text lines and keywords are predicted in parallel. We create two Chinese keyword datasets based on RCTW-17 and ICPR MTWI2018 to verify the effectiveness of our method.
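The parallel-prediction idea can be sketched as two segmentation heads over shared proposal features, as in the minimal PyTorch snippet below. Layer sizes are illustrative assumptions; the actual model builds on Faster/Mask R-CNN rather than these toy heads.

```python
# Two mask heads on shared RoI features: one for the text-line mask, one for the
# keyword mask, so keywords are segmented under the guidance of line proposals.
import torch
import torch.nn as nn

class TwinMaskHeads(nn.Module):
    def __init__(self, in_channels=256):
        super().__init__()
        def head():
            return nn.Sequential(
                nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(256, 1, 1))                      # per-pixel mask logit
        self.line_head = head()      # text-line mask branch
        self.keyword_head = head()   # keyword mask branch (predicted in parallel)

    def forward(self, roi_features):
        return self.line_head(roi_features), self.keyword_head(roi_features)

# roi_features = torch.randn(8, 256, 14, 14)   # pooled features for 8 proposals
# line_logits, keyword_logits = TwinMaskHeads()(roi_features)
```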
Citations: 0
A Genetic-Based Search for Adaptive Table Recognition in Spreadsheets
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00206
Elvis Koci, Maik Thiele, Oscar Romero, Wolfgang Lehner
Spreadsheets are very successful content generation tools, used in almost every enterprise to create a wealth of information. However, this information is often intermingled with various formatting, layout, and textual metadata, making it hard to identify and interpret the tabular payload. Previous works proposed to solve this problem by mainly using heuristics. Although fast to implement, these approaches fail to capture the high variability of user-generated spreadsheet tables. Therefore, in this paper, we propose a supervised approach that is able to adapt to arbitrary spreadsheet datasets. We use a graph model to represent the contents of a sheet, which carries layout and spatial features. Subsequently, we apply genetic-based approaches for graph partitioning, to recognize the parts of the graph corresponding to tables in the sheet. The search for tables is guided by an objective function, which is tuned to match the specific characteristics of a given dataset. We present the feasibility of this approach with an experimental evaluation, on a large, real-world spreadsheet corpus.
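A minimal Python sketch of a genetic search over candidate table assignments follows: each individual labels every cell-node of the sheet graph with a table id, and a dataset-tuned objective scores the resulting partition. The objective here is a hypothetical placeholder; the paper's fitness terms are not reproduced.

```python
# Generic genetic algorithm over node-label assignments (graph partitioning).
import random

def evolve(num_nodes, objective, pop_size=40, generations=200,
           mutation_rate=0.05, max_tables=4):
    population = [[random.randrange(max_tables) for _ in range(num_nodes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=objective, reverse=True)
        parents = scored[: pop_size // 2]                  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, num_nodes)           # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(num_nodes):                     # random label mutation
                if random.random() < mutation_rate:
                    child[i] = random.randrange(max_tables)
            children.append(child)
        population = parents + children
    return max(population, key=objective)

# best = evolve(num_nodes=120, objective=my_layout_coherence_score)  # hypothetical scorer
```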
Citations: 13
TableNet: Deep Learning Model for End-to-end Table Detection and Tabular Data Extraction from Scanned Document Images
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00029
Shubham Paliwal, D. Vishwanath, R. Rahul, Monika Sharma, L. Vig
With the widespread use of mobile phones and scanners to photograph and upload documents, the need for extracting the information trapped in unstructured document images such as retail receipts, insurance claim forms and financial invoices is becoming more acute. A major hurdle to this objective is that these images often contain information in the form of tables, and extracting data from tabular sub-images presents a unique set of challenges. This includes accurate detection of the tabular region within an image, and subsequently detecting and extracting information from the rows and columns of the detected table. While some progress has been made in table detection, extracting the table contents is still a challenge since this involves more fine-grained table structure (rows & columns) recognition. Prior approaches have attempted to solve the table detection and structure recognition problems independently using two separate models. In this paper, we propose TableNet: a novel end-to-end deep learning model for both table detection and structure recognition. The model exploits the interdependence between the twin tasks of table detection and table structure recognition to segment out the table and column regions. This is followed by semantic rule-based row extraction from the identified tabular sub-regions. The proposed model and extraction approach were evaluated on the publicly available ICDAR 2013 and Marmot Table datasets, obtaining state-of-the-art results. Additionally, we demonstrate that feeding additional semantic features further improves model performance and that the model exhibits transfer learning across datasets. Another contribution of this paper is to provide additional table structure annotations for the Marmot data, which currently only has annotations for table detection.
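The shared-encoder, twin-decoder idea can be illustrated with the minimal PyTorch sketch below, where one decoder segments table regions and the other segments column regions from the same image. The tiny encoder is an illustrative stand-in for the VGG-style backbone used in the paper.

```python
# Shared encoder feeding two segmentation decoders (table mask and column mask).
import torch
import torch.nn as nn

class TinyTableNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        def decoder():
            return nn.Sequential(
                nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(32, 1, 2, stride=2))    # per-pixel mask logit
        self.table_decoder = decoder()    # table-region segmentation branch
        self.column_decoder = decoder()   # column-region segmentation branch

    def forward(self, image):
        shared = self.encoder(image)
        return self.table_decoder(shared), self.column_decoder(shared)

# table_logits, column_logits = TinyTableNet()(torch.randn(1, 3, 256, 256))
```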
Citations: 107
Blind Source Separation Based Framework for Multispectral Document Images Binarization
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00237
Abderrahmane Rahiche, A. Bakhta, M. Cheriet
In this paper, we propose a novel Blind Source Separation (BSS) based framework for multispectral (MS) document image binarization. This framework takes advantage of the multidimensional data representation of MS images and makes use of Graph regularized Non-negative Matrix Factorization (GNMF) to decompose MS document images into their different constituent components, i.e., foreground (text, ink), background (paper, parchment), degradation information, etc. The proposed framework is validated on two different real-world data sets of manuscript images, showing a high capability of dealing with a variable number of bands regardless of the acquisition protocol, different types of degradation, and illumination non-uniformity, while outperforming the results reported in the state of the art. Although the focus is on binary separation (i.e., foreground/background), the proposed framework is also used to decompose document images into different components, i.e., background, text, and degradation, which allows full source separation, whereby further analysis and characterization of each component become possible. A comparative study is performed using Independent Component Analysis (ICA) and Principal Component Analysis (PCA) methods. Our framework is also validated on a third dataset of MS images of natural objects to demonstrate its generalizability beyond document samples.
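The source-separation view of binarization can be sketched with plain NMF in Python: each multispectral pixel becomes a row of band intensities, and the factorization yields per-source abundance maps. Plain scikit-learn NMF is used here as a stand-in for the graph-regularized variant (GNMF) in the paper, and the thresholding rule is an illustrative assumption.

```python
# Factorize the (pixels x bands) matrix so one component captures ink and
# another the support; the abundance maps are reshaped back to image form.
import numpy as np
from sklearn.decomposition import NMF

def separate_sources(cube, n_sources=2):
    """cube: (H, W, B) multispectral image; returns per-source abundance maps."""
    H, W, B = cube.shape
    X = cube.reshape(H * W, B).astype(np.float64)        # pixels x bands, non-negative
    model = NMF(n_components=n_sources, init="nndsvd", max_iter=500)
    abundances = model.fit_transform(X)                   # (H*W, n_sources) mixing weights
    return abundances.reshape(H, W, n_sources)

# maps = separate_sources(ms_page)          # ms_page: registered multispectral scan
# ink_mask = maps[..., 0] > maps[..., 1]    # naive foreground/background decision
```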
Citations: 2