
2019 International Conference on Document Analysis and Recognition (ICDAR) — Latest Publications

Online Writer Identification using GMM Based Feature Representation and Writer-Specific Weights
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00124
V. Venugopal, S. Sundaram
This paper focuses on a method to ascertain the identity of the writer of an online handwritten document. The proposed methodology makes use of a set of descriptors that are derived from features obtained in a probabilistic sense. In this regard, we employ a GMM-based feature representation, wherein each point-based feature vector in the online trace is mapped to a membership vector; each element of this vector quantifies the membership of the point to a particular Gaussian in the GMM. A distinguishing aspect is the proposal of a weighting scheme that measures the influence of each Gaussian of a writer in the probabilistic space. For deriving these weights, we rely on information obtained from a histogram, formulated as a function of the sum-pooled posterior probabilities obtained across all the enrolled documents in the database. The identification is performed by an ensemble of SVMs, where each SVM is modelled for a given writer. The experiments are performed on the publicly available IAM Online handwriting database, and the results are competitive with prior works in the literature.
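To make the representation concrete, here is a minimal Python sketch of a GMM posterior-membership descriptor with sum-pooling, assuming sklearn-style per-point features; the feature dimensions, component count, and normalisation step are illustrative, not taken from the paper:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a GMM on point-based feature vectors pooled from all enrolled documents.
# X_train: (n_points, n_dims) array of per-point features; placeholder data here.
X_train = np.random.randn(5000, 6)
gmm = GaussianMixture(n_components=32, covariance_type="diag").fit(X_train)

def document_descriptor(points):
    """Represent each point by its posterior memberships to the Gaussians,
    then sum-pool over the document to obtain a histogram of posteriors."""
    posteriors = gmm.predict_proba(points)   # (n_points, n_components)
    pooled = posteriors.sum(axis=0)          # sum-pooled posterior histogram
    return pooled / pooled.sum()             # normalise to a distribution

# Writer-specific weights could then be derived from such pooled histograms
# computed across that writer's enrolled documents.
```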
Citations: 1
TH-GAN: Generative Adversarial Network Based Transfer Learning for Historical Chinese Character Recognition
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00037
Junyang Cai, Liangrui Peng, Yejun Tang, Changsong Liu, Pengchao Li
Historical Chinese character recognition faces problems including low image quality and a lack of labeled training samples. We propose a generative adversarial network (GAN) based transfer learning method to alleviate these problems. The proposed TH-GAN architecture includes a discriminator and a generator. The network structure of the discriminator is based on a convolutional neural network (CNN). Inspired by Wasserstein GAN, the loss function of the discriminator aims to measure the probabilistic distribution distance between the generated images and the target images. The network structure of the generator is a CNN-based encoder-decoder. The loss function of the generator aims to minimize the distribution distance between the real samples and the generated samples. In order to preserve the complex glyph structure of a historical Chinese character, a weighted mean squared error (MSE) criterion incorporating both the edge and the skeleton information of the ground-truth image is proposed as the weighted pixel loss in the generator. These loss functions are used for joint training of the discriminator and the generator. Experiments are conducted on two tasks to evaluate the performance of the proposed TH-GAN. The first task is style transfer mapping for multi-font printed traditional Chinese character samples. The second task is transfer learning for historical Chinese character samples, adding samples generated by TH-GAN. Experimental results show that the proposed TH-GAN is effective.
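The weighted pixel loss can be sketched as follows; this is a hedged PyTorch illustration in which the edge/skeleton mask extraction and the weight values are assumptions, not the paper's actual scheme:

```python
import torch

def weighted_mse_loss(generated, target, edge_map, skeleton_map,
                      w_edge=2.0, w_skel=2.0):
    """Weighted MSE: pixels on the edge or skeleton of the ground-truth
    glyph contribute more, encouraging the generator to preserve the
    complex glyph structure. edge_map / skeleton_map are binary masks of
    the same shape as target; w_edge and w_skel are illustrative values."""
    weights = 1.0 + w_edge * edge_map + w_skel * skeleton_map
    return torch.mean(weights * (generated - target) ** 2)
```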
Citations: 12
Zero Shot Learning Based Script Identification in the Wild
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00162
Prateek Keserwani, K. De, P. Roy, U. Pal
A text recognition system for natural images or video frames containing multilingual text needs a method to first identify the written script and then recognize the word in the identified script. However, some scripts occur rarely compared to others, and with only a few samples of a rare script available, supervised training of deep neural networks is difficult. To overcome this problem, we propose a zero-shot learning based method for script identification. We also propose an architecture for script identification that fuses the global feature vector and the semantic embedding vector. The semantic embedding of the script is obtained by modelling the spatial dependency of the stroke sequence via a recurrent neural network. The proposed architecture shows superior results compared to the baseline approaches.
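A minimal sketch of zero-shot scoring against per-script semantic embeddings follows; the projection matrix, cosine scoring, and all names are illustrative stand-ins for the paper's fusion architecture:

```python
import numpy as np

def zero_shot_script_scores(visual_feat, fused_proj, class_embeddings):
    """Score an image against script classes via a learned compatibility
    between the (fused) visual feature and per-script semantic embeddings.
    visual_feat: (v,) image feature; fused_proj: (d, v) learned projection;
    class_embeddings: (C, d) semantic embeddings, one row per script."""
    projected = fused_proj @ visual_feat            # map image into embedding space
    sims = class_embeddings @ projected             # dot product per script
    sims /= (np.linalg.norm(class_embeddings, axis=1)
             * np.linalg.norm(projected) + 1e-8)    # cosine similarity
    return sims

# Prediction over unseen scripts: pred = np.argmax(zero_shot_script_scores(...))
```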
Citations: 8
An Interactive and Generative Approach for Chinese Shanshui Painting Document
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00136
Aven Le Zhou, Qiu-Feng Wang, Kaizhu Huang, C. Lo
Chinese Shanshui is a form of landscape painting, depicting mainly mountains and water, that is popular in Chinese culture. However, it is very challenging for ordinary people to create such paintings. In this paper, we propose an interactive and generative approach that automatically generates Chinese Shanshui painting documents from users' input, where the users only need to sketch simple lines to represent their ideal landscape, without any professional Shanshui painting skills. This sketch-to-Shanshui translation is optimized by a cycle Generative Adversarial Network (cycle-GAN) model. To evaluate the proposed approach, we collected a large set of both sketch data and Chinese Shanshui painting data to train the cycle-GAN model, and developed an interactive system called Shanshui-DaDA (i.e., Design and Draw with AI) to generate Chinese Shanshui painting documents in real-time. The experimental results show that this system can generate Chinese Shanshui painting documents that satisfy general users.
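The core of cycle-GAN training is the cycle-consistency term; a minimal PyTorch sketch is given below, assuming two generator networks (the function names and the weighting factor are illustrative):

```python
import torch

def cycle_consistency_loss(G_sketch2shanshui, G_shanshui2sketch,
                           sketch, shanshui, lam=10.0):
    """Cycle-GAN consistency term: translating to the other domain and back
    should reconstruct the input. lam is the usual cycle-loss weight."""
    rec_sketch = G_shanshui2sketch(G_sketch2shanshui(sketch))
    rec_paint = G_sketch2shanshui(G_shanshui2sketch(shanshui))
    return lam * (torch.mean(torch.abs(rec_sketch - sketch)) +
                  torch.mean(torch.abs(rec_paint - shanshui)))
```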
Citations: 9
DeepText: Detecting Text from the Wild with Multi-ASPP-Assembled DeepLab
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00042
Qingqing Wang, W. Jia, Xiangjian He, Yue Lu, M. Blumenstein, Ye Huang, Shujing Lyu
In this paper, we address scene text detection by direct regression and successfully adapt an effective semantic segmentation model, DeepLab v3+ [1], to this application. In order to handle texts with arbitrary orientations and sizes and to improve the recall of small texts, we propose to extract features at multiple scales by inserting multiple Atrous Spatial Pyramid Pooling (ASPP) layers into DeepLab after feature maps of different resolutions. Then, we set multiple auxiliary IoU losses at the decoding stage and add auxiliary connections from the intermediate encoding layers to the decoder to assist network training and enhance the discrimination ability of the lower encoding layers. Experiments conducted on the benchmark scene text dataset ICDAR2015 demonstrate the superior performance of our proposed network, named DeepText, over state-of-the-art approaches.
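For reference, a generic ASPP block can be sketched in PyTorch as follows; the dilation rates and channel sizes are illustrative defaults, not the configuration used in DeepText:

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated 3x3 convolutions
    capture context at multiple scales; their outputs are concatenated
    and fused with a 1x1 convolution. Spatial size is preserved because
    padding equals the dilation rate."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```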
Citations: 1
ICDAR 2019 Competition on Harvesting Raw Tables from Infographics (CHART-Infographics)
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00203
Kenny Davila, B. Kota, S. Setlur, V. Govindaraju, Chris Tensmeyer, Sumit Shekhar, Ritwick Chaudhry
This work summarizes the results of the first Competition on Harvesting Raw Tables from Infographics (ICDAR 2019 CHART-Infographics). For the purpose of this competition, the complex process of automatic chart recognition is divided into multiple tasks: Chart Image Classification (Task 1), Text Detection and Recognition (Task 2), Text Role Classification (Task 3), Axis Analysis (Task 4), Legend Analysis (Task 5), Plot Element Detection and Classification (Task 6.a), Data Extraction (Task 6.b), and End-to-End Data Extraction (Task 7). We provided a large synthetic training set and evaluated submitted systems using newly proposed metrics on both synthetic charts and manually annotated real charts taken from scientific literature. A total of 8 groups registered for the competition, of which 5 submitted results for Tasks 1-5. The results show that some tasks can be performed highly accurately on synthetic data, but no system performed as well on real-world charts. The data, annotation tools, and evaluation scripts have been publicly released for academic use.
Citations: 23
Parameter-Free Table Detection Method
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00079
Laiphangbam Melinda, C. Bhagvati
In this paper, we propose two parameter-free table detection methods: one for closed tables and the other for open tables. The unifying idea is multigaussian analysis. Multigaussian analysis of text-height histograms classifies the document content into text and non-text blocks. Closed tables are classified as non-text, and their identification among the non-text blocks is similar to many earlier methods that remove the separators. Thanks to the multigaussian analysis, we do not need any parameters to identify rows and columns and to discriminate them from text blocks. Open tables are initially classified as text blocks and are detected by extending the multigaussian analysis to the heights and widths of text blocks. The text blocks are grouped into three categories by multigaussian analysis; these groups are used to classify table cells and distinguish them from text blocks. Table blocks are merged to obtain the table region. Evaluation on various Indic-script newspapers and the ICDAR2013 table competition dataset shows that our methods achieve more than 90% in table recognition. The strength of our algorithm is that it is a parameter-free approach and requires no training dataset.
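A minimal sketch of the multigaussian idea, assuming a 1D mixture fit to block heights; the two-component choice and the "dominant component = text" rule are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def split_text_nontext(block_heights, n_components=2):
    """Fit a Gaussian mixture to the text-height histogram and use component
    membership to separate regular text blocks from outlier (non-text) blocks."""
    h = np.asarray(block_heights, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components).fit(h)
    labels = gmm.predict(h)
    # Treat the component with the most members as the dominant text mode.
    text_label = np.bincount(labels).argmax()
    return labels == text_label        # True for text-like blocks
```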
Citations: 8
Two Stream Deep Network for Document Image Classification
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00227
M. Asim, Muhammad Usman Ghani Khan, M. I. Malik, K. Razzaque, A. Dengel, Sheraz Ahmed
This paper presents a novel two-stream approach for document image classification. The proposed approach leverages textual and visual modalities to classify document images into ten categories, including letter, memo, news article, etc. In order to alleviate the textual stream's dependency on the performance of the underlying OCR (as is the case with general content-based document image classifiers), we utilize a filter-based feature-ranking algorithm. This algorithm ranks the features of each class based on their ability to discriminate document images and selects a set of top-'K' features that are retained for further processing. In parallel, the visual stream uses deep CNN models to extract structural features of document images. Finally, the textual and visual streams are combined using an average ensembling method. Experimental results reveal that the proposed approach outperforms the state-of-the-art system by a significant margin of 4.5% on the publicly available Tobacco-3482 dataset.
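The two key steps can be sketched as follows; chi-squared ranking stands in for the paper's unspecified filter criterion, and all names and the value of K are illustrative:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

def select_top_k(X_text, y, k=500):
    """Filter-based feature ranking for the textual stream: keep the K most
    class-discriminative features. chi2 assumes non-negative inputs
    (e.g. term counts) and is an assumed stand-in for the paper's criterion."""
    return SelectKBest(chi2, k=k).fit_transform(X_text, y)

def average_ensemble(p_textual, p_visual):
    """Late fusion: average the per-class probabilities of the two streams
    and predict the highest-scoring class."""
    return np.argmax((p_textual + p_visual) / 2.0, axis=1)
```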
Citations: 15
Article Segmentation in Digitised Newspapers with a 2D Markov Model
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00165
Andrew Naoum, J. Nothman, J. Curran
Document analysis and recognition is increasingly used to digitise collections of historical books, newspapers and other periodicals. In the digital humanities, the goal is often to apply information retrieval (IR) and natural language processing (NLP) techniques to help researchers analyse and navigate these digitised archives. The lack of article segmentation impairs many IR and NLP systems, which assume text is split into ordered, error-free documents. We define a document analysis and image processing task for segmenting digitised newspapers into articles and other content, e.g. adverts, and we automatically create a dataset of 11602 articles. Using this dataset, we develop and evaluate an innovative 2D Markov model that encodes reading order and substantially outperforms the current state of the art, reaching accuracy similar to human annotators.
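To illustrate the general shape of such a model, here is a hedged sketch of scoring a grid of block labels with horizontal and vertical transition matrices; this is a generic 2D Markov formulation assumed for illustration, not the paper's actual model:

```python
import numpy as np

def grid_log_score(labels, trans_h, trans_v, prior):
    """Log-score a 2D grid of block labels under a simple 2D Markov model:
    each block depends on its left and upper neighbours, mirroring reading
    order. labels: (rows, cols) int grid; trans_h / trans_v: (L, L)
    transition matrices; prior: (L,) label priors. All illustrative."""
    rows, cols = labels.shape
    score = 0.0
    for i in range(rows):
        for j in range(cols):
            score += np.log(prior[labels[i, j]])
            if j > 0:
                score += np.log(trans_h[labels[i, j - 1], labels[i, j]])
            if i > 0:
                score += np.log(trans_v[labels[i - 1, j], labels[i, j]])
    return score
```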
Citations: 9
Unsupervised OCR Model Evaluation Using GAN
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00-42
Abhash Sinha, Martin Jenckel, S. S. Bukhari, A. Dengel
Optical Character Recognition (OCR) has achieved state-of-the-art performance with the use of Deep Learning for character recognition. Deep Learning techniques need a large amount of data along with ground truth, and a portion of the available data also has to be set aside for validation. Preparing ground truth for historical documents is expensive, and hence the availability of data is of utmost concern. Jenckel et al. proposed using all the available data for training the OCR model and, for the purpose of validation, generating the input image back from the Softmax layer of the OCR model using a decoder; the regenerated image can then be compared with the original input image to validate the OCR model. In this paper, we explore the possibility of using Generative Adversarial Networks (GANs) to generate the image directly from the text output by the OCR model, instead of using the Softmax layer, which is not accessible for all Deep Learning based OCR models. Using the text directly to regenerate the input image gives us the advantage of applying this pipeline to any OCR model, even one whose Softmax layer is not accessible. In the results section, we show the current state of using GANs for unsupervised OCR model evaluation.
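The validation step reduces to comparing the regenerated image with the original input; a minimal sketch is shown below, with mean squared error used as an assumed stand-in similarity measure (the paper does not specify this metric):

```python
import numpy as np

def reconstruction_score(original, regenerated):
    """Unsupervised proxy for OCR quality: regenerate the line image from the
    recognised text (via the GAN) and compare it with the original input.
    Inputs are uint8 grayscale images of the same shape; higher is better."""
    a = original.astype(float) / 255.0
    b = regenerated.astype(float) / 255.0
    return -np.mean((a - b) ** 2)
```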
Citations: 2