
Latest publications from the 2016 12th IAPR Workshop on Document Analysis Systems (DAS)

A Segmentation-Free Handwritten Word Spotting Approach by Relaxed Feature Matching
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.40
A. Hast, A. Fornés
The automatic recognition of historical handwritten documents is still considered a challenging task. For this reason, word spotting emerges as a good alternative for making the information contained in these documents available to the user. Word spotting is defined as the task of retrieving all instances of a query word in a document collection, making it a useful tool for information retrieval. In this paper we propose a segmentation-free word spotting approach able to deal with large document collections. Our method is inspired by feature matching algorithms that have been applied to image matching and retrieval. Since handwritten words vary in shape, no exact transformation can be obtained. However, a sufficient degree of relaxation is achieved by using a Fourier-based descriptor and an alternative to RANSAC called PUMA. The proposed approach is evaluated on historical marriage records, achieving promising results.
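As a loose illustration of descriptor-based matching (not the authors' PUMA algorithm, which the abstract only names), the sketch below compares translation-invariant Fourier magnitude descriptors of 1-D word profiles under a relaxed cosine-similarity threshold; the function names, profile representation, and threshold value are all illustrative assumptions:

```python
import math

def fourier_descriptor(signal, n_coeffs=4):
    """Magnitudes of the first n_coeffs DFT coefficients (naive DFT).
    Magnitudes are invariant to cyclic shifts, which gives the
    'relaxed' matching some tolerance to displacement."""
    n = len(signal)
    desc = []
    for k in range(n_coeffs):
        re = sum(signal[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(signal[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        desc.append(math.hypot(re, im))
    return desc

def cosine(a, b):
    """Cosine similarity between two descriptors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def spot(query_profile, candidate_profiles, threshold=0.95):
    """Return indices of candidate word images whose descriptor is
    sufficiently similar to the query. A segmentation-free system
    would generate candidates by scanning the page; here they are
    simply given."""
    q = fourier_descriptor(query_profile)
    return [i for i, p in enumerate(candidate_profiles)
            if cosine(q, fourier_descriptor(p)) >= threshold]
```

A cyclically shifted copy of the query profile matches (identical DFT magnitudes), while a dissimilar profile falls below the threshold.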
Citations: 13
Entity Local Structure Graph Matching for Mislabeling Correction
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.36
Nihel Kooli, A. Belaïd, Aurélie Joseph, V. P. d'Andecy
This paper proposes an entity local structure comparison approach based on inexact subgraph matching. The comparison results are used for mislabeling correction in the local structure. The latter represents a set of entity attribute labels which are physically close in a document image. It is modeled by an attributed graph in which nodes describe the content and presentation features of the labels and arcs describe the geometrical features. A local structure graph is matched against a structure model, which represents a set of local structure model graphs. The structure model is initially built from a set of well-chosen local structures using a graph clustering algorithm and is then incrementally updated. The subgraph matching adopts a specific cost function that integrates the feature dissimilarities. The matched model graph is used to extract missed labels, prune extraneous ones, and correct erroneous label fields in the local structure. An evaluation of the structure comparison approach on 525 local structures extracted from 200 business documents achieves about 90% recall and 95% precision. The mislabeling correction rates in these local structures vary between 73% and 100%.
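A minimal sketch of inexact matching with a cost function that integrates feature dissimilarities, assuming toy nodes of the form (label, x, y) and a greedy assignment in place of the paper's actual subgraph matching; the weights, cost cap, and names are hypothetical:

```python
def node_cost(a, b, w_label=1.0, w_pos=0.1):
    """Dissimilarity between two attributed nodes (label, x, y):
    a label-mismatch term plus a weighted Euclidean distance,
    mirroring a cost that combines content and geometric features."""
    label_term = 0.0 if a[0] == b[0] else 1.0
    pos_term = ((a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2) ** 0.5
    return w_label * label_term + w_pos * pos_term

def greedy_match(structure, model, max_cost=1.5):
    """Greedily pair each structure node with the cheapest unused
    model node. Unmatched model nodes flag missed labels; unmatched
    structure nodes flag extraneous ones."""
    used, pairs = set(), []
    for i, s in enumerate(structure):
        best = min(((node_cost(s, m), j) for j, m in enumerate(model)
                    if j not in used), default=None)
        if best is not None and best[0] <= max_cost:
            pairs.append((i, best[1]))
            used.add(best[1])
    matched_structure = {i for i, _ in pairs}
    missed = [j for j in range(len(model)) if j not in used]
    extra = [i for i in range(len(structure)) if i not in matched_structure]
    return pairs, missed, extra
```

On a structure missing one attribute of the model, the unmatched model node shows up in `missed`, which is the signal used to extract the missing label.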
Citations: 4
Banknote Counterfeit Detection through Background Texture Printing Analysis
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.34
A. Berenguel, O. R. Terrades, J. Lladós, C. Cañero
This paper focuses on the detection of counterfeit photocopied banknotes. The main difficulty is working in a real industrial scenario, without any constraint on the acquisition device and with only a single image. The main contributions of this paper are twofold: first, the adaptation and performance evaluation of existing approaches for classifying genuine and photocopied banknotes using background texture printing analysis, which had not previously been applied in this context; second, a new dataset of Euro banknote images acquired with several cameras under different luminance conditions to evaluate these methods. Experiments with the proposed algorithms show that combining SIFT features and sparse coding dictionaries achieves quasi-perfect classification using a linear SVM on the created dataset. Approaches using dictionaries to cover all possible texture variations have proven robust and outperform state-of-the-art methods on the proposed benchmark.
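To make the encoding step concrete, here is a simplified bag-of-features sketch: local descriptors are hard-assigned to their nearest dictionary atom and pooled into a normalised histogram. The paper's sparse coding would instead express each descriptor as a sparse combination of atoms and feed the result to a linear SVM; everything below is an illustrative stand-in:

```python
def nearest_atom(desc, dictionary):
    """Index of the dictionary atom closest (squared Euclidean
    distance) to one local descriptor. Hard assignment: sparse
    coding relaxes this to a sparse combination of several atoms."""
    dists = [sum((d - a) ** 2 for d, a in zip(desc, atom))
             for atom in dictionary]
    return dists.index(min(dists))

def encode(descriptors, dictionary):
    """Pool the local descriptors (e.g. SIFT) of one banknote image
    into an L1-normalised histogram over dictionary atoms. The
    resulting global vector is what a linear classifier would see."""
    hist = [0.0] * len(dictionary)
    for d in descriptors:
        hist[nearest_atom(d, dictionary)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]
```

With a two-atom dictionary, descriptors cluster onto their nearest atom and the histogram records the proportions.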
Citations: 19
Online Arabic Handwriting Recognition with Dropout Applied in Deep Recurrent Neural Networks
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.49
R. Maalej, Najiba Tagougui, M. Kherallah
Lately, online Arabic handwriting recognition has been gaining interest because of advances in technology such as handwriting capture devices and powerful mobile computers. To further improve recognition rates, we propose in this work a new system based on deep recurrent neural networks to which the dropout technique is applied. Thanks to its recurrent connections, our approach is well suited to sequence modelling, and its many non-linear hidden layers allow it to learn intricate relationships between the input and output layers. In addition to these contributions, our system is protected against overfitting by the strong regularising effect of dropout. The proposed system was tested on the large ADAB dataset to show its performance under difficult conditions such as the variety of writers, the large vocabulary, and the diversity of styles.
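Dropout itself is simple to state. A minimal sketch of the standard inverted-dropout rule (not the paper's specific placement inside the recurrent network, which it does not detail here) might look like:

```python
import random

def dropout(activations, p=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each unit with
    probability p and scale survivors by 1/(1-p), so the expected
    activation is unchanged and no rescaling is needed at test time.
    In RNNs, dropout is commonly applied to the non-recurrent
    (layer-to-layer) connections."""
    if not training or p == 0.0:
        return list(activations)
    rng = rng or random
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0
            for a in activations]
```

At test time (`training=False`) the layer is the identity; during training each unit is either dropped or scaled up.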
Citations: 27
A Simple and Effective Solution for Script Identification in the Wild
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.57
A. Singh, Anand Mishra, P. Dabral, C. V. Jawahar
We present an approach for automatically identifying the script of text localized in scene images. Our approach is inspired by advancements in mid-level features. We represent the text images using mid-level features pooled from densely computed local features. Once text images are represented using the proposed mid-level feature representation, we use an off-the-shelf classifier to identify the script of the text image. Our approach is efficient and requires very little labeled data. We evaluate the performance of our method on the recently introduced CVSI dataset, demonstrating that the proposed approach can correctly identify the script of 96.70% of the text images. In addition, we introduce and benchmark a more challenging Indian Language Scene Text (ILST) dataset for evaluating the performance of our method.
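Pooling densely computed local features into a mid-level representation can be sketched as spatial max-pooling over a uniform grid; the scalar "feature map" (one response per pixel position) and the assumption that the map dimensions divide evenly by the grid are simplifications for illustration:

```python
def grid_pool(features, grid=2):
    """Pool a dense map of local feature responses into a mid-level
    vector: split the rows x cols map into grid x grid cells,
    max-pool each cell, and concatenate the cell maxima.
    Assumes rows and cols are divisible by grid."""
    rows, cols = len(features), len(features[0])
    rstep, cstep = rows // grid, cols // grid
    pooled = []
    for gr in range(grid):
        for gc in range(grid):
            cell = [features[r][c]
                    for r in range(gr * rstep, (gr + 1) * rstep)
                    for c in range(gc * cstep, (gc + 1) * cstep)]
            pooled.append(max(cell))
    return pooled
```

The pooled vector preserves coarse spatial layout (one maximum per cell), which is what lets an off-the-shelf classifier separate scripts.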
Citations: 20
Preserving Text Content from Historical Handwritten Documents
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.77
Arpita Chakraborty, M. Blumenstein
We propose a holistic, dynamic method to preserve text content with zero tolerance while removing marginal noise from historical handwritten document images. The key idea is to identify and analyze, at each margin, the region between the sharp peak at the edge and the page frame of the text content. Depending on the proximity of the sharp peak to the text, the text content is then extracted from the document image. The method automatically adapts its thresholds to each individual document image and is directly applicable to gray-scale images. The proposed method is evaluated on four diverse handwritten historical datasets: the Queensland State Archive (QSA), Saint Gall, Parzival, and the Prosecution Project. Experimental results show that the proposed method achieves higher accuracy than other methods tested on the Saint Gall and Parzival datasets, while for the two Australian datasets, introduced here for the first time, the results are very encouraging.
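One plausible reading of the peak-based margin analysis, sketched for the left margin only on a toy binary image: a dark page border produces a sharp peak in the column projection profile near the edge, and the text region is kept from just past that peak onward. The fixed zone width and crop rule below are assumptions, not the paper's adaptive thresholds:

```python
def crop_left_margin(image, margin_frac=0.2):
    """Column-projection sketch of margin-noise removal on a binary
    image (list of rows of 0/1 ints). Find the strongest column
    inside a fixed zone near the left edge; if it clearly dominates
    the rest of the page (a border artefact), crop just past it,
    otherwise leave the image untouched."""
    cols = len(image[0])
    zone = max(1, int(cols * margin_frac))
    sums = [sum(row[c] for row in image) for c in range(cols)]
    peak = max(range(zone), key=lambda c: sums[c])
    body_max = max(sums[zone:]) if cols > zone else 0
    start = peak + 1 if sums[peak] > body_max else 0
    return [row[start:] for row in image]
```

Cropping only when the edge peak exceeds every in-body column sum is one way to keep the "zero tolerance" property: genuine text columns are never stronger than the border band that gets removed.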
Citations: 5
Performance of an Off-Line Signature Verification Method Based on Texture Features on a Large Indic-Script Signature Dataset
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.48
S. Pal, Alireza Alaei, U. Pal, M. Blumenstein
In this paper, a signature verification method based on texture features is proposed for off-line signatures written in two different Indian scripts. Both Local Binary Patterns (LBP) and Uniform Local Binary Patterns (ULBP), as powerful texture feature extraction techniques, are used to characterize the off-line signatures. The Nearest Neighbour (NN) technique is used as the similarity metric for signature verification. To evaluate the proposed verification approach, a large Bangla and Hindi off-line signature dataset (BHSig260) comprising 6240 (260×24) genuine signatures and 7800 (260×30) skilled forgeries was introduced and used for experimentation. The GPDS-100 signature dataset was further used for comparison. Verification accuracies were computed separately for the LBP and ULBP texture features. No remarkable differences were observed between the results obtained with the LBP and ULBP features on either the BHSig260 or the GPDS-100 signature dataset.
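The LBP codes underlying both feature sets are easy to illustrate. The sketch below computes the classic 8-neighbour LBP code of a 3x3 grayscale patch and the "uniform" test that defines the ULBP subset; histogramming such codes over a signature image yields the texture feature vector that the NN comparison would consume:

```python
def lbp_code(patch):
    """LBP code of the centre pixel of a 3x3 patch: threshold the 8
    neighbours against the centre (clockwise from top-left) and read
    the resulting bits as one byte."""
    c = patch[1][1]
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, n in enumerate(neighbours):
        if n >= c:
            code |= 1 << bit
    return code

def is_uniform(code):
    """A code belongs to the 'uniform' (ULBP) subset if its circular
    8-bit pattern has at most two 0/1 transitions."""
    bits = [(code >> i) & 1 for i in range(8)]
    transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    return transitions <= 2
```

A patch with a bright top edge maps to a small uniform code, while an alternating pattern such as 0b10101010 is non-uniform and would be lumped into the ULBP catch-all bin.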
Citations: 67
Marginal Noise Reduction in Historical Handwritten Documents -- A Survey
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.78
Arpita Chakraborty, M. Blumenstein
This paper presents a survey of different approaches for removing marginal noise from document images, and analyses the research challenges those methods face on handwritten historical datasets. In this survey, historical documents collected from Australian archives and libraries are introduced, and the associated layout complexities of those document images are described. Benchmarking against other historical databases related to this work is also discussed. The survey examines the difficulties and suitability of state-of-the-art methods for removing marginal noise while preserving the text content of handwritten historical documents. It helps researchers identify appropriate methods for the marginal noise at hand, and also illustrates their drawbacks in order to suggest directions for developing approaches that are more general and robust across datasets.
Citations: 3
What You See is What You Get? Automatic Image Verification for Online News Content
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.75
Sarah Elkasrawi, A. Dengel, Ahmed Abdelsamad, S. S. Bukhari
Consuming news over online media has grown rapidly in recent years, especially with the increasing popularity of social media. However, the ease and speed with which users can access and share information online has facilitated the dissemination of false or unverified information. One way of assessing the credibility of online news stories is by examining the attached images. These images could be fake, manipulated, or unrelated to the context of the accompanying news story. Previous attempts at news verification provided the user with a set of related images for manual inspection. In this work, we present a semi-automatic approach to assist news consumers in instantly assessing the credibility of information in hypertext news articles by means of meta-data and feature analysis of the images in the articles. In the first phase, we use a hybrid approach including image and text clustering techniques to check the authenticity of an image. In the second phase, we use a hierarchical feature analysis technique to check for alterations in an image, using different sets of features such as edges and SURF. In contrast to recently reported manual news verification, our work provides a quantitative measurement on a custom dataset. Results revealed an accuracy of 72.7% for checking the authenticity of attached images on a dataset of 55 articles. Finding alterations in images resulted in an accuracy of 88% on a dataset of 50 images.
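As a rough stand-in for the image comparison step (the paper's pipeline uses image and text clustering with edge and SURF features, none of which are shown here), a toy average-hash comparison illustrates how a candidate image can be checked for near-duplication against reference images:

```python
def average_hash(gray):
    """Toy average hash of a small grayscale grid (list of rows of
    numbers): one bit per pixel, set where the pixel exceeds the
    image mean. Real systems first downscale the image to a fixed
    tiny size so hashes are comparable across resolutions."""
    flat = [p for row in gray for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(h1, h2):
    """Number of differing bits; small distances suggest the same
    underlying image, large ones a different or heavily altered one."""
    return sum(a != b for a, b in zip(h1, h2))
```

Minor brightness changes leave the hash intact, while a structurally different image flips many bits, which is the signal a verification pipeline can act on.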
Citations: 20
Isolated Handwritten Digit Recognition Using oBIFs and Background Features
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.10
A. Gattal, Chawki Djeddi, Y. Chibani, I. Siddiqi
This study demonstrates how the combination of oriented Basic Image Features (oBIFs) with background concavity features can be effectively employed to enhance the performance of isolated digit recognition systems. The features are extracted, without any size normalization, from the complete image as well as from different regions of the image obtained by applying uniform grid sampling. Classification is carried out using a one-against-all support vector machine (SVM), and the experimental study is conducted on the standard CVL single-digit database. A series of evaluations using different feature configurations and combinations achieved high recognition rates, which are compared with state-of-the-art methods on this subject.
Citations: 28