
Latest publications: 2016 12th IAPR Workshop on Document Analysis Systems (DAS)

High Performance OCR for Camera-Captured Blurred Documents with LSTM Networks
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.69
Fallak Asad, A. Ul-Hasan, F. Shafait, A. Dengel
Documents are routinely captured by digital cameras in today's age owing to the availability of high-quality cameras in smart phones. However, recognition of camera-captured documents is substantially more challenging compared to traditional flatbed-scanned documents due to the distortions introduced by the cameras. One of the major performance-limiting artifacts is the motion and out-of-focus blur that is often induced in the document during the capturing process. Existing approaches try to detect the presence of blur in the document to prompt the user to re-capture the image. This paper reports, for the first time, an Optical Character Recognition (OCR) system that can directly recognize blurred documents on which state-of-the-art OCR systems are unable to provide usable results. Our presented system is based on Long Short-Term Memory (LSTM) networks and has shown promising character recognition results on both motion-blurred and out-of-focus blurred images. One important feature of this work is that the LSTM networks have been applied directly to the gray-scale document images to avoid error-prone binarization of blurred documents. Experiments are conducted on the publicly available SmartDoc-QA dataset, which contains a wide variety of image blur degradations. Our presented system achieves a 12.3% character error rate on the test documents, over a three-fold reduction from the error rate (38.9%) of the best-performing contemporary OCR system (ABBYY FineReader) on the same data.
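The character error rates above follow the usual definition: edit distance between the OCR output and the reference transcription, normalized by the reference length. A minimal sketch of that metric (not the authors' code):

```python
def edit_distance(ref: str, hyp: str) -> int:
    # Standard Levenshtein distance via dynamic programming, one row at a time.
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[j] = min(dp[j] + 1,      # deletion
                        dp[j - 1] + 1,  # insertion
                        prev + cost)    # substitution or match
            prev = cur
    return dp[n]

def character_error_rate(ref: str, hyp: str) -> float:
    # CER = edit distance normalized by the reference length.
    return edit_distance(ref, hyp) / max(len(ref), 1)
```

For example, `character_error_rate("kitten", "sitting")` is 3/6 = 0.5.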
Citations: 14
Automatic Selection of Parameters for Document Image Enhancement Using Image Quality Assessment
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.53
Ritu Garg, S. Chaudhury
Performance of most recognition engines for document images is affected by the quality of the image being processed and by the selection of parameter values for the pre-processing algorithm. Usually the choice of such parameters is made empirically. In this paper, we propose a novel framework for automatic selection of optimal parameters for the pre-processing algorithm by estimating the quality of the document image. Recognition accuracy can be used as a metric for document quality assessment. We learn filters that capture the script properties and degradation to predict recognition accuracy. An EM-based framework has been formulated to iteratively learn optimal parameters for document image pre-processing. In the E-step, we estimate the expected accuracy using the current set of parameters and filters. In the M-step, we compute parameters that maximize the expected recognition accuracy found in the E-step. The experiments validate the efficacy of the proposed methodology for document image pre-processing applications.
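The alternating E/M loop can be sketched schematically. Here `predict_accuracy` is a hypothetical stand-in for the learned filter-based accuracy estimator, and in this toy form the loop stops as soon as no parameter setting improves the current expected accuracy:

```python
import random

def em_parameter_search(predict_accuracy, param_grid, iterations=10, seed=0):
    """Alternate between estimating expected accuracy for the current
    parameters (E-step) and choosing parameters that maximize that
    estimate (M-step). `predict_accuracy(params)` stands in for the
    quality predictor learned from script/degradation filters."""
    rng = random.Random(seed)
    params = rng.choice(list(param_grid))
    for _ in range(iterations):
        expected = predict_accuracy(params)            # E-step
        best = max(param_grid, key=predict_accuracy)   # M-step
        if predict_accuracy(best) <= expected:
            break   # converged: no parameter improves the estimate
        params = best
    return params
```

With `predict_accuracy = lambda p: -(p - 3) ** 2` over the grid `0..6`, the search settles on `3`.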
Citations: 6
Creating Ground Truth for Historical Manuscripts with Document Graphs and Scribbling Interaction
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.29
A. Garz, Mathias Seuret, Fotini Simistira, Andreas Fischer, R. Ingold
Ground truth is indispensable for training and evaluating document analysis methods, yet very tedious to create manually. This holds especially true for complex historical manuscripts that exhibit challenging layouts with interfering and overlapping handwriting. In this paper, we propose a novel semi-automatic system to support layout annotation in such a scenario, based on document graphs and a pen-based scribbling interaction. On the one hand, document graphs provide a sparse page representation that is already close to the desired ground truth; on the other hand, scribbling facilitates an efficient and convenient pen-based interaction with the graph. The performance of the system is demonstrated in the context of a newly introduced database of historical manuscripts with complex layouts.
Citations: 24
Document Image Quality Assessment Based on Texture Similarity Index
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.33
Alireza Alaei, Donatello Conte, M. Blumenstein, R. Raveaux
In this paper, a full-reference document image quality assessment (FR DIQA) method using texture features is proposed. Local binary patterns (LBP) are extracted as texture features at the local and global levels for each image. For each extracted LBP feature set, a similarity measure called the LBP similarity index (LBPSI) is computed. A weighting strategy is further proposed to improve the LBPSI obtained from local LBP features. The LBPSIs computed for local and global features are then combined to obtain the final LBPSI, which also provides the best performance for DIQA. To evaluate the proposed method, two different datasets were used. The first dataset is composed of document images, whereas the second one includes natural scene images. The mean human opinion scores (MHOS) were used as ground truth for performance evaluation. The results obtained with the proposed LBPSI method indicate a significant improvement in automatically and accurately predicting image quality, especially on the document image dataset.
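As a rough illustration of the building blocks, the sketch below computes a plain 8-neighbour LBP histogram and compares two histograms by intersection. The paper's LBPSI and its weighting strategy are more elaborate; this is an assumed simplification, not the authors' formulation:

```python
def lbp_histogram(img):
    # 8-neighbour local binary pattern histogram of a 2-D grayscale image
    # (list of rows); border pixels are skipped for simplicity.
    h, w = len(img), len(img[0])
    hist = [0] * 256
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offs):
                if img[y + dy][x + dx] >= c:
                    code |= 1 << bit
            hist[code] += 1
    total = float(sum(hist)) or 1.0
    return [v / total for v in hist]

def histogram_similarity(h1, h2):
    # Histogram intersection in [0, 1]; 1 means identical distributions.
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Identical images yield a similarity of 1.0; the more the LBP distributions diverge, the lower the score.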
Citations: 11
An Interactive Transcription System of Census Records Using Word-Spotting Based Information Transfer
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.47
J. M. Romeu, A. Fornés, J. Lladós
This paper presents a system to assist the transcription of historical handwritten census records in a crowdsourcing platform. Census records have a tabular structured layout: they consist of a sequence of rows with household information ordered by street address. For each household snippet on the page, the list of family members is reported. The censuses are recorded at intervals of a few years, and the information on individuals in each household is quite stable from one point in time to the next. This redundancy is used to assist the transcriber, so redundant information is transferred from the census already transcribed to the next one. Household records are aligned from one year to the next using the knowledge of the ordering by street address. Given an already transcribed census, query-by-string word spotting is applied. Thus, names from the census at time t are used as queries in the corresponding home record at time t+1. Since the search is constrained, the obtained precision-recall values are very high, with an important reduction in transcription time. The proposed system has been tested in a real citizen-science experience where non-expert users transcribe the census data of their home town.
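The constrained matching idea can be sketched with strings standing in for word images; here `difflib.SequenceMatcher` plays the role of the word-spotting score, which is an assumption for illustration only:

```python
from difflib import SequenceMatcher

def constrained_spotting(prev_household, candidates, threshold=0.7):
    """For each name transcribed in the previous census (time t), find its
    best match among the entries of the aligned household at time t+1.
    Candidates are noisy strings standing in for word images, and the
    string similarity ratio stands in for the word-spotting score."""
    matches = {}
    for name in prev_household:
        best, score = None, threshold
        for cand in candidates:
            s = SequenceMatcher(None, name.lower(), cand.lower()).ratio()
            if s >= score:
                best, score = cand, s
        matches[name] = best   # None if nothing clears the threshold
    return matches
```

Because the search is restricted to one aligned household rather than the whole page, even a simple score like this one rarely has competing candidates.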
Citations: 6
A Compliant Document Image Classification System Based on One-Class Classifier
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.55
Nicolas Sidère, Jean-Yves Ramel, Sabine Barrat, V. P. d'Andecy, S. Kebairi
Document image classification in a professional context must respect constraints such as dealing with a large variability of documents and/or a large number of classes. Whereas most methods deal with all classes at the same time, we address this problem by presenting a new compliant system based on specializing the features and parametrizing the classifier separately, class by class. We first compute a generalized feature vector based on global image characterization and structural primitives. Then, for each class, the feature vector is specialized by ranking the features according to a stability score. Finally, a one-class k-NN classifier is trained using these specific features. The conducted experiments reveal good classification rates, proving the ability of our system to deal with a large range of document classes.
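A one-class k-NN of the kind named above can be sketched as follows: fit on samples of a single class, derive an acceptance radius from leave-one-out scores, and accept a test vector whose mean distance to its k nearest training vectors stays within that radius. The slack factor is a hypothetical choice, not from the paper:

```python
import math

class OneClassKNN:
    """One-class k-NN: a sample belongs to the class if its mean distance
    to the k nearest training vectors is below a threshold fitted on the
    training class itself."""

    def __init__(self, k=3):
        self.k = k
        self.train = []
        self.threshold = 0.0

    @staticmethod
    def _dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def _score(self, v, pool):
        d = sorted(self._dist(v, t) for t in pool)
        return sum(d[:self.k]) / self.k

    def fit(self, vectors, slack=1.5):
        self.train = list(vectors)
        # Leave-one-out scores on the class define the acceptance radius.
        scores = [self._score(v, self.train[:i] + self.train[i + 1:])
                  for i, v in enumerate(self.train)]
        self.threshold = slack * max(scores)

    def predict(self, v):
        return self._score(v, self.train) <= self.threshold
```

Training one such model per document class, each on its own specialized feature vector, matches the class-by-class design described in the abstract.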
Citations: 0
Error Detection in Indic OCRs
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.31
V. Vinitha, C. V. Jawahar
A good post-processing module is an indispensable part of an OCR pipeline. In this paper, we propose a novel method for error detection in Indian-language OCR output. Our solution uses a recurrent neural network (RNN) to classify a word as erroneous or not. We propose a generic error detection method and demonstrate its effectiveness on four popular Indian languages. We divide the words into their constituent aksharas and use their bigram- and trigram-level information to build a feature representation. In order to train the classifier on incorrect words, we use the mis-recognized words in the output of the OCR. In addition to the RNN, we also explore the effectiveness of a generative model such as a GMM for our task and demonstrate improved performance by combining both approaches. We tested our method on four popular Indian languages and report an average error detection performance above 80%.
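The akshara n-gram features can be sketched with plain characters standing in for aksharas; the fixed n-gram vocabulary below is a hypothetical choice for illustration:

```python
from collections import Counter

def ngram_features(units, vocab, n_values=(2, 3)):
    """Bigram/trigram count features over a word's constituent units
    (aksharas in the paper; plain characters here as a stand-in).
    `vocab` is a fixed, ordered list of known n-grams, so every word
    maps to a vector of the same length for the classifier."""
    counts = Counter()
    for n in n_values:
        for i in range(len(units) - n + 1):
            counts[tuple(units[i:i + n])] += 1
    return [counts[g] for g in vocab]
```

For the word "abab" with vocabulary [ab, ba, aba], the feature vector is [2, 1, 1]: two "ab" bigrams, one "ba" bigram, one "aba" trigram.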
Citations: 10
Increasing Robustness of Handwriting Recognition Using Character N-Gram Decoding on Large Lexica
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.43
M. Schall, M. Schambach, M. Franz
Offline handwriting recognition systems often include a decoding step, that is, retrieving the most likely character sequence from the underlying machine learning algorithm. Decoding is sensitive to ranges of weakly predicted characters, caused e.g. by obstructions in the scanned document. We present a new algorithm for robust decoding of handwriting recognizer outputs using character n-grams. Multidimensional hierarchical subsampling artificial neural networks with Long Short-Term Memory cells have been successfully applied to offline handwriting recognition. Output activations from such networks, trained with Connectionist Temporal Classification, can be decoded with several different algorithms in order to retrieve the most likely literal string. We present a new algorithm for decoding the network output while restricting the possible strings to a large lexicon. The index used for this work is an n-gram index, with trigrams used for the experimental comparisons. N-grams are extracted from the network output using a backtracking algorithm, and each n-gram is assigned a mean probability. The decoding result is obtained by intersecting the n-gram hit lists while calculating the total probability for each matched lexicon entry. We conclude with an experimental comparison of different decoding algorithms on a large lexicon.
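The hit-list scoring can be sketched as follows; the recognized trigrams and their probabilities stand in for those backtracked from the network output, and the `#` padding scheme is an assumption:

```python
from collections import defaultdict

def build_trigram_index(lexicon):
    # Inverted index: trigram -> set of lexicon words containing it.
    index = defaultdict(set)
    for word in lexicon:
        padded = f"##{word}#"   # pad so short words still yield trigrams
        for i in range(len(padded) - 2):
            index[padded[i:i + 3]].add(word)
    return index

def decode(recognized_trigrams, index, lexicon):
    """Score each lexicon entry by summing the probabilities of the
    recognized trigrams whose hit lists contain it; `recognized_trigrams`
    is a list of (trigram, mean probability) pairs standing in for the
    ones backtracked from the network output."""
    score = {w: 0.0 for w in lexicon}
    for tri, prob in recognized_trigrams:
        for w in index.get(tri, ()):
            score[w] += prob
    return max(lexicon, key=lambda w: score[w])
```

Because scoring only touches the hit lists of the recognized trigrams, the cost scales with the number of matches rather than with the lexicon size.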
Citations: 3
Globally Optimal Text Line Extraction Based on K-Shortest Paths Algorithm
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.12
Liuan Wang, S. Uchida, Wei-liang Fan, Jun Sun
The task of text line extraction in images is a crucial prerequisite for content-based image understanding applications. In this paper, we propose a novel text line extraction method based on k-shortest-paths global optimization. Firstly, candidate connected components are extracted as Maximally Stable Extremal Region (MSER) results in the images. Then, a directed graph is built upon the connected-component nodes, with edges weighted by unary and pairwise cost functions. Finally, the text line extraction problem is solved using the k-shortest-paths optimization algorithm by taking advantage of the particular structure of the directed graph. Experimental results on a public dataset demonstrate the effectiveness of the proposed method in comparison with state-of-the-art methods.
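A minimal k-shortest-paths enumeration over a weighted DAG (repeatedly popping partial paths from a priority queue) illustrates the optimization step; the paper exploits the graph's particular structure for efficiency, which this generic sketch does not:

```python
import heapq

def k_shortest_paths(graph, source, sink, k):
    """Enumerate the k cheapest source-to-sink paths in a weighted DAG.
    `graph` maps a node to a list of (neighbour, edge_cost) pairs, with
    the edge costs standing in for the combined unary and pairwise
    costs; each returned path would correspond to one text line."""
    heap = [(0.0, [source])]
    paths = []
    while heap and len(paths) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == sink:
            paths.append((cost, path))
            continue
        for nxt, w in graph.get(node, []):
            heapq.heappush(heap, (cost + w, path + [nxt]))
    return paths
```

On a DAG this enumeration terminates without loop checks, since no path can revisit a node.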
Citations: 4
Effective Candidate Component Extraction for Text Localization in Born-Digital Images by Combining Text Contours and Stroke Interior Regions
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.30
Kai Chen, Fei Yin, Cheng-Lin Liu
Extracting candidate text connected components (CCs) is critical for CC-based text localization. Based on the observation that text strokes in born-digital images mostly have complete contours and that text pixels have high contrast with adjacent non-text pixels, we propose a method to extract candidate text CCs by combining text contours and stroke interior regions. After segmenting the image into non-smooth and smooth regions based on local contrast, text contour pixels in non-smooth regions are detached from adjacent non-text pixels by local binarization. Then, obvious non-text contours can be removed according to the spatial relationship of text and non-text contours. While smooth regions include stroke interior regions and non-text smooth regions, some non-text smooth regions can easily be removed because they are not surrounded by candidate text contours. Finally, candidate text contours and stroke interior regions are combined to generate candidate text CCs. The CCs undergo CC filtering, text line grouping and line classification to give the text localization result. Experimental results on the born-digital dataset of the ICDAR 2013 robust reading competition demonstrate the efficiency and superiority of the proposed method.
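The local binarization step can be illustrated with a Niblack-style rule: threshold each pixel against the mean (plus k times the standard deviation) of its neighbourhood. The window size and k below are hypothetical, not the paper's values:

```python
import statistics

def local_binarize(img, window=3, k=0.0):
    """Niblack-style local binarization of a 2-D grayscale image (list of
    rows): mark a pixel as text (1) if it is darker than its local mean
    plus k standard deviations. A stand-in for the step that detaches
    text contour pixels from adjacent non-text pixels."""
    h, w = len(img), len(img[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - r), min(h, y + r + 1))
                    for xx in range(max(0, x - r), min(w, x + r + 1))]
            mean = sum(vals) / len(vals)
            std = statistics.pstdev(vals)
            out[y][x] = 1 if img[y][x] < mean + k * std else 0
    return out
```

Unlike a single global threshold, the local rule adapts to contrast changes across the page, which is what makes it usable on the non-smooth regions described above.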
{"title":"Effective Candidate Component Extraction for Text Localization in Born-Digital Images by Combining Text Contours and Stroke Interior Regions","authors":"Kai Chen, Fei Yin, Cheng-Lin Liu","doi":"10.1109/DAS.2016.30","DOIUrl":"https://doi.org/10.1109/DAS.2016.30","url":null,"abstract":"Extracting candidate text connected components (CCs) is critical for CC-based text localization. Based on the observation that text strokes in born-digital images mostly have complete contours and the text pixels have high contrast with the adjacent non-text pixels, we propose a method to extract candidate text CCs by combining text contours and stroke interior regions. After segmenting the image into non-smooth and smooth regions based on local contrast, text contour pixels in non-smooth regions are detached from adjacent non-text pixels by local binarization. Then, obvious non-text contours can be removed according to the spatial relationship of text and non-text contours. While smooth regions include stroke interior regions and non-text smooth regions, some non-text smooth regions can be easily removed because they are not surrounded by candidate text contours. At last, candidate text contours and stroke interior regions are combined to generate candidate text CCs. The CCs undergo CC filtering, text line grouping and line classification to give the text localization result. 
Experimental results on the born-digital dataset of ICDAR2013 robust reading competition demonstrate the efficiency and superiority of the proposed method.","PeriodicalId":197359,"journal":{"name":"2016 12th IAPR Workshop on Document Analysis Systems (DAS)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127488207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 8
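The first step of the pipeline above, splitting the image into smooth and non-smooth regions by local contrast, might look like this minimal NumPy sketch. The window size and contrast threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

def local_contrast_map(img, win=3):
    """Max-minus-min intensity in a win x win neighbourhood.

    img: 2-D uint8 grayscale array. Edge padding keeps the output
    the same shape as the input.
    """
    pad = win // 2
    p = np.pad(img.astype(np.int16), pad, mode="edge")
    h, w = img.shape
    # Stack all win*win shifted views and take the per-pixel range.
    views = [p[dy:dy + h, dx:dx + w]
             for dy in range(win) for dx in range(win)]
    stack = np.stack(views)
    return stack.max(axis=0) - stack.min(axis=0)

def split_regions(img, contrast_thresh=40, win=3):
    """Boolean mask: True = non-smooth pixel (likely a text contour),
    False = smooth pixel (stroke interior or flat background)."""
    return local_contrast_map(img, win) >= contrast_thresh

# Synthetic image with a sharp vertical edge: only pixels straddling
# the edge are flagged as non-smooth.
demo = np.zeros((5, 5), dtype=np.uint8)
demo[:, 3:] = 255
print(split_regions(demo).astype(int))
```

The subsequent steps (local binarization within non-smooth regions, contour filtering, and merging with stroke interiors) would operate on this mask, but are specific to the paper's method and not sketched here.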