
2016 12th IAPR Workshop on Document Analysis Systems (DAS): Latest Publications

Automatic Selection of Parameters for Document Image Enhancement Using Image Quality Assessment
Pub Date: 2016-04-11 DOI: 10.1109/DAS.2016.53
Ritu Garg, S. Chaudhury
The performance of most recognition engines for document images is affected by the quality of the image being processed and by the parameter values selected for the pre-processing algorithm. Usually, such parameters are chosen empirically. In this paper, we propose a novel framework for automatically selecting optimal parameters for a pre-processing algorithm by estimating the quality of the document image. Recognition accuracy can be used as a metric for document quality assessment. We learn filters that capture script properties and degradation in order to predict recognition accuracy. An EM-based framework is formulated to iteratively learn optimal parameters for document image pre-processing. In the E-step, we estimate the expected accuracy using the current set of parameters and filters. In the M-step, we compute the parameters that maximize the expected recognition accuracy found in the E-step. Experiments validate the efficacy of the proposed methodology for document image pre-processing applications.
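As a rough illustration of the alternating scheme described above, the following Python sketch assumes hypothetical `preprocess`, `predict_accuracy`, and `refit_filters` placeholders (none of which come from the paper) and searches a small parameter grid:

```python
import itertools

# Hypothetical placeholders, not the authors' code: a pre-processing
# routine with tunable parameters, a filter-based quality model that
# predicts recognition accuracy, and a refit step for that model.
def preprocess(image, window, threshold):
    return image                  # stand-in for e.g. local binarization

def predict_accuracy(image, filters):
    return 0.0                    # stand-in for the learned predictor

def refit_filters(images, params, filters):
    return filters                # stand-in for re-learning the filters

def em_parameter_search(images, filters, grid, n_iters=10):
    params = grid[0]
    for _ in range(n_iters):
        # E-step: under the current parameters, update the quality model
        # and estimate the expected recognition accuracy of each setting.
        filters = refit_filters(images, params, filters)
        expected = {p: sum(predict_accuracy(preprocess(im, *p), filters)
                           for im in images) / len(images)
                    for p in grid}
        # M-step: pick the parameters that maximize expected accuracy.
        best = max(expected, key=expected.get)
        if best == params:        # fixed point reached: stop iterating
            return params
        params = best
    return params

# Example grid: binarization window sizes crossed with thresholds.
grid = list(itertools.product([15, 25, 35], [0.3, 0.5, 0.7]))
best_params = em_parameter_search([None], None, grid)  # dummy image list
```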
Citations: 6
Creating Ground Truth for Historical Manuscripts with Document Graphs and Scribbling Interaction
Pub Date: 2016-04-11 DOI: 10.1109/DAS.2016.29
A. Garz, Mathias Seuret, Fotini Simistira, Andreas Fischer, R. Ingold
Ground truth is indispensable for training and evaluating document analysis methods, yet very tedious to create manually. This holds especially true for complex historical manuscripts that exhibit challenging layouts with interfering and overlapping handwriting. In this paper, we propose a novel semi-automatic system to support layout annotation in such a scenario, based on document graphs and pen-based scribbling interaction. On the one hand, document graphs provide a sparse page representation that is already close to the desired ground truth; on the other hand, scribbling enables efficient and convenient pen-based interaction with the graph. The performance of the system is demonstrated in the context of a newly introduced database of historical manuscripts with complex layouts.
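A minimal sketch of the scribbling interaction this abstract describes might look as follows, assuming a simplified document graph whose nodes are connected components with bounding boxes (the paper's actual graph model is richer):

```python
from dataclasses import dataclass

# Illustrative data model only; the paper's actual document graph is richer.
@dataclass
class GraphNode:
    node_id: int
    bbox: tuple                 # (x0, y0, x1, y1) of a connected component
    label: str = "unlabelled"

def point_in_bbox(x, y, bbox):
    x0, y0, x1, y1 = bbox
    return x0 <= x <= x1 and y0 <= y <= y1

def apply_scribble(nodes, stroke, label):
    """Assign `label` to every graph node whose bounding box is touched
    by the pen stroke, given as a list of sampled (x, y) points."""
    for node in nodes:
        if any(point_in_bbox(x, y, node.bbox) for x, y in stroke):
            node.label = label
    return nodes

nodes = [GraphNode(0, (10, 10, 60, 30)), GraphNode(1, (10, 40, 60, 60))]
apply_scribble(nodes, [(20, 20), (30, 22)], "main-text")  # labels node 0
```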
Citations: 24
High Performance OCR for Camera-Captured Blurred Documents with LSTM Networks
Pub Date: 2016-04-11 DOI: 10.1109/DAS.2016.69
Fallak Asad, A. Ul-Hasan, F. Shafait, A. Dengel
Documents are routinely captured by digital cameras today, owing to the availability of high-quality cameras in smartphones. However, recognition of camera-captured documents is substantially more challenging than that of traditional flatbed-scanned documents, due to the distortions introduced by the cameras. One of the major performance-limiting artifacts is the motion and out-of-focus blur that is often induced in the document during the capturing process. Existing approaches try to detect the presence of blur in the document so that the user can be prompted to re-capture the image. This paper reports, for the first time, an Optical Character Recognition (OCR) system that can directly recognize blurred documents on which state-of-the-art OCR systems are unable to provide usable results. The presented system is based on Long Short-Term Memory (LSTM) networks and has shown promising character recognition results on both motion-blurred and out-of-focus blurred images. One important feature of this work is that the LSTM networks are applied directly to the gray-scale document images, avoiding the error-prone binarization of blurred documents. Experiments are conducted on the publicly available SmartDoc-QA dataset, which contains a wide variety of image blur degradations. The presented system achieves a 12.3% character error rate on the test documents, an over three-fold reduction relative to the error rate (38.9%) of the best-performing contemporary OCR system (ABBYY Fine Reader) on the same data.
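For orientation only, a line recognizer of the general kind described here could be sketched in PyTorch as below; the layer sizes, the bidirectional LSTM over raw pixel columns, and CTC training are assumptions, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class LineRecognizer(nn.Module):
    """Assumed architecture: a bidirectional LSTM reads a height-normalized
    gray-scale text line column by column and emits per-column character
    probabilities for CTC training, so no binarization step is needed."""
    def __init__(self, line_height=48, hidden=100, n_classes=100):
        super().__init__()
        self.lstm = nn.LSTM(input_size=line_height, hidden_size=hidden,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, n_classes + 1)  # +1: CTC blank

    def forward(self, lines):
        # lines: (batch, width, height); each time step is one pixel
        # column of raw gray values.
        out, _ = self.lstm(lines)
        return self.proj(out).log_softmax(-1)

model = LineRecognizer()
lines = torch.rand(2, 300, 48)   # two blurred gray-scale lines, width 300
log_probs = model(lines)         # (2, 300, 101)
ctc = nn.CTCLoss(blank=100)      # trained against character transcripts
```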
Citations: 14
Document Image Quality Assessment Based on Texture Similarity Index
Pub Date: 2016-04-11 DOI: 10.1109/DAS.2016.33
Alireza Alaei, Donatello Conte, M. Blumenstein, R. Raveaux
In this paper, a full-reference document image quality assessment (FR DIQA) method using texture features is proposed. Local binary patterns (LBP) are extracted as texture features at both the local and global levels of each image. For each extracted LBP feature set, a similarity measure called the LBP similarity index (LBPSI) is computed. A weighting strategy is further proposed to improve the LBPSI obtained from local LBP features. The LBPSIs computed for local and global features are then combined to produce the final LBPSI, which also provides the best performance for DIQA. To evaluate the proposed method, two different datasets were used: the first is composed of document images, whereas the second includes natural scene images. Mean human opinion scores (MHOS) were used as ground truth for performance evaluation. The results obtained with the proposed LBPSI method indicate a significant improvement in automatically and accurately predicting image quality, especially on the document image dataset.
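A plausible instantiation of the LBPSI idea, using scikit-image for the LBP codes; the SSIM-style histogram similarity and the equal local/global weighting are assumptions, since the abstract does not give the exact formula:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_hist(img, P=8, R=1.0):
    """Uniform-LBP histogram of a gray-scale patch."""
    codes = local_binary_pattern(img, P, R, method="uniform")
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def lbpsi(h_ref, h_dist, c=1e-6):
    # SSIM-style histogram similarity; the paper's exact formula may differ.
    return np.mean((2 * h_ref * h_dist + c) / (h_ref**2 + h_dist**2 + c))

def quality_score(ref, dist, grid=4):
    """Combine the global LBPSI with the mean LBPSI over a grid of local
    tiles (the equal weighting here is an assumption)."""
    h, w = ref.shape
    local = [lbpsi(lbp_hist(ref[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]),
                   lbp_hist(dist[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid]))
             for i in range(grid) for j in range(grid)]
    return 0.5 * lbpsi(lbp_hist(ref), lbp_hist(dist)) + 0.5 * np.mean(local)
```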
Citations: 11
A Compliant Document Image Classification System Based on One-Class Classifier
Pub Date: 2016-04-11 DOI: 10.1109/DAS.2016.55
Nicolas Sidère, Jean-Yves Ramel, Sabine Barrat, V. P. d'Andecy, S. Kebairi
Document image classification in a professional context must respect constraints such as dealing with a large variability of documents and/or a large number of classes. Whereas most methods deal with all classes at the same time, we address this problem by presenting a new compliant system that specializes the features and parametrizes the classifier separately, class by class. We first compute a generalized feature vector based on global image characterization and structural primitives. Then, for each class, the feature vector is specialized by ranking the features according to a stability score. Finally, a one-class K-nn classifier is trained using these specific features. The experiments conducted reveal good classification rates, proving the ability of our system to deal with a large range of document classes.
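The class-per-class specialization plus one-class K-nn could be sketched as follows; the inverse-variance stability score and the distance threshold are assumptions made for illustration:

```python
import numpy as np

class OneClassKNN:
    """Sketch of a one-class k-NN (formulation assumed): accept a test
    vector as a class member if its mean distance to the k nearest
    training samples of that class stays below a threshold."""
    def __init__(self, k=3, threshold=1.0, n_features=50):
        self.k, self.threshold, self.n_features = k, threshold, n_features

    def fit(self, X):
        # Stability score assumed to be inverse feature variance, so the
        # class-specialized vector keeps the most stable features.
        stability = 1.0 / (X.var(axis=0) + 1e-9)
        self.keep = np.argsort(stability)[-self.n_features:]
        self.X = X[:, self.keep]
        return self

    def is_member(self, x):
        d = np.linalg.norm(self.X - x[self.keep], axis=1)
        return np.sort(d)[:self.k].mean() < self.threshold

# One classifier per document class, each with its own specialized features.
clf = OneClassKNN().fit(np.random.rand(100, 120))
print(clf.is_member(np.random.rand(120)))
```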
Citations: 0
Error Detection in Indic OCRs
Pub Date: 2016-04-11 DOI: 10.1109/DAS.2016.31
V. Vinitha, C. V. Jawahar
A good post-processing module is an indispensable part of an OCR pipeline. In this paper, we propose a novel method for error detection in Indian-language OCR output. Our solution uses a recurrent neural network (RNN) to classify a word as erroneous or not. We propose a generic error detection method and demonstrate its effectiveness on four popular Indian languages. We divide words into their constituent aksharas and use bigram- and trigram-level information to build a feature representation. In order to train the classifier on incorrect words, we use the mis-recognized words in the OCR output. In addition to the RNN, we also explore the effectiveness of a generative model such as a GMM for our task and demonstrate improved performance by combining both approaches. We tested our method on four popular Indian languages and report an average error detection performance above 80%.
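A toy version of such a detector, with a plain character split standing in for proper akshara segmentation and a GRU in place of the unspecified RNN variant:

```python
import torch
import torch.nn as nn

def akshara_ngrams(word, n=2):
    """Placeholder segmentation: a plain character split stands in for a
    proper akshara segmenter."""
    units = list(word)
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]

class ErrorDetector(nn.Module):
    """A GRU stands in for the unspecified RNN variant; it reads embedded
    n-grams and outputs the probability that the word is an OCR error."""
    def __init__(self, vocab_size, emb=32, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb)
        self.rnn = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, ngram_ids):          # (batch, seq) of n-gram indices
        _, h = self.rnn(self.emb(ngram_ids))
        return torch.sigmoid(self.out(h[-1]))

vocab = {}
ids = [vocab.setdefault(g, len(vocab)) for g in akshara_ngrams("देवनागरी")]
p_error = ErrorDetector(vocab_size=1000)(torch.tensor([ids]))
```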
Citations: 10
An Interactive Transcription System of Census Records Using Word-Spotting Based Information Transfer
Pub Date: 2016-04-11 DOI: 10.1109/DAS.2016.47
J. M. Romeu, A. Fornés, J. Lladós
This paper presents a system to assist in the transcription of historical handwritten census records on a crowdsourcing platform. Census records have a tabular structured layout: they consist of a sequence of rows with household information ordered by street address. For each household snippet on the page, the list of family members is reported. The censuses are recorded at intervals of a few years, and the information on individuals in each household is quite stable from one point in time to the next. This redundancy is used to assist the transcriber, so the redundant information is transferred from the census already transcribed to the next one. Household records are aligned from one year to the next using the known ordering by street address. Given an already transcribed census, query-by-string word spotting is applied: names from the census at time t are used as queries against the corresponding household record at time t+1. Since the search is constrained, the obtained precision-recall values are very high, with a substantial reduction in transcription time. The proposed system has been tested in a real citizen-science setting where non-expert users transcribe the census data of their home town.
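The constrained transfer step might be sketched like this, assuming an external word-spotting scorer `spot(query, image)` that the abstract does not detail:

```python
def transfer_household(names_t, word_images_t1, spot, accept=0.8):
    """Pre-fill the year t+1 household with names from the aligned year t
    household. `spot(query_string, word_image) -> similarity in [0, 1]`
    is an assumed external word-spotting scorer; restricting queries to
    the aligned household is what keeps precision and recall high."""
    transcription = {}
    for image_id, image in word_images_t1.items():
        scores = {name: spot(name, image) for name in names_t}
        best = max(scores, key=scores.get)
        # Only confident matches are pre-filled; the rest go to the crowd.
        transcription[image_id] = best if scores[best] >= accept else None
    return transcription
```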
Citations: 6
Effective Candidate Component Extraction for Text Localization in Born-Digital Images by Combining Text Contours and Stroke Interior Regions
Pub Date: 2016-04-11 DOI: 10.1109/DAS.2016.30
Kai Chen, Fei Yin, Cheng-Lin Liu
Extracting candidate text connected components (CCs) is critical for CC-based text localization. Based on the observation that text strokes in born-digital images mostly have complete contours and that text pixels have high contrast with adjacent non-text pixels, we propose a method to extract candidate text CCs by combining text contours and stroke interior regions. After segmenting the image into non-smooth and smooth regions based on local contrast, text contour pixels in non-smooth regions are detached from adjacent non-text pixels by local binarization. Then, obvious non-text contours can be removed according to the spatial relationship of text and non-text contours. While smooth regions include both stroke interior regions and non-text smooth regions, some non-text smooth regions can easily be removed because they are not surrounded by candidate text contours. Finally, candidate text contours and stroke interior regions are combined to generate candidate text CCs. The CCs undergo CC filtering, text line grouping and line classification to give the text localization result. Experimental results on the born-digital dataset of the ICDAR2013 robust reading competition demonstrate the efficiency and superiority of the proposed method.
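A rough OpenCV sketch of the pipeline's main steps, with assumed thresholds and a deliberately simplified smooth-region rule:

```python
import cv2
import numpy as np

def candidate_text_ccs(gray, contrast_thresh=30):
    """gray: uint8 gray-scale image. Thresholds and the smooth-region
    rule are simplifications of the method described above."""
    # 1. Split into non-smooth / smooth regions by local contrast
    #    (morphological gradient used here as the contrast measure).
    kernel = np.ones((3, 3), np.uint8)
    contrast = cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, kernel)
    non_smooth = contrast > contrast_thresh

    # 2. Local binarization detaches text contour pixels in non-smooth
    #    regions from adjacent non-text pixels.
    binar = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                  cv2.THRESH_BINARY_INV, 11, 5)
    contours_mask = np.where(non_smooth, binar, 0).astype(np.uint8)

    # 3. Dark smooth regions are kept as candidate stroke interiors.
    interiors = np.where(~non_smooth & (binar > 0), 255, 0).astype(np.uint8)

    # 4. Union of contours and interiors gives the candidate text CCs.
    combined = cv2.bitwise_or(contours_mask, interiors)
    n_labels, label_map = cv2.connectedComponents(combined)
    return n_labels - 1, label_map   # CC count (minus background) + labels
```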
Citations: 8
Quality Prediction System for Large-Scale Digitisation Workflows
Pub Date: 2016-04-11 DOI: 10.1109/DAS.2016.82
C. Clausner, S. Pletschacher, A. Antonacopoulos
The feasibility of large-scale OCR projects can so far only be assessed by running pilot studies on subsets of the target document collections and measuring the success of different workflows based on precise ground truth, which can be very costly to produce in the required volume. The premise of this paper is that, as an alternative, quality prediction may be used to approximate the success of a given OCR workflow. A new system is thus presented where a classifier is trained using metadata, image and layout features in combination with measured success rates (based on minimal ground truth). Subsequently, only document images are required as input for the numeric prediction of the quality score (no ground truth required). This way, the system can be applied to any number of similar (unseen) documents in order to assess their suitability for being processed using the particular workflow. The usefulness of the system has been validated using a realistic dataset of historical newspaper pages.
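A minimal sketch of the predict-from-images-only stage, with made-up image features and a random-forest regressor standing in for the unspecified classifier:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def page_features(image):
    """Made-up stand-ins for the metadata/image/layout features."""
    return np.array([image.mean(),           # overall brightness
                     image.std(),            # contrast proxy
                     (image < 128).mean()])  # ink-density proxy

# Training: features of pilot pages paired with success rates measured
# against minimal ground truth on those same pages (placeholders here).
pilot_pages = [np.random.randint(0, 256, (100, 80)) for _ in range(50)]
measured_success = np.random.rand(50)
model = RandomForestRegressor(n_estimators=100).fit(
    np.stack([page_features(p) for p in pilot_pages]), measured_success)

# Prediction: unseen pages need only the image, no ground truth.
new_page = np.random.randint(0, 256, (100, 80))
predicted_quality = model.predict(page_features(new_page)[None, :])
```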
Citations: 8
Semi-automatic Text and Graphics Extraction of Manga Using Eye Tracking Information
Pub Date: 2016-04-11 DOI: 10.1109/DAS.2016.72
Christophe Rigaud, Thanh Nam Le, J. Burie, J. Ogier, Shoya Ishimaru, M. Iwata, K. Kise
The popularity of storing, distributing and reading comic books electronically has made comics analysis an interesting research problem. Various works have been carried out aiming to understand their layout structure and graphic content. However, the results are still far from universally applicable, largely due to the huge variety in expression styles and page arrangement, especially in manga (Japanese comics). In this paper, we propose a comic image analysis approach using eye-tracking data recorded during manga reading sessions. As humans are extremely capable of interpreting structured drawing content and show different reading behaviors depending on the nature of the content, their eye movements follow distinguishable patterns over text and graphic regions. Eye-gaze data can therefore add rich information to the understanding of manga content. Experimental results show that fixations and saccades indeed form consistent patterns across readers and can be used for textual and graphical analysis of manga.
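One simple heuristic in the spirit of this observation: label fixations by the length of the following saccade and the fixation duration (the thresholds below are illustrative assumptions, not values from the paper):

```python
import numpy as np

def label_fixations(fixations, sacc_len_max=80.0, fix_dur_max=0.35):
    """fixations: list of (x, y, duration_s) in reading order. A fixation
    followed by a short saccade and of short duration is taken as text
    reading; everything else as graphics viewing."""
    labels = []
    for i, (x, y, dur) in enumerate(fixations):
        if i + 1 < len(fixations):
            nx, ny, _ = fixations[i + 1]
            jump = np.hypot(nx - x, ny - y)   # saccade length in pixels
        else:
            jump = 0.0
        labels.append("text" if jump < sacc_len_max and dur < fix_dur_max
                      else "graphic")
    return labels

print(label_fixations([(100, 50, 0.20), (140, 52, 0.25), (400, 300, 0.60)]))
```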
Citations: 10