
Latest Publications: 2016 12th IAPR Workshop on Document Analysis Systems (DAS)

Text Detection in Arabic News Video Based on SWT Operator and Convolutional Auto-Encoders
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.80
Oussama Zayene, Mathias Seuret, Sameh Masmoudi Touj, J. Hennebert, R. Ingold, N. Amara
Text detection in videos is a challenging problem due to the variety of text specificities, the presence of complex backgrounds, and anti-aliasing/compression artifacts. In this paper, we present an approach for detecting horizontally aligned artificial text in Arabic news video. The novelty of this method lies in the combination of two techniques: an adapted version of the Stroke Width Transform (SWT) algorithm and a convolutional auto-encoder (CAE). First, the SWT extracts text candidate components, which are then filtered and grouped using geometric constraints and stroke-width information. Second, the CAE is used as an unsupervised feature learning method to classify the obtained text-line candidates as text or non-text. We assess the proposed approach on the public Arabic-Text-in-Video database (AcTiV-DB) using different evaluation protocols, including data from several TV channels. Experiments indicate that the use of learned features significantly improves text detection results.
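The unsupervised verification stage can be illustrated with a small convolutional auto-encoder. Below is a minimal PyTorch sketch, assuming 32x32 grayscale text-line patches; the layer sizes and patch size are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of an unsupervised convolutional auto-encoder (CAE) for
# text / non-text verification, assuming 32x32 grayscale patches.
import torch
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # 32x32 -> 16x16
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 16x16 -> 8x8
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2),    # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2),     # 16x16 -> 32x32
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

cae = ConvAutoEncoder()
patches = torch.rand(8, 1, 32, 32)                    # unlabeled candidate patches
loss = nn.functional.mse_loss(cae(patches), patches)  # reconstruction objective
# After unsupervised training, the encoder output serves as a learned
# feature vector that a small classifier maps to text / non-text.
```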
Citations: 24
CNN Based Transfer Learning for Historical Chinese Character Recognition
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.52
Yejun Tang, Liangrui Peng, Qianxiong Xu, Yanwei Wang, Akio Furuhata
Historical Chinese character recognition suffers from a lack of sufficient labeled training samples. A transfer learning method based on a Convolutional Neural Network (CNN) for historical Chinese character recognition is proposed in this paper. A CNN model L is trained on printed Chinese character samples in the source domain. The network structure and weights of model L are used to initialize another CNN model T, which serves as the feature extractor and classifier in the target domain. Model T is then fine-tuned on a few labeled historical or handwritten Chinese character samples and used for final evaluation in the target domain. Several experiments regarding essential factors of the CNN-based transfer learning method are conducted, showing that the proposed method is effective.
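The weight-transfer step can be sketched as follows in PyTorch, assuming two structurally identical CNNs and 64x64 character images; the architecture and class count are illustrative assumptions.

```python
# Minimal transfer-learning sketch: model_L trained on printed characters
# (source domain), model_T initialized from L and fine-tuned on scarce
# historical samples (target domain).
import torch
import torch.nn as nn

def make_cnn(num_classes: int) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(64 * 16 * 16, num_classes),  # assumes 64x64 input images
    )

model_L = make_cnn(num_classes=3755)  # e.g. a printed-character class set
# ... train model_L on abundant printed samples ...

model_T = make_cnn(num_classes=3755)
model_T.load_state_dict(model_L.state_dict())  # initialize T with L's weights

# Optionally freeze the convolutional feature extractor and fine-tune only
# the classifier on the few labeled historical samples.
for layer in list(model_T.children())[:-1]:
    for p in layer.parameters():
        p.requires_grad = False
optimizer = torch.optim.Adam(
    (p for p in model_T.parameters() if p.requires_grad), lr=1e-4)
```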
Citations: 42
Understanding Line Plots Using Bayesian Network
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.73
Rathin Radhakrishnan Nair, Nishant Sankaran, Ifeoma Nwogu, V. Govindaraju
Information graphics, such as bar charts, graphs, and plots in scientific documents, primarily facilitate a better understanding of information. Graphics are a key component of technical documents because they are simplified representations of complex ideas. When traditional optical character recognition (OCR) systems are used on digitized documents, the ideas conveyed in these information graphics are lost, since OCRs typically work only on text. Although tools have recently been developed to extract information graphics from PDF files, they still do not intelligently interpret the contents of the extracted graphics. We therefore propose a method for identifying the intended messages of line plots using a Bayesian network. We accomplish this by first extracting a dense set of points from a line plot and then representing the entire line plot as a sequence of trends. We then implement a Bayesian network for reasoning about the messages conveyed by the line plots and their trends. We validate our approach by performing experiments on a dataset obtained from computer science conference publications and evaluate the performance of the network against the messages generated by human end users. The resulting intended message gives holistic information about the line plot(s) as well as lower-level information about the trends that make up the plot.
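The intermediate trend-sequence representation can be sketched with a simple slope test over fixed-length segments; the segment length and slope threshold below are illustrative assumptions, not the paper's parameters.

```python
# Sketch of turning sampled line-plot points into a trend sequence,
# the intermediate representation fed to the Bayesian network.
import numpy as np

def trend_sequence(y: np.ndarray, seg_len: int = 10, tol: float = 0.01) -> list:
    """Label each fixed-length segment as rising, falling, or stable
    by the sign of its least-squares slope."""
    trends = []
    for start in range(0, len(y) - seg_len + 1, seg_len):
        seg = y[start:start + seg_len]
        slope = np.polyfit(np.arange(seg_len), seg, deg=1)[0]
        if slope > tol:
            trends.append("rising")
        elif slope < -tol:
            trends.append("falling")
        else:
            trends.append("stable")
    return trends

y = np.concatenate([np.linspace(0, 1, 30), np.linspace(1, 0.2, 30)])
print(trend_sequence(y))
# ['rising', 'rising', 'rising', 'falling', 'falling', 'falling']
```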
Citations: 5
SDK Reinvented: Document Image Analysis Methods as RESTful Web Services
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.56
Marcel Würsch, R. Ingold, M. Liwicki
Document Image Analysis (DIA) systems are becoming ever more advanced, but also more complex -- computationally and logically. This increases the difficulty of integrating existing state-of-the-art approaches into new research or into practical workflows. The current approach to sharing software is publishing source code -- leaving the burden to the integrator -- or creating a Software Development Kit (SDK), which is often restricted to one programming language. We present DIVAServices, a framework for sharing and accessing DIA methods within the research community and beyond. Using a RESTful web service architecture, we provide access to the methods, so that only one system needs to maintain the method binaries. All it takes for a developer to use an algorithm is a simple HTTP request with the image data and parameters for the method; the computed results are returned in a format that allows seamless integration into any kind of workflow or further processing. Furthermore, DIVAServices is open-source, enabling other research groups or libraries to host their own instance in their environment. Using this framework, future DIA systems can be built on the shoulders of well-tested algorithms, accessible to everyone.
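A client call in this style reduces to one HTTP request. The sketch below uses the Python `requests` library; the endpoint URL and parameter names are hypothetical placeholders, not the actual DIVAServices API.

```python
# Hypothetical client call to a DIVAServices-style endpoint: a single
# HTTP request carries the image and the method's parameters.
import requests

with open("page.png", "rb") as f:
    response = requests.post(
        "https://divaservices.example.org/binarization/otsu",  # assumed URL
        files={"image": f},
        data={"threshold": "auto"},                            # assumed parameter
        timeout=30,
    )
response.raise_for_status()
result = response.json()  # machine-readable output for further processing
```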
Citations: 23
Multilingual OCR for Indic Scripts
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.68
Minesh Mathew, A. Singh, C. V. Jawahar
In the Indian scenario, a document analysis system has to support multiple languages at the same time. With emerging multilingualism in urban India, two, three, or even more languages often need to be supported. This demands the development of a multilingual OCR system that works seamlessly across Indic scripts. In our approach, the script is identified at the word level, prior to recognition of the word. An end-to-end RNN-based architecture that can detect the script and recognize the text in a segmentation-free manner is proposed for this purpose. We demonstrate the approach for 12 Indian languages and English. We observe that, even with a similar architecture, performance on Indian languages is poorer than on English, and we investigate this further. Our approach is evaluated on a large corpus comprising thousands of pages. The Hindi OCR is compared with other popular OCRs for the language, as further testimony to the efficacy of our method.
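The word-level script identification followed by script-specific recognition amounts to a dispatch pipeline. A minimal sketch, with stub functions standing in for the paper's RNN script classifier and recognizers:

```python
# Sketch of the word-level pipeline: identify the script of each word
# image, then route it to the recognizer trained for that script.
from typing import Callable, Dict, List, Tuple
import numpy as np

def recognize_page(word_images: List[np.ndarray],
                   identify_script: Callable[[np.ndarray], str],
                   recognizers: Dict[str, Callable[[np.ndarray], str]]
                   ) -> List[Tuple[str, str]]:
    output = []
    for img in word_images:
        script = identify_script(img)    # e.g. "Devanagari", "Latin"
        text = recognizers[script](img)  # script-specific recognizer
        output.append((script, text))
    return output

# Stub usage; real models would replace the lambdas below.
demo = recognize_page(
    [np.zeros((32, 128))],
    identify_script=lambda img: "Latin",
    recognizers={"Latin": lambda img: "hello"},
)
print(demo)  # [('Latin', 'hello')]
```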
Citations: 41
MSIO: MultiSpectral Document Image BinarizatIOn
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.39
Markus Diem, Fabian Hollaus, Robert Sablatnig
MultiSpectral (MS) imaging enriches document digitization by increasing the spectral resolution. We present a methodology that detects a target ink in document images by taking this additional information into account. The proposed method performs a rough foreground estimation to localize possible ink regions. Then the Adaptive Coherence Estimator (ACE), a target detection algorithm, transforms the MS input space into a single gray-scale image in which values close to one indicate ink. A spatial segmentation using GrabCut on the target detector's output produces the final binary image. To establish a baseline, the method is evaluated on the three most recent Document Image Binarization Contests (DIBCO), despite the fact that these only provide RGB images. In addition, an evaluation on three publicly available MS datasets is carried out. The presented methodology achieved the highest performance at the 2015 MultiSpectral Text Extraction (MS-TEx) contest.
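The ACE statistic compares each pixel's spectrum against a known ink spectrum under the background covariance. A NumPy sketch of the standard ACE formula, assuming pixels are flattened to one spectral vector per row; preprocessing details (foreground masking, band selection) are omitted:

```python
# Sketch of the Adaptive Coherence Estimator (ACE) on per-pixel spectra.
# `pixels` holds one N-band intensity vector per pixel; `target` is the
# mean spectrum of known ink pixels. Scores near 1 indicate ink-like pixels.
import numpy as np

def ace(pixels: np.ndarray, target: np.ndarray) -> np.ndarray:
    """pixels: (num_pixels, num_bands); target: (num_bands,)."""
    mu = pixels.mean(axis=0)
    x = pixels - mu                      # center on the background mean
    s = target - mu
    cov_inv = np.linalg.pinv(np.cov(x, rowvar=False))
    sx = x @ cov_inv @ s                 # s^T C^-1 x for every pixel
    ss = s @ cov_inv @ s                 # s^T C^-1 s (scalar)
    xx = np.einsum("ij,jk,ik->i", x, cov_inv, x)  # x^T C^-1 x per pixel
    return sx ** 2 / (ss * xx + 1e-12)   # ACE score in [0, 1]
```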
Citations: 10
An Interactive Approach with Off-Line and On-Line Handwritten Text Recognition Combination for Transcribing Historical Documents
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.45
Emilio Granell, Verónica Romero, C. Martínez-Hinarejos
Automatic transcription of historical documents is becoming an important research topic, especially because of the increasing number of digitised historical documents that libraries and archives are publishing. However, state-of-the-art handwritten text recognition systems are far from perfect. Therefore, revision by a human expert is required to produce transcriptions of standard quality. In this context, an interactive assistive scenario, where the automatic system and the human transcriber cooperate to generate the perfect transcription, allows for a more effective approach. In this paper we present a multimodal interactive transcription system where user feedback is provided by means of touchscreen pen strokes and traditional keyboard and mouse operations. The combination of the main and feedback data streams is based on Confusion Networks derived from the output of the on-line and off-line handwritten text recognition systems. The proposed combination helps to optimise overall performance and usability.
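The combination idea can be illustrated with a toy slot-voting scheme over two aligned confusion networks. This is a deliberately simplified sketch: real confusion networks carry word posteriors and need an alignment step, both of which are assumed here.

```python
# Toy sketch of combining two recognizers via confusion-network-style
# slot voting: each aligned slot takes the word with the highest
# summed score across both networks.
from collections import defaultdict

def combine_slots(cn_offline: list, cn_online: list) -> list:
    """Each argument is a list of slots; each slot maps word -> score.
    Assumes the two networks are already aligned slot-by-slot."""
    combined = []
    for slot_a, slot_b in zip(cn_offline, cn_online):
        votes = defaultdict(float)
        for slot in (slot_a, slot_b):
            for word, score in slot.items():
                votes[word] += score
        combined.append(max(votes, key=votes.get))
    return combined

offline = [{"the": 0.6, "she": 0.4}, {"cat": 0.7, "cut": 0.3}]
online  = [{"the": 0.8, "she": 0.2}, {"cut": 0.55, "cat": 0.45}]
print(combine_slots(offline, online))  # ['the', 'cat']
```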
Citations: 10
OCR Accuracy Prediction Method Based on Blur Estimation
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.50
V. C. Kieu, F. Cloppet, N. Vincent
In this paper, we propose an OCR accuracy prediction method based on local blur estimation, since blur is one of the factors that most damages OCR accuracy. First, we apply the blur estimation to synthetic images blurred with Gaussian and motion blur, in order to investigate how blur and character size together affect OCR accuracy. This relation is used as a blur/character-size feature to define a classifier. The classifier then separates the characters of a given document into three classes: readable, intermediate, and non-readable. The quality score of the document is inferred from these three classes. The proposed method is evaluated on a published database and on an industrial one. The correlation with OCR accuracy is also given for comparison with state-of-the-art methods.
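A common stand-in for per-region blur estimation is the variance of the Laplacian, which drops as an image gets blurrier. The sketch below uses OpenCV; it is a generic proxy, not the paper's estimator, and the class thresholds are assumed.

```python
# Sketch of a simple per-character blur estimate and the three-class
# readability labeling described above.
import cv2

def readability_class(char_img) -> str:
    """char_img: grayscale uint8 crop of a single character."""
    sharpness = cv2.Laplacian(char_img, cv2.CV_64F).var()
    if sharpness > 100:    # assumed threshold: crisp strokes
        return "readable"
    if sharpness > 20:     # assumed threshold: degraded but legible
        return "intermediate"
    return "non-readable"

# The fraction of characters falling in each class yields the page-level
# quality score from which OCR accuracy is predicted.
```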
Citations: 9
Natural Scene Character Recognition Using Robust PCA and Sparse Representation
Pub Date : 2016-04-01 DOI: 10.1109/DAS.2016.32
Zheng Zhang, Yong Xu, Cheng-Lin Liu
Natural scene character recognition is challenging due to the cluttered background, which is hard to separate from the text. In this paper, we propose a novel method for robust scene character recognition. Specifically, we first use robust principal component analysis (PCA) to denoise character images by recovering the missing low-rank component and filtering out the sparse noise term; we then use a simple Histogram of Oriented Gradients (HOG) for image feature extraction; and finally we use a sparse-representation-based classifier for recognition. In experiments on four public datasets, namely the Char74K dataset, the ICDAR 2003 robust reading dataset, the Street View Text (SVT) dataset, and the IIIT5K-word dataset, our method was demonstrated to be competitive with the state-of-the-art methods.
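The HOG plus sparse-representation classification (SRC) stage can be sketched as below, using scikit-image for HOG and a Lasso solver for the sparse coding step; the robust PCA denoising is omitted, and the Lasso penalty and data shapes are illustrative assumptions.

```python
# Sketch of the recognition stage: HOG features plus a sparse-
# representation classifier (SRC) that picks the class whose training
# atoms best reconstruct the test sample.
import numpy as np
from skimage.feature import hog
from sklearn.linear_model import Lasso

def hog_feat(img: np.ndarray) -> np.ndarray:
    return hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

def src_predict(x: np.ndarray, D: np.ndarray, labels: np.ndarray):
    """x: (dim,) test feature; D: (dim, n_train) dictionary of training
    features as columns; labels: (n_train,) class of each column."""
    alpha = Lasso(alpha=0.01, fit_intercept=False,
                  max_iter=5000).fit(D, x).coef_   # sparse code of x over D
    residuals = {}
    for c in np.unique(labels):
        a_c = np.where(labels == c, alpha, 0.0)    # keep class-c coefficients
        residuals[c] = np.linalg.norm(x - D @ a_c) # class-wise reconstruction
    return min(residuals, key=residuals.get)
```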
Citations: 6
Visual Analysis System for Features and Distances Qualitative Assessment: Application to Word Image Matching
Pub Date : 2016-04-01 DOI: 10.1109/DAS.2016.17
Frédéric Rayar, T. Mondal, Sabine Barrat, F. Bouali, G. Venturini
In this paper, a visual analysis system for qualitatively assessing the features and distance functions used to calculate the dissimilarity between two word images is presented. Computing the dissimilarity between two images is a prerequisite for image matching, indexing, and retrieval problems. First, features are extracted from the word images, and the distance from each image to every other is computed and represented in matrix form. Then, based on this distance matrix, a proximity graph is built to structure the set of word images and highlight their topology. The proposed visual analysis system is a web-based platform that allows visualisation of, and interaction with, the obtained graph. This interactive visualisation tool helps users quickly analyse and understand the relevance and robustness of selected features and the corresponding distance function in an unsupervised way, i.e. without any ground truth. Experiments are performed on a handwritten dataset of segmented words. Three types of features and four distance functions are considered to describe and compare the word images. These materials are leveraged to evaluate the relevance of the built graph and the usefulness of the platform.
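The distance-matrix-to-graph step can be sketched with a k-nearest-neighbour proximity graph; the paper's specific graph construction may differ, and the value of k below is assumed.

```python
# Sketch of the graph-building step: pairwise distances between
# word-image feature vectors, then a k-nearest-neighbour graph.
import numpy as np
import networkx as nx
from scipy.spatial.distance import cdist

def knn_graph(features: np.ndarray, k: int = 3) -> nx.Graph:
    """features: (num_words, dim) matrix of word-image descriptors."""
    dist = cdist(features, features)        # full distance matrix
    g = nx.Graph()
    g.add_nodes_from(range(len(features)))
    for i, row in enumerate(dist):
        for j in np.argsort(row)[1:k + 1]:  # skip self at position 0
            g.add_edge(i, int(j), weight=float(row[j]))
    return g

g = knn_graph(np.random.rand(20, 64))
print(g.number_of_nodes(), g.number_of_edges())
```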
Citations: 1