
Latest publications: 2016 12th IAPR Workshop on Document Analysis Systems (DAS)

Making Europe's Historical Newspapers Searchable
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.83
Clemens Neudecker, A. Antonacopoulos
This paper provides a rare glimpse into the overall approach for refinement, i.e. the enrichment of scanned historical newspapers with text and layout recognition, in the Europeana Newspapers project. Within three years, the project processed more than 10 million pages of historical newspapers from 12 national and major libraries to produce the largest open-access and fully searchable text collection of digital historical newspapers in Europe. In the process, a wide variety of legal, logistical, technical and other challenges were encountered. After introducing the background of newspaper digitization in Europe, the paper discusses the technical aspects of refinement in greater detail. It explains which decisions were taken in the design of the large-scale processing workflow to address these challenges, what results were produced, and which practices were identified as best practices.
Citations: 25
Election Tally Sheets Processing System
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.37
J. I. Toledo, A. Fornés, Jordi Cucurull-Juan, J. Lladós
In paper-based elections, manual tallies at the polling-station level produce myriad documents. These documents share a common form-like structure and a reduced vocabulary worldwide. On the other hand, each tally sheet is filled in by a different writer, and different countries use different scripts. We present a complete document analysis system for electoral tally sheet processing that combines state-of-the-art techniques with a new handwriting recognition subprocess based on unsupervised feature discovery with Variational Autoencoders and sequence classification with BLSTM neural networks. The whole system is designed to be script independent and allows for a fast and reliable results-consolidation process at reduced operational cost.
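The recognition subprocess pairs unsupervised feature discovery (a Variational Autoencoder) with sequence classification (a bidirectional LSTM). Below is a minimal PyTorch sketch of that pairing, not the authors' implementation; the frame size, latent dimension, hidden size and number of word classes are assumed values.

```python
# Minimal sketch (assumed shapes and hyper-parameters, not the authors' code):
# a VAE learns per-frame features without labels, and a bidirectional LSTM
# classifies the resulting feature sequence, e.g. into reduced-vocabulary words.
import torch
import torch.nn as nn

class FrameVAE(nn.Module):
    """VAE over fixed-height image columns ("frames") of a word image."""
    def __init__(self, frame_dim=48, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(frame_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, frame_dim), nn.Sigmoid())

    def forward(self, x):                      # x: (batch, frame_dim)
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        return self.dec(z), mu, logvar

class BLSTMClassifier(nn.Module):
    """Bidirectional LSTM over the sequence of VAE latent codes."""
    def __init__(self, latent_dim=16, hidden=64, n_classes=50):
        super().__init__()
        self.rnn = nn.LSTM(latent_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, seq):                    # seq: (batch, time, latent_dim)
        h, _ = self.rnn(seq)
        return self.out(h[:, -1, :])           # class logits from the last time step

# Usage on a dummy word image split into 100 frames of 48 pixels each:
vae, clf = FrameVAE(), BLSTMClassifier()
frames = torch.rand(100, 48)
_, mu, _ = vae(frames)                         # unsupervised features, one code per frame
logits = clf(mu.unsqueeze(0))                  # (1, 50) scores over the assumed vocabulary
```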
Citations: 0
Understanding Line Plots Using Bayesian Network
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.73
Rathin Radhakrishnan Nair, Nishant Sankaran, Ifeoma Nwogu, V. Govindaraju
Information graphics in scientific documents, such as bar charts, graphs and plots, primarily facilitate a better understanding of the information. Graphics are a key component of technical documents because they are simplified representations of complex ideas. When traditional optical character recognition (OCR) systems are used on digitized documents, the ideas conveyed in these information graphics are lost, since OCR typically works only on text. And although tools have more recently been developed to extract information graphics from PDF files, they still do not intelligently interpret the contents of the extracted graphics. We therefore propose a method for identifying the intended messages of line plots using a Bayesian network. We accomplish this by first extracting a dense set of points from a line plot and then representing the entire line plot as a sequence of trends. We then implement a Bayesian network for reasoning about the messages conveyed by the line plots and their trends. We validate our approach by performing experiments on a dataset obtained from computer science conference publications and evaluate the performance of the network against the messages generated by human end users. The resulting intended message gives holistic information about the line plot(s) as well as lower-level information about the trends that make up the plot.
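Before the Bayesian-network reasoning step, the method reduces the densely sampled points of a line plot to a sequence of trends. The following numpy sketch illustrates one way such a trend sequence could be built; the window length, slope threshold and trend labels are assumptions, not the paper's values.

```python
# Hypothetical sketch: turn densely sampled y-values from a line plot into a
# compact trend sequence such as ["rising", "stable", "falling"], which a
# Bayesian network could then reason over. Window and threshold are assumed.
import numpy as np

def trend_sequence(y, window=10, slope_threshold=0.05):
    """Label consecutive windows of a sampled curve by the sign of their slope."""
    trends = []
    for start in range(0, len(y) - window, window):
        seg = y[start:start + window]
        slope = np.polyfit(np.arange(window), seg, deg=1)[0]   # least-squares slope
        if slope > slope_threshold:
            label = "rising"
        elif slope < -slope_threshold:
            label = "falling"
        else:
            label = "stable"
        if not trends or trends[-1] != label:                  # merge repeated labels
            trends.append(label)
    return trends

x = np.linspace(0, 4 * np.pi, 200)
print(trend_sequence(np.sin(x)))   # e.g. ['rising', 'stable', 'falling', ...]
```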
Citations: 5
SDK Reinvented: Document Image Analysis Methods as RESTful Web Services
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.56
Marcel Würsch, R. Ingold, M. Liwicki
Document Image Analysis (DIA) systems are becoming ever more advanced, but also more complex -- computationally and logically. This increases the difficulty of integrating existing state-of-the-art approaches into new research or into practical workflows. The current approach to sharing software is publishing source code -- leaving the burden to the integrator -- or creating a Software Development Kit (SDK), which is often restricted to one programming language. We present DIVAServices, a framework for sharing and accessing DIA methods within the research community and beyond. Using a RESTful web service architecture, we provide access to the methods, so that the method binaries need to be maintained on only one system. All a developer needs to use an algorithm is a simple HTTP request with the image data and the method parameters; the computed results are returned in a format that allows seamless integration into any kind of workflow or further processing. Furthermore, DIVAServices is open source, enabling other research groups or libraries to host their own instance in their environment. Using this framework, future DIA systems can be built on the shoulders of well-tested algorithms, accessible to everyone.
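The claim that using an algorithm takes only a simple HTTP request with the image data and parameters can be illustrated with the Python requests library. The endpoint URL, form-field names and parameter below are hypothetical placeholders, not the documented DIVAServices API.

```python
# Illustrative only: a generic "upload an image, get JSON results back" call in the
# spirit of a RESTful DIA service. The URL, form fields and parameter names are
# hypothetical placeholders, not the actual DIVAServices interface.
import requests

SERVICE_URL = "https://example.org/divaservices/binarization"   # placeholder endpoint

with open("page.png", "rb") as image_file:                      # placeholder local scan
    response = requests.post(
        SERVICE_URL,
        files={"image": image_file},          # the scanned document image
        data={"threshold": "otsu"},           # hypothetical method parameter
        timeout=60,
    )

response.raise_for_status()
result = response.json()                      # machine-readable output for further processing
print(result)
```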
Citations: 23
Multilingual OCR for Indic Scripts
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.68
Minesh Mathew, A. Singh, C. V. Jawahar
In the Indian scenario, a document analysis system has to support multiple languages at the same time. With emerging multilingualism in urban India, bilingual, trilingual or even more languages often need to be supported. This demands the development of a multilingual OCR system that can work seamlessly across Indic scripts. In our approach, the script is identified at the word level, prior to recognition of the word. An end-to-end RNN-based architecture that can detect the script and recognize the text in a segmentation-free manner is proposed for this purpose. We demonstrate the approach for 12 Indian languages and English. It is observed that, even with a similar architecture, performance on Indian languages is poorer than on English. We investigate this further. Our approach is evaluated on a large corpus comprising thousands of pages. The Hindi OCR is compared with other popular OCRs for the language, as further testimony to the efficacy of our method.
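The described pipeline identifies the script of each word image and recognizes its text in a segmentation-free manner with an RNN. A hedged PyTorch sketch of one such two-head design follows (a shared bidirectional LSTM encoder, a script-identification head and a per-column character head suitable for CTC decoding); the feature size, hidden size and label counts are assumptions.

```python
# Hedged sketch of a word-level "identify script, then recognise" architecture:
# a shared bidirectional LSTM encodes column features of a word image; one head
# predicts the script, another emits per-time-step character logits suitable for
# CTC decoding. Feature size, hidden size and label counts are assumed values.
import torch
import torch.nn as nn

class ScriptAwareRecognizer(nn.Module):
    def __init__(self, feat_dim=32, hidden=128, n_scripts=13, n_chars=120):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.script_head = nn.Linear(2 * hidden, n_scripts)   # 12 Indic scripts + English
        self.char_head = nn.Linear(2 * hidden, n_chars + 1)   # +1 for the CTC blank symbol

    def forward(self, cols):                   # cols: (batch, width, feat_dim)
        h, _ = self.encoder(cols)
        script_logits = self.script_head(h.mean(dim=1))       # pooled over the word width
        char_logits = self.char_head(h)                        # per-column, for CTC decoding
        return script_logits, char_logits

model = ScriptAwareRecognizer()
word_columns = torch.rand(1, 64, 32)           # dummy word image as 64 column features
script_logits, char_logits = model(word_columns)
print(script_logits.shape, char_logits.shape)  # torch.Size([1, 13]) torch.Size([1, 64, 121])
```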
Citations: 41
MSIO: MultiSpectral Document Image BinarizatIOn
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.39
Markus Diem, Fabian Hollaus, Robert Sablatnig
MultiSpectral (MS) imaging enriches document digitization by increasing the spectral resolution. We present a methodology that detects a target ink in document images by taking this additional information into account. The proposed method performs a rough foreground estimation to localize possible ink regions. Then, the Adaptive Coherence Estimator (ACE), a target detection algorithm, transforms the MS input space into a single gray-scale image in which values close to one indicate ink. A spatial segmentation of the target detector's output using GrabCut is then computed to create the final binary image. To establish a baseline, the method is evaluated on the three most recent Document Image Binarization COntests (DIBCO), despite the fact that they only provide RGB images. In addition, an evaluation on three publicly available MS datasets is carried out. The presented methodology achieved the highest performance at the MultiSpectral Text Extraction (MS-TEx) contest 2015.
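The central transformation is the Adaptive Coherence Estimator, which maps each multispectral pixel to a score close to one where the target ink is present; GrabCut is then applied to this map. The numpy sketch below implements the standard ACE statistic on a dummy image cube; the band count, background statistics and ink signature are illustrative assumptions.

```python
# Minimal sketch of the standard Adaptive Coherence Estimator (ACE) statistic on a
# multispectral image cube; the band count, background model and target ink
# signature are illustrative assumptions, not the paper's data.
import numpy as np

def ace_map(cube, target):
    """cube: (H, W, B) multispectral image, target: (B,) ink signature.
    Returns an (H, W) map with values in [0, 1]; values near 1 indicate the target ink."""
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b).astype(np.float64)
    mean = pixels.mean(axis=0)
    centered = pixels - mean
    cov_inv = np.linalg.inv(np.cov(centered, rowvar=False) + 1e-6 * np.eye(b))
    s = target - mean
    num = (centered @ cov_inv @ s) ** 2
    den = (s @ cov_inv @ s) * np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return (num / np.maximum(den, 1e-12)).reshape(h, w)

cube = np.random.rand(60, 80, 8)               # dummy 8-band document patch
ink_signature = np.full(8, 0.2)                # assumed spectral signature of the ink
scores = ace_map(cube, ink_signature)
print(scores.shape, scores.min(), scores.max())  # (60, 80), values within [0, 1]
```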
Citations: 10
An Interactive Approach with Off-Line and On-Line Handwritten Text Recognition Combination for Transcribing Historical Documents
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.45
Emilio Granell, Verónica Romero, C. Martínez-Hinarejos
Automatic transcription of historical documents is becoming an important research topic, especially because of the increasing number of digitised historical documents that libraries and archives are publishing. However, state-of-the-art handwritten text recognition systems are far from perfect. Therefore, to obtain perfect transcriptions, revision by a human expert is required to produce a transcription of standard quality. In this context, an interactive assistive scenario, in which the automatic system and the human transcriber cooperate to generate the perfect transcription, allows for a more effective approach. In this paper we present a multimodal interactive transcription system in which user feedback is provided by means of touchscreen pen strokes and traditional keyboard and mouse operations. The combination of the main and the feedback data streams is based on the use of Confusion Networks derived from the output of the on-line and off-line handwritten text recognition systems. The use of the proposed combination helps to optimise overall performance and usability.
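The combination of the off-line and on-line recognizers operates on Confusion Networks. Assuming the two networks are already aligned slot by slot, the sketch below shows a simple weighted slot-wise interpolation of word posteriors followed by a greedy 1-best read-out; the interpolation weight and the alignment itself are assumptions, and the paper's actual combination may differ.

```python
# Hedged sketch of combining two already-aligned confusion networks slot by slot.
# Each network is a list of slots, each slot a dict mapping word hypotheses to
# posterior probabilities. The interpolation weight and the alignment are
# assumptions; the paper's actual combination scheme may differ.
def combine_confusion_networks(cn_offline, cn_online, weight=0.6):
    combined = []
    for slot_off, slot_on in zip(cn_offline, cn_online):
        words = set(slot_off) | set(slot_on)
        merged = {w: weight * slot_off.get(w, 0.0) + (1 - weight) * slot_on.get(w, 0.0)
                  for w in words}
        total = sum(merged.values())
        combined.append({w: p / total for w, p in merged.items()})  # renormalise the slot
    return combined

def best_path(cn):
    """Greedy 1-best transcription: pick the top word in every slot."""
    return " ".join(max(slot, key=slot.get) for slot in cn)

cn_offline = [{"quijote": 0.55, "quixote": 0.45}, {"de": 0.9, "del": 0.1}]
cn_online = [{"quixote": 0.7, "quijote": 0.3}, {"de": 0.8, "da": 0.2}]
print(best_path(combine_confusion_networks(cn_offline, cn_online)))  # "quixote de"
```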
Citations: 10
OCR Accuracy Prediction Method Based on Blur Estimation
Pub Date : 2016-04-11 DOI: 10.1109/DAS.2016.50
V. C. Kieu, F. Cloppet, N. Vincent
In this paper, we propose an OCR accuracy prediction method based on local blur estimation, since blur is one of the factors that most damages OCR accuracy. First, we apply blur estimation to synthetically blurred images, using Gaussian and motion blur, in order to investigate how the relation between blur and character size affects OCR accuracy. This relation is then used as a blur/character-size feature to define a classifier. Finally, the classifier separates the characters of a given document into three classes: readable, intermediate, and non-readable. The quality score of the document is then inferred from these three classes. The proposed method is evaluated on a published database and on an industrial one. The correlation with OCR accuracy is also reported for comparison with state-of-the-art methods.
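The study begins by degrading clean character images with synthetic Gaussian and motion blur. A short OpenCV sketch of that degradation step is given below; the kernel sizes, the horizontal motion direction and the synthetic character are assumed for illustration.

```python
# Illustrative degradation step: apply Gaussian blur and a simple horizontal
# motion blur to a clean character image, as used when studying how blur and
# character size jointly affect OCR accuracy. Kernel sizes are assumed values.
import cv2
import numpy as np

def gaussian_blur(image, ksize=7, sigma=2.0):
    return cv2.GaussianBlur(image, (ksize, ksize), sigma)

def motion_blur(image, length=9):
    """Horizontal motion blur: convolve with a normalised 1-pixel-high line kernel."""
    kernel = np.zeros((length, length), dtype=np.float32)
    kernel[length // 2, :] = 1.0 / length
    return cv2.filter2D(image, -1, kernel)

clean = np.full((64, 64), 255, dtype=np.uint8)
cv2.putText(clean, "A", (12, 52), cv2.FONT_HERSHEY_SIMPLEX, 2, 0, 3)  # synthetic character
blurred_gauss = gaussian_blur(clean)
blurred_motion = motion_blur(clean)
print(blurred_gauss.shape, blurred_motion.shape)   # (64, 64) (64, 64)
```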
Citations: 9
Natural Scene Character Recognition Using Robust PCA and Sparse Representation
Pub Date : 2016-04-01 DOI: 10.1109/DAS.2016.32
Zheng Zhang, Yong Xu, Cheng-Lin Liu
Natural scene character recognition is challenging due to the cluttered background, which is hard to separate from the text. In this paper, we propose a novel method for robust scene character recognition. Specifically, we first use robust principal component analysis (PCA) to denoise the character image by recovering the missing low-rank component and filtering out the sparse noise term, then use a simple Histogram of Oriented Gradients (HOG) descriptor for image feature extraction, and finally use a sparse-representation-based classifier for recognition. In experiments on four public datasets, namely the Char74K dataset, the ICDAR 2003 robust reading dataset, the Street View Text (SVT) dataset and the IIIT5K-word dataset, our method was demonstrated to be competitive with the state-of-the-art methods.
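After denoising, the pipeline extracts HOG features and applies a sparse-representation-based classifier: a test feature is coded over a dictionary of training features and assigned to the class with the smallest class-wise reconstruction residual. The sketch below illustrates this with scikit-image's HOG and an l1 solver from scikit-learn; the dummy data, the choice of Lasso as the sparse coder and the regularisation weight are assumptions, not the paper's setup.

```python
# Hedged sketch of the HOG + sparse-representation-classifier stage: code a test
# HOG vector over a dictionary of training HOG vectors with an l1 penalty and pick
# the class whose atoms reconstruct it best. Data and alpha are dummy assumptions.
import numpy as np
from skimage.feature import hog
from sklearn.linear_model import Lasso

def hog_feature(image_32x32):
    return hog(image_32x32, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def src_predict(test_feat, dictionary, labels, alpha=0.01):
    """dictionary: (n_features, n_atoms), labels: class label of each atom."""
    coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
    coder.fit(dictionary, test_feat)                    # sparse code of the test sample
    code = coder.coef_
    residuals = {}
    for cls in np.unique(labels):
        masked = np.where(labels == cls, code, 0.0)     # keep only this class's coefficients
        residuals[cls] = np.linalg.norm(test_feat - dictionary @ masked)
    return min(residuals, key=residuals.get)

rng = np.random.default_rng(0)
train_imgs = rng.random((20, 32, 32))                   # dummy character crops, 2 classes
labels = np.array([0] * 10 + [1] * 10)
dictionary = np.stack([hog_feature(im) for im in train_imgs], axis=1)
test_feat = hog_feature(train_imgs[3])
print(src_predict(test_feat, dictionary, labels))       # smallest residual class, here 0
```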
Citations: 6
Visual Analysis System for Features and Distances Qualitative Assessment: Application to Word Image Matching
Pub Date : 2016-04-01 DOI: 10.1109/DAS.2016.17
Frédéric Rayar, T. Mondal, Sabine Barrat, F. Bouali, G. Venturini
In this paper, a visual analysis system is presented for qualitatively assessing the features and distance functions used to calculate the dissimilarity between two word images. Computing the dissimilarity between two images is a prerequisite for image matching, indexing and retrieval problems. First, features are extracted from the word images and the distance between each pair of images is computed and represented in matrix form. Then, based on this distance matrix, a proximity graph is built to structure the set of word images and highlight their topology. The proposed visual analysis system is a web-based platform that allows visualisation of, and interaction with, the obtained graph. This interactive visualisation tool inherently helps users to quickly analyse and understand the relevance and robustness of the selected features and the corresponding distance function in an unsupervised way, i.e. without any ground truth. Experiments are performed on a handwritten dataset of segmented words. Three types of features and four distance functions are considered to describe and compare the word images. These materials are leveraged to evaluate the relevance of the built graph and the usefulness of the platform.
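The backbone of the system is a pairwise distance matrix over word-image features and a proximity graph built from it. A small scipy/networkx sketch of that structuring step follows; the feature vectors, the Euclidean metric and the neighbourhood size k are placeholders for whatever the user selects in the tool.

```python
# Minimal sketch of the data structuring step: compute a pairwise distance matrix
# over word-image feature vectors and build a k-nearest-neighbour proximity graph
# from it. Features, the metric and k are placeholders for the user's choices.
import numpy as np
import networkx as nx
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)
features = rng.random((30, 64))                 # dummy descriptors for 30 word images
distances = cdist(features, features, metric="euclidean")   # 30 x 30 distance matrix

k = 3
graph = nx.Graph()
graph.add_nodes_from(range(len(features)))
for i, row in enumerate(distances):
    for j in np.argsort(row)[1:k + 1]:          # skip index 0, which is the image itself
        graph.add_edge(i, int(j), weight=float(row[j]))

print(graph.number_of_nodes(), graph.number_of_edges())   # 30 nodes, at most 90 edges
```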
Citations: 1