Performance Evaluation of Symbol Recognition and Spotting Systems: An Overview
Mathieu Delalandre, Ernest Valveny, J. Lladós. The Eighth IAPR International Workshop on Document Analysis Systems (DAS 2008), 16 September 2008. doi:10.1109/DAS.2008.63

This paper addresses the performance evaluation of symbol recognition and spotting systems. It presents an overview resulting from the work and discussions undertaken by a working group on this subject. The paper starts with a general view of symbol recognition and spotting and of performance evaluation. Next, the two main issues of performance evaluation are discussed: groundtruthing and performance characterization. Several problems related to both issues are addressed: groundtruthing of real documents, generation of synthetic documents, degradation models, the use of a priori knowledge, and mapping of the ground truth onto system results. Open problems arising from this overview are discussed at the end of the paper.

CCD: Connected Component Descriptor for Robust Mosaicing of Camera-Captured Document Images
T. Kasar, A. Ramakrishnan. The Eighth IAPR International Workshop on Document Analysis Systems (DAS 2008), 16 September 2008. doi:10.1109/DAS.2008.31

We propose a robust method for mosaicing of document images using features derived from connected components. Each connected component is described using the angular radial transform (ART). To ensure geometric consistency during feature matching, the ART coefficients of a connected component are augmented with those of its two nearest neighbors. The proposed method addresses two critical issues often encountered in correspondence matching: (i) the stability of features and (ii) robustness against false matches due to multiple instances of characters in a document image. The use of connected components guarantees stable localization across images. The augmented features ensure successful correspondence matching even in the presence of multiple similar regions within the page. We illustrate the effectiveness of the proposed method on camera-captured document images exhibiting large variations in viewpoint, illumination and scale.

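The neighbor-augmentation step described above can be sketched independently of the ART itself. The sketch below is a minimal illustration, not the authors' implementation: any fixed-length vector stands in for the ART coefficients, and components are located by hypothetical centroid coordinates.

```python
import numpy as np

def augment_descriptors(centroids, descriptors):
    """For each connected component, concatenate its descriptor with
    those of its two nearest neighbors (by centroid distance)."""
    centroids = np.asarray(centroids, dtype=float)
    descriptors = np.asarray(descriptors, dtype=float)
    augmented = []
    for i, c in enumerate(centroids):
        dist = np.linalg.norm(centroids - c, axis=1)
        dist[i] = np.inf                  # exclude the component itself
        nn = np.argsort(dist)[:2]         # its two nearest neighbors
        augmented.append(np.concatenate([descriptors[i],
                                         descriptors[nn[0]],
                                         descriptors[nn[1]]]))
    return np.vstack(augmented)
```

Matching then compares the augmented vectors, so two visually identical characters sitting in different neighborhoods no longer produce false correspondences.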
Contrast Enhancement in Multispectral Images by Emphasizing Text Regions
M. Lettner, Florian Kleber, Robert Sablatnig, Heinz Miklas. The Eighth IAPR International Workshop on Document Analysis Systems (DAS 2008), 16 September 2008. doi:10.1109/DAS.2008.68

This paper deals with enhancing the readability of historic texts written on parchment. Due to mold, air, humidity, water, and other factors, the parchment and text are partially damaged and consequently hard to read. To enhance the readability of the text, the manuscript pages are imaged in different spectral bands ranging from 360 to 1000 nm. The readability enhancement is based on a spectral and spatial analysis of the multivariate image data by multivariate spatial correlation. The main advantage of the method is that the text regions in particular are enhanced, which is achieved by generating a mask image. This mask is based on the automatic reconstruction of the ruling scheme of the text pages. The method is tested on two medieval Slavonic manuscripts written on parchment.

Attention-Based Document Classifier Learning
Georg Buscher, A. Dengel. The Eighth IAPR International Workshop on Document Analysis Systems (DAS 2008), 16 September 2008. doi:10.1109/DAS.2008.36

We describe an approach for creating precise personalized document classifiers based on the user's attention. The general idea is to observe which parts of a document the user was interested in just before he or she comes to a classification decision. Given this manual classification decision and the document parts it was based on, we can learn precise classifiers. To observe the user's focus of attention, we use an unobtrusive eye-tracking device and apply an algorithm for reading-behavior detection. On this basis, we can extract terms characterizing the text parts interesting to the user and employ them to describe the class the document was assigned to by the user. Having learned classifiers in this way, new documents can be classified automatically using techniques from passage-based retrieval. We demonstrate the strong improvement gained by incorporating the user's visual attention in a case study that evaluates an attention-based term extraction method.

Difference of Boxes Filters Revisited: Shadow Suppression and Efficient Character Segmentation
E. Rodner, H. Süße, W. Ortmann, Joachim Denzler. The Eighth IAPR International Workshop on Document Analysis Systems (DAS 2008), 16 September 2008. doi:10.1109/DAS.2008.12

Robust segmentation is the most important part of an automatic character recognition system (e.g., document processing or license plate recognition). In our contribution we present an efficient segmentation framework using a preprocessing step for shadow suppression combined with a local thresholding technique. The method is based on a combination of difference-of-boxes filters and a new ternary segmentation, both of which are simple low-level image operations. We also draw parallels to recently published work on a ganglion cell model and show that our approach is theoretically better substantiated as well as more robust and more efficient in practice. A systematic evaluation on noisy input data as well as results on a large dataset of license plate images shows the robustness and efficiency of the proposed method. Our results can be applied easily to any optical character recognition system, resulting in an impressive gain in robustness against nonlinear illumination.

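A difference-of-boxes filter is simply the difference between two box (mean) filters of different sizes, which yields a cheap band-pass response insensitive to smooth shadows. The sketch below is a minimal illustration using integral images, with a three-way labeling standing in for the paper's ternary segmentation; the radii and threshold are illustrative placeholders, not the paper's parameters.

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1)x(2r+1) window via an integral image (edge-padded)."""
    p = np.pad(np.asarray(img, dtype=float), r, mode='edge')
    ii = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)  # integral image
    k = 2 * r + 1
    s = ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]
    return s / (k * k)

def dob_ternary(img, r_small=1, r_large=7, t=10.0):
    """Difference of boxes followed by a ternary decision:
    -1 = darker than surround (candidate text), +1 = lighter, 0 = undecided."""
    dob = box_mean(img, r_small) - box_mean(img, r_large)
    out = np.zeros(np.asarray(img).shape, dtype=int)
    out[dob < -t] = -1
    out[dob > t] = 1
    return out
```

Because both box means shift together under a slowly varying shadow, their difference stays near zero in shadowed background while dark strokes remain strongly negative.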
Categorization of On-Line Handwritten Documents
Sebastián Peña Saldarriaga, E. Morin, C. Viard-Gaudin. The Eighth IAPR International Workshop on Document Analysis Systems (DAS 2008), 16 September 2008. doi:10.1109/DAS.2008.45

With the growth of on-line handwriting technologies, facilities for managing handwritten documents, such as retrieval of documents by topic, are required. These documents can contain graphics, equations, or text, for instance. This work reports experiments on the categorization of on-line handwritten documents based on their textual contents. We assume that handwritten text blocks have been extracted from the documents, and as a first step of the proposed system, we process them with an existing handwriting recognition engine. We analyse the effect of the word recognition rate on categorization performance, and we compare it with the performance obtained on the same texts available as ground truth. Two categorization algorithms (kNN and SVM) are compared in this work. The handwritten texts are a subset of the Reuters-21578 corpus collected from more than 1500 writers. Results show that there is no significant loss in categorization performance when the word error rate stays below 22%.

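The kNN side of such a text-categorization setup can be sketched with bag-of-words vectors and cosine similarity. This is a minimal illustration, not the authors' configuration: the whitespace tokenizer, the similarity measure, and k are all assumptions.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector as a term-frequency Counter."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_categorize(text, labelled, k=3):
    """Majority vote among the k labelled texts most similar to `text`.
    `labelled` is a list of (text, category) pairs."""
    v = bow(text)
    ranked = sorted(labelled, key=lambda d: cosine(v, bow(d[0])), reverse=True)
    votes = Counter(cat for _, cat in ranked[:k])
    return votes.most_common(1)[0][0]
```

In the recognition pipeline, `text` would be the noisy output of the handwriting recognizer, which is why the paper's key question is how the word error rate degrades this vote.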
Pre-Printed and Hand-Filled Table-Form Analysis Aiming Cell Extraction
Rafaela Dandolini Felipe, L. A. P. Neves. The Eighth IAPR International Workshop on Document Analysis Systems (DAS 2008), 16 September 2008. doi:10.1109/DAS.2008.46

This paper presents an approach to extract the structure of pre-printed and hand-filled table-forms. The first module performs cell identification based on the watershed transform. A second module detects the wrong cells produced by handwritten and/or pre-printed data; in this module, wrong cells and other cells are filtered by a compactness, perimeter and area analysis. In a third module, the wrong cells are merged with other cells to determine the exact structure. A database of 300 pre-printed and hand-filled table-form images was used to evaluate the efficiency of the methodology. Experiments showed significant and promising results.

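The compactness/perimeter/area filtering in the second module can be illustrated with the standard isoperimetric compactness measure; the thresholds below are hypothetical placeholders, not values from the paper, and serve only to show the shape of such a filter.

```python
import math

def is_valid_cell(area, perimeter, min_area=100.0, min_compactness=0.3):
    """Flag a candidate cell region as plausible using its area and the
    isoperimetric compactness 4*pi*A/P^2 (1.0 for a circle, about 0.785
    for a square; thin, ragged fragments score much lower).
    Thresholds are illustrative, not the paper's actual values."""
    if area < min_area or perimeter <= 0:
        return False
    compactness = 4.0 * math.pi * area / (perimeter * perimeter)
    return compactness >= min_compactness
```

Regions failing the test would be treated as "wrong cells" and handed to the merging module rather than accepted as table cells.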
Text String Extraction from Scene Image Based on Edge Feature and Morphology
Yuming Wang, Naoki Tanaka. The Eighth IAPR International Workshop on Document Analysis Systems (DAS 2008), 16 September 2008. doi:10.1109/DAS.2008.51

Extracting text from scene images is much more difficult than extracting it from simple document images. Many studies have succeeded in extracting a single text string from an image but cannot handle images containing many text strings; moreover, the results may be contaminated by noise resembling text. This paper describes an algorithm that uses mathematical morphology to extract text effectively; the edge border ratio, which exploits the edge-contrast property of text regions in real scenes, is used to distinguish text regions from noise regions. The paper also describes a method that connects characters into text strings and distributes the strings to different subimages according to their stroke widths. The algorithm is applied to scene images such as signs and indicators as well as magazine covers, and its robustness is demonstrated.

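Distributing text strings to subimages by stroke width amounts to grouping strings whose widths are similar. The greedy grouping below is a sketch under that assumption only; the relative tolerance and the incremental-mean rule are illustrative, not taken from the paper.

```python
def group_by_stroke_width(strings, tol=0.25):
    """strings: list of (string_id, stroke_width). Assign each string to
    an existing group whose mean width is within a relative tolerance,
    otherwise start a new group -- one group per subimage."""
    groups = []  # each entry: [running_mean_width, [string_ids]]
    for sid, w in sorted(strings, key=lambda s: s[1]):
        for g in groups:
            if abs(w - g[0]) <= tol * g[0]:
                g[1].append(sid)
                g[0] = (g[0] * (len(g[1]) - 1) + w) / len(g[1])  # update mean
                break
        else:
            groups.append([w, [sid]])
    return [ids for _, ids in groups]
```

Processing widths in sorted order keeps each group's running mean from drifting across unrelated font sizes.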
Exploring Evolutionary Technical Trends from Academic Research Papers
Teng-Kai Fan, Chia-Hui Chang. The Eighth IAPR International Workshop on Document Analysis Systems (DAS 2008), 16 September 2008. doi:10.1109/DAS.2008.25

Automatic Term Recognition (ATR) is concerned with discovering terminology in large volumes of text corpora. Technical terms are vital elements for understanding the techniques used in academic research papers, and in this paper, we use focused technical terms to explore technical trends in the research literature. The major purpose of this work is to understand the relationship between techniques and research topics to better explore technical trends. We define this new text mining task and apply machine learning algorithms to solve it by (1) recognizing focused technical terms in research papers; (2) classifying these terms into predefined technology categories; and (3) analyzing the evolution of technical trends. The dataset consists of 656 papers collected from well-known ACM conferences. The experimental results indicate that the proposed methods can effectively explore interesting evolutionary technical trends in various research topics.

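Step (3), analyzing how technology categories evolve, can be sketched as computing each category's share of papers per year once steps (1) and (2) have produced (year, category) labels. This is a minimal illustration of the bookkeeping, not the paper's actual trend measure.

```python
from collections import Counter, defaultdict

def category_trends(papers):
    """papers: iterable of (year, category) pairs.
    Returns {year: {category: share}}, where share is the fraction of
    that year's papers assigned to the category."""
    by_year = defaultdict(Counter)
    for year, cat in papers:
        by_year[year][cat] += 1
    return {y: {c: n / sum(cnt.values()) for c, n in cnt.items()}
            for y, cnt in by_year.items()}
```

A rising share across consecutive years then reads directly as an "evolutionary trend" for that technology category.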
PaperDiff: A Script Independent Automatic Method for Finding the Text Differences Between Two Document Images
R. Sitaram, Gopal Datt Joshi, S. Noushath, Pulkit Parikh, Vishal Gupta. The Eighth IAPR International Workshop on Document Analysis Systems (DAS 2008), 16 September 2008. doi:10.1109/DAS.2008.69

In this paper, we introduce a novel concept called PaperDiff and propose an algorithm to implement it. The aim of PaperDiff is to compare two printed (paper) documents using their images and determine the differences between them in terms of text inserted, deleted and substituted. This lets an end-user compare two documents that are already printed, or even when only one of them is printed (the other could be in electronic form, such as an MS Word *.doc file). The algorithm we propose for realizing PaperDiff is based on word-image comparison and is suitable even for symbol strings and for any script or language (including multiple scripts) in the documents, where even mature optical character recognition (OCR) technology has had very little success. PaperDiff enables end-users such as lawyers and novelists to compare new document versions with older ones. The proposed method is suitable even when the formatting of the content differs between the two input documents, i.e., when the structures of the document images are different (e.g., differing page widths or page structure). An experiment with PaperDiff on single-column text documents yielded 99.2% accuracy in detecting 135 induced differences across 10 pairs of documents.
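Once word images have been matched across the two documents, reporting insertions, deletions and substitutions reduces to a sequence diff. The sketch below uses Python's difflib on word tokens standing in for matched word-image identities; it illustrates the diff-reporting step only, not the authors' word-image matcher.

```python
import difflib

def paper_diff(words_a, words_b):
    """Compare two word sequences and report the differences as
    (operation, words_from_a, words_from_b) triples, where operation
    is 'replace' (substitution), 'delete', or 'insert'."""
    ops = []
    sm = difflib.SequenceMatcher(a=words_a, b=words_b)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != 'equal':
            ops.append((tag, words_a[i1:i2], words_b[j1:j2]))
    return ops
```

Because the comparison operates on opaque tokens, the same routine works whether the tokens are recognized words or script-independent word-image cluster IDs.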