{"title":"Performance Evaluation of Symbol Recognition and Spotting Systems: An Overview","authors":"Mathieu Delalandre, Ernest Valveny, J. Lladós","doi":"10.1109/DAS.2008.63","DOIUrl":"https://doi.org/10.1109/DAS.2008.63","url":null,"abstract":"This paper deals with the performance evaluation of symbol recognition & spotting systems. It presents an overview resulting from the work and discussions undertaken by a working group on this subject. The paper starts by giving a general view of symbol recognition & spotting and of performance evaluation. Next, the two main issues of performance evaluation are discussed: groundtruthing and performance characterization. Different problems related to both issues are addressed: groundtruthing of real documents, generation of synthetic documents, degradation models, the use of a priori knowledge, mapping of the groundtruth to the system results, and so on. Open problems arising from this overview are discussed at the end of the paper.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123165492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CCD: Connected Component Descriptor for Robust Mosaicing of Camera-Captured Document Images","authors":"T. Kasar, A. Ramakrishnan","doi":"10.1109/DAS.2008.31","DOIUrl":"https://doi.org/10.1109/DAS.2008.31","url":null,"abstract":"We propose a robust method for mosaicing of document images using features derived from connected components. Each connected component is described using the angular radial transform (ART). To ensure geometric consistency during feature matching, the ART coefficients of a connected component are augmented with those of its two nearest neighbors. The proposed method addresses two critical issues often encountered in correspondence matching: (i) the stability of features and (ii) robustness against false matches due to the multiple instances of characters in a document image. The use of connected components guarantees a stable localization across images. The augmented features ensure a successful correspondence matching even in the presence of multiple similar regions within the page. We illustrate the effectiveness of the proposed method on camera captured document images exhibiting large variations in viewpoint, illumination and scale.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124550161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contrast Enhancement in Multispectral Images by Emphasizing Text Regions","authors":"M. Lettner, Florian Kleber, Robert Sablatnig, Heinz Miklas","doi":"10.1109/DAS.2008.68","DOIUrl":"https://doi.org/10.1109/DAS.2008.68","url":null,"abstract":"This paper deals with enhancing the readability of historic texts written on parchment. Due to mold, air, humidity, water, and other factors, the parchment and text are partially damaged and consequently hard to read. In order to enhance the readability of the text, the manuscript pages are imaged in different spectral bands ranging from 360 to 1000 nm. The readability enhancement is based on a spectral and spatial analysis of the multivariate image data using multivariate spatial correlation. The main advantage of the method is that the text regions in particular are enhanced, which is achieved by generating a mask image. This mask is based on the automatic reconstruction of the ruling scheme of the text pages. The method is tested on two medieval Slavonic manuscripts written on parchment.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117212058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attention-Based Document Classifier Learning","authors":"Georg Buscher, A. Dengel","doi":"10.1109/DAS.2008.36","DOIUrl":"https://doi.org/10.1109/DAS.2008.36","url":null,"abstract":"We describe an approach for creating precise, personalized document classifiers based on the user's attention. The general idea is to observe which parts of a document the user was interested in just before he or she came to a classification decision. Given this manual classification decision and the document parts it was based on, we can learn precise classifiers. To observe the user's focus of attention, we use an unobtrusive eye-tracking device and apply an algorithm for reading-behavior detection. On this basis, we can extract terms characterizing the text parts interesting to the user and employ them to describe the class the document was assigned to. Having learned classifiers in this way, new documents can be classified automatically using techniques of passage-based retrieval. We demonstrate the strong benefit of incorporating the user's visual attention in a case study that evaluates an attention-based term extraction method.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126479754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Difference of Boxes Filters Revisited: Shadow Suppression and Efficient Character Segmentation","authors":"E. Rodner, H. Süße, W. Ortmann, Joachim Denzler","doi":"10.1109/DAS.2008.12","DOIUrl":"https://doi.org/10.1109/DAS.2008.12","url":null,"abstract":"A robust segmentation is the most important part of an automatic character recognition system (e.g. document processing, license plate recognition etc.). In our contribution we present an efficient segmentation framework using a preprocessing step for shadow suppression combined with a local thresholding technique. The method is based on a combination of difference of boxes filters and a new ternary segmentation, which are both simple low-level image operations. We also draw parallels to a recently published work on a ganglion cell model and show that our approach is theoretically more substantiated as well as more robust and more efficient in practice. Systematic evaluation of noisy input data as well as results on a large dataset of license plate images show the robustness and efficiency of our proposed method. Our results can be applied easily to any optical character recognition system resulting in an impressive gain of robustness against nonlinear illumination.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122310716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
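The shadow-suppression idea in the record above can be illustrated in one dimension: subtracting a wide box (mean) filter from a narrow one cancels slowly varying illumination while keeping sharp character strokes. This is a minimal sketch of the general difference-of-boxes principle, not the authors' implementation; the window radii and the synthetic "shadowed" signal are illustrative assumptions.

```python
def box_filter(signal, radius):
    """Mean over a window of 2*radius + 1 samples, clamped at the borders."""
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def difference_of_boxes(signal, small, large):
    """Crude band-pass: narrow box mean minus wide box mean."""
    narrow = box_filter(signal, small)
    wide = box_filter(signal, large)
    return [n - w for n, w in zip(narrow, wide)]

# A linear shadow gradient with one bright "stroke" at position 20.
shadow = [float(i) for i in range(40)]
shadow[20] += 10.0
response = difference_of_boxes(shadow, 2, 6)
# Away from the stroke and the borders, the gradient cancels almost exactly
# (the box mean of a linear signal equals its center value), while the
# stroke survives as a clear positive peak at index 20.
```

A local threshold on this band-pass response then separates strokes from background independently of the shadow level.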
{"title":"Text String Extraction from Scene Image Based on Edge Feature and Morphology","authors":"Yuming Wang, Naoki Tanaka","doi":"10.1109/DAS.2008.51","DOIUrl":"https://doi.org/10.1109/DAS.2008.51","url":null,"abstract":"Extracting text from scene images is much more difficult than extracting it from simple document images. Many studies have succeeded in extracting a single text string from an image, but cannot handle images containing many text strings, and their results may be contaminated by noise resembling text. This paper describes an algorithm that uses mathematical morphology to extract text effectively; the edge border ratio, which exploits the edge contrast of text regions in real scenes, is used to distinguish text regions from noise regions. The paper also describes a method that connects characters into text strings and distributes the text strings to different sub-images according to their stroke widths. The algorithm is applied to scene images such as signs and indicators as well as magazine covers, and its robustness is demonstrated.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124751149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
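The morphological grouping step described in the record above can be sketched in a few lines: dilating a binary image with a horizontal structuring element merges nearby character components into a single text-string region. The toy image and element size below are assumptions for illustration, not the authors' parameters.

```python
def dilate(img, rx, ry):
    """Binary dilation with a (2*rx + 1) x (2*ry + 1) rectangular element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                # Stamp the structuring element around each foreground pixel.
                for dy in range(-ry, ry + 1):
                    for dx in range(-rx, rx + 1):
                        if 0 <= y + dy < h and 0 <= x + dx < w:
                            out[y + dy][x + dx] = 1
    return out

# Two "characters" separated by a two-pixel gap merge into one run
# after a horizontal dilation of radius 1.
row = [[1, 1, 0, 0, 1, 1]]
merged = dilate(row, 1, 0)  # → [[1, 1, 1, 1, 1, 1]]
```

Connected components of the dilated image then correspond to candidate text strings rather than individual characters.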
{"title":"Affine Invariant Recognition of Characters by Progressive Pruning","authors":"Akira Horimatsu, Ryo Niwa, M. Iwamura, K. Kise, S. Uchida, S. Omachi","doi":"10.1109/DAS.2008.88","DOIUrl":"https://doi.org/10.1109/DAS.2008.88","url":null,"abstract":"Camera-based character recognition faces many problems. One of them is that characters in scenes are often distorted by geometric transformations such as affine distortions. Although some methods that remove affine distortions have been proposed, they cannot remove the rotation of a character; thus the skew angle of a character has to be determined by examining all possible angles, which consumes quite a bit of time. In this paper, in order to reduce the processing time of affine-invariant recognition, we propose a set of affine-invariant features and a new recognition scheme called \"progressive pruning.\" Progressive pruning gradually prunes less feasible categories and skew angles using multiple classifiers. We confirmed that progressive pruning with the affine-invariant features reduced the processing time by more than half without decreasing the recognition rate.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114818382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keyword Matching in Historical Machine-Printed Documents Using Synthetic Data, Word Portions and Dynamic Time Warping","authors":"T. Konidaris, B. Gatos, S. Perantonis, A. Kesidis","doi":"10.1109/DAS.2008.64","DOIUrl":"https://doi.org/10.1109/DAS.2008.64","url":null,"abstract":"In this paper we propose a novel and efficient technique for finding keywords typed by the user in digitised machine-printed historical documents using the dynamic time warping (DTW) algorithm. The method uses word portions located at the beginning and end of each segmented word of the processed documents and tries to estimate the positions of the first and last characters in order to reduce the list of candidate words. Since DTW can become computationally intensive on large datasets, the proposed method significantly prunes the list of candidate words, thus speeding up the entire process. Word length is also used as a means of further reducing the data to be processed. Results are improved in terms of time and efficiency compared to those produced when no pruning is applied to the list of candidate words.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116738695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
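The DTW matching at the core of the record above can be sketched generically: a dynamic program aligns two feature sequences of possibly different lengths, allowing elements to stretch or compress. The absolute-difference cost and the toy sequences are illustrative assumptions, not the paper's word-image features.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric feature sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = cost of the best alignment of a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local matching cost
            D[i][j] = cost + min(D[i - 1][j],      # a[i-1] stretched
                                 D[i][j - 1],      # b[j-1] stretched
                                 D[i - 1][j - 1])  # one-to-one match
    return D[n][m]

print(dtw_distance([1, 2, 3, 2, 1], [1, 2, 3, 2, 1]))  # → 0.0
print(dtw_distance([2, 3, 4, 3, 2], [1, 2, 3, 2, 1]))  # → 3.0
```

Because each cell of the O(n·m) table must be filled per candidate word, pruning the candidate list first, as the paper proposes, directly cuts the total matching time.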
{"title":"PaperDiff: A Script Independent Automatic Method for Finding the Text Differences Between Two Document Images","authors":"R. Sitaram, Gopal Datt Joshi, S. Noushath, Pulkit Parikh, Vishal Gupta","doi":"10.1109/DAS.2008.69","DOIUrl":"https://doi.org/10.1109/DAS.2008.69","url":null,"abstract":"In this paper, we introduce a novel concept called PaperDiff and propose an algorithm to implement it. The aim of PaperDiff is to compare two printed (paper) documents using their images and determine the differences between them in terms of text inserted, deleted and substituted. This lets an end-user compare two documents that are already printed, or documents of which only one is printed (the other could be in electronic form, such as an MS Word *.doc file). The algorithm we propose for realizing PaperDiff is based on word-image comparison and is suitable even for symbol strings and for any script or language (including multiple scripts) in the documents, where even mature optical character recognition (OCR) technology has had very little success. PaperDiff enables end-users such as lawyers and novelists to compare new document versions with older ones. The proposed method remains suitable when the formatting of the content differs between the two input documents, i.e., when the structures of the document images are different (e.g., differing page widths, page structure, etc.). An experiment with PaperDiff on single-column text documents yielded 99.2% accuracy in detecting 135 induced differences in 10 pairs of documents.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"500 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127592971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
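Once words in the two documents have been put in correspondence, classifying the differences as insertions, deletions and substitutions is a sequence-alignment problem. The sketch below uses Python's stdlib difflib on word tokens; note this is a simplification of the record above, which aligns word images rather than strings.

```python
import difflib

def paper_diff(old_words, new_words):
    """Return the non-equal alignment operations between two word sequences."""
    ops = []
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tags: 'replace', 'delete', 'insert'
            ops.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return ops

old = "the quick brown fox jumps over the lazy dog".split()
new = "the quick red fox jumps over the dog".split()
for op in paper_diff(old, new):
    print(op)
# → ('replace', ['brown'], ['red'])
#   ('delete', ['lazy'], [])
```

The same opcode structure works for any token alphabet, which is what makes a word-level diff script-independent: the matcher only needs an equality test between tokens, not character recognition.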
{"title":"Multi-Font Rotated Character Recognition Using Periodicity","authors":"H. Hase, Kohei Tanabe, Thi Hong Ha Tran, Shogo Tokai","doi":"10.1109/DAS.2008.16","DOIUrl":"https://doi.org/10.1109/DAS.2008.16","url":null,"abstract":"This paper presents an accuracy improvement for multi-font rotated character recognition. Until now, recognition of rotated characters has been based on a distance criterion in an eigen-subspace: an unknown pattern is projected onto the eigen-subspace of each category, and the category whose locus is closest to the projected point is chosen. However, this simple method cannot cope with multi-font characters. Therefore, additional unknown patterns were created by rotating the input pattern and projecting them onto the eigen-subspace of each category. That method performed well for a small number of categories, such as the 26 capital letters, but performance dropped as the number of categories increased, e.g., to 62 alphanumeric characters. By analyzing the misclassifications, we found that the distance criterion occasionally caused errors. This paper proposes a new feature based on the periodicity of the projected points in the eigenspace. The experimental results show a considerably high recognition rate.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"20 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126538789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}