Logical Structure Analysis for Form Images with Arbitrary Layout by Belief Propagation
A. Minagawa, Y. Fujii, Hiroaki Takebe, K. Fujimoto
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 23 September 2007. doi:10.1109/ICDAR.2007.162
Abstract: A new method for analyzing the specific logical structure of forms with unknown layout is proposed. The method takes both the target form image and a generic logical structure as inputs, and probabilistically models two types of relationships: between strings and logical components, and between neighboring strings belonging to different logical components. This modeling allows strings to be assigned to logical components softly but robustly, and yields an intuitive Bayesian probability network that mirrors the generic logical structure. Based on this network, the strings corresponding to each logical component are determined by belief propagation. The method is shown to be effective in tests on three types of forms.
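The inference step the abstract describes, assigning labels by belief propagation, can be sketched on a toy model. This is not the paper's network: the labels, potentials, and chain topology below are invented for illustration, whereas the actual method builds a Bayesian network mirroring the generic logical structure.

```python
# Illustrative sum-product belief propagation on a small chain of string
# variables, each taking one of a few logical-component labels.

LABELS = ["name", "date", "amount"]

# Unary potentials: compatibility of each string with each logical component.
unary = [
    {"name": 0.7, "date": 0.2, "amount": 0.1},   # string 0
    {"name": 0.1, "date": 0.8, "amount": 0.1},   # string 1
    {"name": 0.2, "date": 0.2, "amount": 0.6},   # string 2
]

# Pairwise potential: neighboring strings tend to have different components.
def pairwise(a, b):
    return 0.1 if a == b else 1.0

def chain_beliefs(unary):
    n = len(unary)
    # Forward messages m_f[i][x]: message arriving at node i from the left.
    m_f = [{x: 1.0 for x in LABELS} for _ in range(n)]
    for i in range(1, n):
        for x in LABELS:
            m_f[i][x] = sum(unary[i - 1][y] * m_f[i - 1][y] * pairwise(y, x)
                            for y in LABELS)
    # Backward messages m_b[i][x]: message arriving at node i from the right.
    m_b = [{x: 1.0 for x in LABELS} for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for x in LABELS:
            m_b[i][x] = sum(unary[i + 1][y] * m_b[i + 1][y] * pairwise(x, y)
                            for y in LABELS)
    beliefs = []
    for i in range(n):
        b = {x: unary[i][x] * m_f[i][x] * m_b[i][x] for x in LABELS}
        z = sum(b.values())
        beliefs.append({x: v / z for x, v in b.items()})
    return beliefs

beliefs = chain_beliefs(unary)
assignment = [max(b, key=b.get) for b in beliefs]
```

On a chain the messages can be computed in one forward and one backward pass; the marginal belief at each node is the normalized product of its unary potential and both incoming messages.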
An SVM-Based High-accurate Recognition Approach for Handwritten Numerals by Using Difference Features
Kaizhu Huang, Jun Sun, Y. Hotta, K. Fujimoto, S. Naoi
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 23 September 2007. doi:10.1109/ICDAR.2007.57
Abstract: Handwritten numeral recognition is an important pattern recognition task with wide applications, e.g., bank form processing, which demands a very high recognition rate. The support vector machine (SVM), a state-of-the-art classifier, has been used extensively in this area. Typically an SVM is trained in batch mode, i.e., all data points are input simultaneously to train the classification boundary. However, a small proportion of slightly exceptional samples is critical to the recognition rate, and training a classifier on all the data together may treat such legal but slightly exceptional samples as noise. In this paper, we propose a novel two-stage framework based on difference features to address this problem. In the first stage, a regular SVM is trained on all the training data; in the second stage, only the samples misclassified in the first stage are specially considered, lifting overall performance. Because the first-stage SVM already performs well, the misclassified samples are few, which makes it difficult to train an accurate SVM engine on them alone. We therefore further propose a multi-way-to-binary transformation using difference features, which converts the multi-category classification problem into a binary one and greatly expands the training set. Experiments on 10,000 handwritten numeral samples extracted from real bank forms show that the new algorithm achieves 99.0% accuracy, compared with 98.4% for the traditional SVM.
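One plausible reading of the multi-way-to-binary expansion can be sketched as follows. The per-class scores and the exact difference-feature construction here are assumptions for illustration, not the authors' published formulation; the point is how one multi-class sample fans out into many binary samples.

```python
# Illustrative sketch: expand a multi-class training sample into binary
# difference-feature samples, one mirrored pair per wrong class.

CLASSES = list(range(10))  # digit classes 0-9

def to_binary_samples(score_vec, true_class):
    """Expand one multi-class sample into binary difference samples.

    For every wrong class w we emit two mirrored samples:
    (score[true] - score[w]) labeled +1 and its negation labeled -1,
    so a single binary classifier can choose between any candidate pair.
    """
    out = []
    for w in CLASSES:
        if w == true_class:
            continue
        diff = score_vec[true_class] - score_vec[w]
        out.append((diff, +1))
        out.append((-diff, -1))
    return out

# One toy sample: per-class scores for the 10 digit classes (invented).
scores = [0.1, 0.05, 0.7, 0.02, 0.01, 0.03, 0.02, 0.04, 0.02, 0.01]
binary = to_binary_samples(scores, true_class=2)
# 9 wrong classes x 2 mirrored samples = 18 binary samples from 1 original.
```

This mirrors the abstract's claim that the transformation "expands the training samples greatly": a single labeled digit yields 18 binary examples here.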
A Review of Shape Descriptors for Document Analysis
O. R. Terrades, S. Tabbone, Ernest Valveny
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 23 September 2007. doi:10.1109/ICDAR.2007.33
Abstract: Shape descriptors play an important role in many document analysis applications. In this paper we review shape descriptors proposed in recent years from a new point of view. We propose definitions of descriptor and primitive and introduce the notion of a feature extraction method. With these definitions, we propose a new classification of shape descriptors according to their properties, pointing out their strengths and weaknesses.
Handwritten Numeral Recognition of Six Popular Indian Scripts
U. Pal, N. Sharma, T. Wakabayashi, F. Kimura
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 23 September 2007. doi:10.1109/ICDAR.2007.129
Abstract: India is a multi-lingual, multi-script country, yet little work has been done on handwritten character recognition of Indian languages. In this paper we propose a modified quadratic classifier based scheme for the recognition of off-line handwritten numerals of six popular Indian scripts: Devnagari, Bangla, Telugu, Oriya, Kannada and Tamil. The classifier's features are obtained from the directional information of the numerals: the bounding box of a numeral is segmented into blocks, directional features are computed in each block, the blocks are down-sampled by a Gaussian filter, and the resulting features are fed to the modified quadratic classifier. Two feature sets are used: 64-dimensional features for high-speed recognition and 400-dimensional features for high-accuracy recognition. Using five-fold cross validation, we obtained accuracies of 99.56%, 98.99%, 99.37%, 98.40%, 98.71% and 98.51% for Devnagari, Bangla, Telugu, Oriya, Kannada and Tamil, respectively.
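The block-wise directional feature computation can be illustrated roughly as below. This sketch histograms quantized gradient orientations on a tiny binary image; the paper's exact directional coding and the Gaussian down-sampling step are omitted, and the image is invented.

```python
# Rough sketch of block-wise directional features: split a binary numeral
# image into blocks and histogram quantized gradient orientations per block.
import math

def directional_features(img, blocks=2, bins=4):
    """img: 2-D list of 0/1 pixels. Returns blocks*blocks*bins features."""
    h, w = len(img), len(img[0])
    feats = [[0.0] * bins for _ in range(blocks * blocks)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]     # horizontal difference
            gy = img[y + 1][x] - img[y][x]     # vertical difference
            if gx == 0 and gy == 0:
                continue
            ang = math.atan2(gy, gx) % math.pi          # orientation in [0, pi)
            b = min(int(ang / (math.pi / bins)), bins - 1)
            block = (y * blocks // h) * blocks + (x * blocks // w)
            feats[block][b] += 1
    flat = [v for blk in feats for v in blk]
    total = sum(flat) or 1.0
    return [v / total for v in flat]   # normalized 16-D feature (2x2x4)

# Toy 4x4 "image" containing a vertical stroke.
img = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
feats = directional_features(img)
```

Scaling `blocks` and `bins` up (and adding the Gaussian down-sampling) would move this toward the 64- and 400-dimensional feature sets the abstract mentions.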
WEB Image Classification Based on the Fusion of Image and Text Classifiers
P. R. Kalva, F. Enembreck, Alessandro Lameiras Koerich
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 23 September 2007. doi:10.1109/ICDAR.2007.264
Abstract: This paper presents a novel method for image classification that combines information extracted from the images themselves with contextual information. The main hypothesis is that contextual information related to an image can contribute to the classification process. First, independent classifiers are designed for images and text: color, shape and texture features extracted from the images feed a neural network (NN) classifier, while the contextual information is processed by a Naive Bayes (NB) classifier. The outputs of the two classifiers are then combined through heuristic rules. Experimental results on a database of more than 5,000 HTML documents show that the combination of classifiers provides a meaningful improvement (about 16%) in the correct classification rate relative to the NN classifier alone.
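The abstract does not specify the heuristic combination rules, so the sketch below shows only one minimal confidence-threshold rule of the general kind; the labels, confidence values, and threshold are placeholders, not the paper's rules.

```python
# Illustrative heuristic fusion of an image classifier (NN) and a
# text-based contextual classifier (NB).

def fuse(nn_label, nn_conf, nb_label, nb_conf, threshold=0.6):
    """Combine two classifier decisions with a simple confidence rule."""
    if nn_conf >= threshold:
        return nn_label            # image classifier is confident
    if nb_conf >= threshold:
        return nb_label            # fall back to the contextual classifier
    # Neither is confident: take the higher-confidence vote.
    return nn_label if nn_conf >= nb_conf else nb_label

decision = fuse("photo", 0.3, "chart", 0.7)
```

Rules like this let the contextual classifier override the image classifier exactly in the low-confidence region where image features alone are least reliable, which is where the reported 16% improvement would come from.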
A Unified Framework for Symbol Segmentation and Recognition of Handwritten Mathematical Expressions
Yu Shi, HaiYang Li, F. Soong
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 23 September 2007. doi:10.1109/ICDAR.2007.38
Abstract: A symbol decoding and graph generation algorithm for online handwritten mathematical expression recognition is formulated. It differs from our previous system and most other systems in two respects: (1) it embeds stroke grouping into symbol identification, forming a unified probabilistic framework for symbol recognition; and (2) it generates a symbol graph rather than a list of symbol sequence hypotheses, which makes post-processing with new information possible. Experimental results show that the proposed algorithm generates high-quality symbol graphs. The symbol sequence corresponding to the best path in the graph achieves much higher symbol recognition accuracy than before, especially after rescoring with a trigram language model, and math formula recognition performance is significantly improved.
Text Line Detection in Unconstrained Handwritten Documents Using a Block-Based Hough Transform Approach
G. Louloudis, B. Gatos, C. Halatsis
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 23 September 2007. doi:10.1109/ICDAR.2007.244
Abstract: In this paper we present a new text line detection method for unconstrained handwritten documents. The proposed technique consists of three distinct steps. The first step includes image binarization and enhancement, connected component extraction, and average character height estimation. In the second step, a block-based Hough transform detects potential text lines. A third step corrects possible splittings, detects text lines missed by the previous step and, finally, separates vertically connected characters and assigns them to text lines. The performance evaluation of the proposed approach is based on a consistent and concrete evaluation methodology.
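The Hough-transform voting at the heart of the second step can be sketched in simplified form. This ignores the block-based partitioning and the third correction step; the points below stand in for connected-component coordinates and are invented for the example.

```python
# Simplified Hough voting for line detection: each point votes for every
# (rho, theta) line passing through it; the best-supported cell wins.
import math
from collections import Counter

def hough_lines(points, theta_steps=36, rho_res=1.0):
    """Vote in (rho, theta) space; return the best-supported line."""
    acc = Counter()
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[(round(rho / rho_res), t)] += 1
    (rho_idx, t), votes = acc.most_common(1)[0]
    return rho_idx * rho_res, math.pi * t / theta_steps, votes

# Four component centroids on one horizontal text line (y = 10) plus an outlier.
pts = [(5, 10), (20, 10), (35, 10), (50, 10), (12, 42)]
rho, theta, votes = hough_lines(pts)
```

For a horizontal line, theta = pi/2 and rho equals the line's y-coordinate, so the four collinear centroids accumulate in a single cell while the outlier's votes stay scattered.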
Character-Stroke Detection for Text-Localization and Extraction
Krishna Subramanian, P. Natarajan, M. Decerbo, D. Castañón
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 23 September 2007. doi:10.1109/ICDAR.2007.79
Abstract: In this paper we present a new approach to analyzing images for text localization and extraction. The approach places very few constraints on the font, size and color of text, and handles both scene text and artificial text well. We exploit two well-known properties of text, approximately constant stroke width and local contrast, to develop a fast, simple and effective algorithm for detecting character strokes. We also show how the detected strokes can be used for accurate extraction, and discuss the advantages of this approach to text localization over color-space segmentation based approaches. We analyze the performance of our stroke detection algorithm on images collected for the robust-reading competitions at ICDAR 2003.
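A much-simplified version of the constant-stroke-width cue might use horizontal run lengths of foreground pixels; the thresholds and test images below are illustrative only, not the paper's detector.

```python
# Sketch of a stroke-width cue: text strokes yield short, nearly constant
# horizontal runs of foreground pixels, so thin runs with low width
# variation suggest a character stroke.

def run_lengths(row):
    """Lengths of consecutive 1-runs in a binary row."""
    runs, n = [], 0
    for px in row + [0]:           # trailing sentinel flushes the last run
        if px:
            n += 1
        elif n:
            runs.append(n)
            n = 0
    return runs

def looks_like_stroke(img, max_width=3):
    """True if every horizontal run is thin and widths barely vary."""
    widths = [w for row in img for w in run_lengths(row)]
    if not widths:
        return False
    return max(widths) <= max_width and max(widths) - min(widths) <= 1

glyph = [[0, 1, 1, 0],     # thin vertical bar: constant width 2
         [0, 1, 1, 0],
         [0, 1, 1, 0]]
blob  = [[1, 1, 1, 1],     # solid block: runs too wide to be a stroke
         [1, 1, 1, 1]]
```

A real detector would also check the local-contrast property the abstract names and measure width along the stroke's normal direction rather than only horizontally.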
Matching Local Descriptors for Image Identification on Cultural Databases
Eduardo Valle, M. Cord, S. Philipp-Foliguet
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 23 September 2007. doi:10.1109/ICDAR.2007.164
Abstract: In this paper we present a new method for high-dimensional descriptor matching based on the KD-tree, a classic method for nearest-neighbour search. The new method, which we name the 3-way tree, avoids the boundary effects that disrupt the KD-tree in high dimensions by adding redundant, overlapping sub-trees. In this way, more precision is obtained for the same querying times. We evaluate the method in the context of image identification for cultural collections, a task that can greatly benefit from high-dimensional local descriptors computed around points of interest (PoIs).
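As we read the abstract, the 3-way tree adds a redundant middle child covering points near each splitting plane, so a query falling close to the boundary can be answered from one subtree without backtracking. A single-level sketch of that partition, with an assumed overlap parameter:

```python
# Single-level sketch of a 3-way split: the usual left/right KD-tree
# children plus a redundant middle child holding points within `overlap`
# of the splitting plane. (Our reading of the idea, not the paper's code.)

def three_way_split(points, axis, overlap):
    vals = sorted(p[axis] for p in points)
    median = vals[len(vals) // 2]
    left   = [p for p in points if p[axis] <  median]
    right  = [p for p in points if p[axis] >= median]
    middle = [p for p in points if abs(p[axis] - median) <= overlap]
    return median, left, middle, right

pts = [(1, 5), (2, 3), (4, 8), (5, 1), (9, 7)]
median, left, middle, right = three_way_split(pts, axis=0, overlap=1.5)
```

The redundancy costs storage (boundary points live in two subtrees) in exchange for the precision gain at equal query time that the abstract reports.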
Word image based latent semantic indexing for conceptual querying in document image databases
Sameek Banerjee, Gaurav Harit, S. Chaudhury
Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 23 September 2007. doi:10.1109/ICDAR.2007.269
Abstract: In this paper we present an application of latent semantic analysis (LSA) to indexing and retrieval of document images containing text. The query is specified as a set of word images, and the documents that best match the query representation in the latent semantic space are retrieved. Through extensive experiments on a large database we show that using LSA for document images improves retrieval precision, as is the case with electronic text documents.
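Once documents and a query are projected into the latent semantic space, retrieval reduces to a similarity ranking. The sketch below shows only that ranking step with invented latent vectors; the SVD that builds the space from the word-image/document matrix is omitted.

```python
# Illustrative retrieval step of LSA: rank documents by cosine similarity
# to the query in the (precomputed, here invented) latent space.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents closest to the query."""
    ranked = sorted(doc_vecs,
                    key=lambda d: cosine(query_vec, doc_vecs[d]),
                    reverse=True)
    return ranked[:k]

docs = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.9, 0.2],
    "doc3": [0.8, 0.2, 0.1],
}
top = retrieve([1.0, 0.0, 0.0], docs)
```

Because the latent space groups co-occurring word images, documents can rank high even when they do not contain the exact query word images, which is what "conceptual querying" in the title refers to.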