In this paper, we present a complete courtesy amount recognition system for Chinese bank checks. The system takes color check images as input and consists of three main processing steps: numeral string extraction, segmentation and recognition, and post-processing. These steps respectively detect and extract the numeral string, segment and recognize it, and analyze the recognition results for acceptance or rejection. Three principles guide the algorithm design of the first two modules: information fusion, method complementarity, and multi-hypothesis generation followed by evaluation; logistic regression is used for post-processing. A large number of real checks collected from different banks were used to test the system. A read rate of about 82% is observed when the substitution rate is set to 1%, which corresponds to the accuracy of a human operator. The performance can also be tuned further toward a suitable balance between error and rejection, according to user preference.
{"title":"A Courtesy Amount Recognition System for Chinese Bank Checks","authors":"Dong Liu, Youbin Chen","doi":"10.1109/ICFHR.2012.154","DOIUrl":"https://doi.org/10.1109/ICFHR.2012.154","url":null,"abstract":"In this paper, we present a complete courtesy amount recognition system for Chinese bank checks. The system takes color bank check images as input and consists of three main processing steps: numeral string extraction, segmentation & recognition, and post-processing. They focus sequentially on: detection and extraction of numeral string; segmentation and recognition of the string; and further analysis of recognition results for acceptance or rejection. Information fusion, method complementarity, multi-hypotheses generation then evaluation are three principles employed for designing algorithms in the first two modules. And logistic regression is used for post-processing. A large number of real checks collected from different banks are used for testing the system. Read rate around 82% is observed when the substitution rate is set to 1%, which corresponds to that of a human operator. The performance can also be tuned further toward a suitable balance between inaccuracy and rejection, in accordance with user preference.","PeriodicalId":291062,"journal":{"name":"2012 International Conference on Frontiers in Handwriting Recognition","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131722820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the recognition of off-line handwritten characters and signatures, stroke extraction is often a crucial step. Given the large number of Chinese handwritten characters, pattern matching based on structural decomposition and analysis is useful, indeed essential, for off-line Chinese recognition to reduce ambiguity. Two challenging problems in stroke extraction are: 1) how to extract primary strokes, and 2) how to resolve segmentation ambiguities at intersection points. In this paper, we introduce a novel approach based on optimum paths (AOP) to solve them. The optimum paths are derived from degree information and the continuation property, and we use them to tackle both problems. Compared with other methods, the proposed approach extracts strokes from off-line Chinese handwritten characters with better performance.
{"title":"A Novel Approach for Stroke Extraction of Off-Line Chinese Handwritten Characters Based on Optimum Paths","authors":"J. Tan, J. Lai, Weishi Zheng, C. Suen","doi":"10.1109/ICFHR.2012.165","DOIUrl":"https://doi.org/10.1109/ICFHR.2012.165","url":null,"abstract":"In recognition of Off-line handwritten characters and signatures, stroke extraction is often a crucial step. Given the large number of Chinese handwritten characters, pattern matching based on structural decomposition and analysis is useful and essential to Off-line Chinese recognition to reduce ambiguity. Two challenging problems for stroke extraction are: 1) how to extract primary strokes and 2) how to solve the segmentation ambiguities at intersection points. In this paper, we introduce a novel approach based on Optimum Paths(AOP) to solve this problem. Optimum Paths(AOP) are derived from the degree information and continuation property, we use them to tackle these two problems. Compared with other methods, the proposed approach has extracted strokes from Off-line Chinese handwritten characters with better performance.","PeriodicalId":291062,"journal":{"name":"2012 International Conference on Frontiers in Handwriting Recognition","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132134018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present a codebook-based method for handwritten text-line segmentation that uses image patches in the training data to learn a graph-based similarity for clustering. We first construct a codebook of image patches using K-medoids and obtain exemplars that encode local evidence. We then obtain the corresponding codewords for all patches extracted from a given image, construct a similarity graph using the learned evidence, and partition it to obtain text-lines. Our learning-based approach performs well on a field dataset containing degraded and unconstrained handwritten Arabic document images. Results on the ICDAR 2009 segmentation contest dataset show that the method is competitive with previous approaches.
{"title":"Learning Text-Line Segmentation Using Codebooks and Graph Partitioning","authors":"Le Kang, J. Kumar, Peng Ye, D. Doermann","doi":"10.1109/ICFHR.2012.228","DOIUrl":"https://doi.org/10.1109/ICFHR.2012.228","url":null,"abstract":"In this paper, we present a codebook based method for handwritten text-line segmentation which uses image-patches in the training data to learn a graph-based similarity for clustering. We first construct a codebook of image-patches using K-medoids, and obtain exemplars which encode local evidence. We then obtain the corresponding codewords for all patches extracted from a given image and construct a similarity graph using the learned evidence and partitioned to obtain text-lines. Our learning based approach performs well on a field dataset containing degraded and un-constrained handwritten Arabic document images. Results on ICDAR 2009 segmentation contest dataset show that the method is competitive with previous approaches.","PeriodicalId":291062,"journal":{"name":"2012 International Conference on Frontiers in Handwriting Recognition","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121777376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the explosive growth of the tablet form factor and the greater availability of pen-based direct input, writer identification in online environments is becoming increasingly critical for a variety of downstream applications such as intelligent and adaptive user environments, search, retrieval, indexing, and digital forensics. Extant research has approached writer identification by using writing styles as a discriminative function between writers. In contrast, we model writing styles as a shared component of an individual's handwriting. We develop a theoretical framework for this conceptualization and model it using a three-level hierarchical Bayesian model (Latent Dirichlet Allocation). In this text-independent, unsupervised model, each writer's handwriting is modeled as a distribution over a finite set of writing styles that are shared among writers. We test our model on IBM UB 1, a novel online/offline handwriting dataset that is being made available to the public. Our experiments show results comparable to current benchmarks and demonstrate the efficacy of explicitly modeling shared writing styles.
{"title":"Modeling Writing Styles for Online Writer Identification: A Hierarchical Bayesian Approach","authors":"Arti Shivram, Chetan Ramaiah, U. Porwal, V. Govindaraju","doi":"10.1109/ICFHR.2012.235","DOIUrl":"https://doi.org/10.1109/ICFHR.2012.235","url":null,"abstract":"With the explosive growth of the tablet form factor and greater availability of pen-based direct input, writer identification in online environments is increasingly becoming critical for a variety of downstream applications such as intelligent and adaptive user environments, search, retrieval, indexing and digital forensics. Extant research has approached writer identification by using writing styles as a discriminative function between writers. In contrast, we model writing styles as a shared component of an individualâs handwriting. We develop a theoretical framework for this conceptualization and model this using a three level hierarchical Bayesian model (Latent Dirichlet Allocation). In this text-independent, unsupervised model each writerâs handwriting is modeled as a distribution over finite writing styles that are shared amongst writers. We test our model on a novel online/offline handwriting dataset IBM UB 1 which is being made available to the public. Our experiments show comparable results to current benchmarks and demonstrate the efficacy of explicitly modeling shared writing styles.","PeriodicalId":291062,"journal":{"name":"2012 International Conference on Frontiers in Handwriting Recognition","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114752624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated identification of individuals using biometric technologies is finding increasing application in diverse areas, yet designing practical systems can still present significant challenges. The choice of modality, the classification and matching techniques best suited to the application, the most effective sensors, and so on are all important considerations that can help to offset factors detracting from optimal performance. Less well researched, however, is how to optimise performance by exploiting the broader information often available in a specific task; in particular, so-called "soft" biometric data is frequently overlooked. This paper proposes a novel approach to integrating soft biometric data into an effective processing structure for an identification task by adopting a fuzzy representation of inherently continuous information, using subject age as a typical example. Our results show this to be a promising methodology, with possible benefits in a number of potentially difficult practical scenarios.
{"title":"Improving Handwritten Signature-Based Identity Prediction through the Integration of Fuzzy Soft-Biometric Data","authors":"Márjory Da Costa-Abreu, M. Fairhurst","doi":"10.1109/ICFHR.2012.221","DOIUrl":"https://doi.org/10.1109/ICFHR.2012.221","url":null,"abstract":"Automated identification of individuals using biometric technologies is finding increasing application in diverse areas, yet designing practical systems can still present significant challenges. Choice of the modality to adopt, the classification/matching techniques best suited to the application, the most effective sensors to use, and so on, are all important considerations, and can help to ameliorate factors which might detract from optimal performance. Less well researched, however, is how to optimise performance by means of exploiting broader-based information often available in a specific task and, in particular, the exploitation of so-called \"soft\" biometric data is often overlooked. This paper proposes a novel approach to the integration of soft biometric data into an effective processing structure for an identification task by adopting a fuzzy representation of information which is inherently continuous, using subject age as a typical example. Our results show this to be a promising methodology with possible benefits in a number of potentially difficult practical scenarios.","PeriodicalId":291062,"journal":{"name":"2012 International Conference on Frontiers in Handwriting Recognition","volume":"119 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123307515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The discipline of First Temple Period epigraphy (the study of writing) relies heavily on manually drawn facsimiles (black-and-white images) of ancient inscriptions. This practice may unintentionally mix up documentation and interpretation. As an alternative, this article surveys the performance of several existing binarization techniques. The quality of their results is found to be inadequate for our purpose. A new method for automatically creating a facsimile is then suggested. The technique is based on a connected-component-oriented elastic registration of an already existing imperfect facsimile to the inscription image. Empirical results supporting the methodology are presented. The procedure is also relevant to the creation of facsimiles for other types of inscriptions.
{"title":"Binarization of First Temple Period Inscriptions: Performance of Existing Algorithms and a New Registration Based Scheme","authors":"Arie Shaus, Eli Turkel, E. Piasetzky","doi":"10.1109/ICFHR.2012.187","DOIUrl":"https://doi.org/10.1109/ICFHR.2012.187","url":null,"abstract":"The discipline of First Temple Period epigraphy (the study of writing) relies heavily on manually-drawn facsimiles (black and white images) of ancient inscriptions. This practice may unintentionally mix up documentation and interpretation. As an alternative, this article surveys the performance of several existing binarization techniques. The quality of their results is found to be inadequate for our purpose. A new method for automatically creating a facsimile is then suggested. The technique is based on a connected-component oriented elastic registration of an already existing imperfect facsimile to the inscription image. Some empirical results, supporting the methodology, are presented. The procedure is also relevant to the creation of facsimiles for other types of inscriptions.","PeriodicalId":291062,"journal":{"name":"2012 International Conference on Frontiers in Handwriting Recognition","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123617506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Because of carbon paper and seals, images of financial documents such as Chinese bank checks often suffer from bleed-through effects, which degrade automatic financial document processing such as seal verification and OCR. This paper presents an effective algorithm for removing bleed-through from images of financial documents. Double-sided images scanned simultaneously are used as inputs, and the bleed-through effect is detected and removed after registration of the recto and verso images. Our work makes two major contributions. First, our algorithm can handle images with complex backgrounds from real-life financial documents, whereas most other algorithms handle only images with simple backgrounds. Second, we combine the FastICA algorithm with Gatos' local adaptive thresholding algorithm [1] to remove the bleed-through. Experiments show that the proposed algorithm is very promising.
{"title":"Reduction of Bleed-Through Effect in Images of Chinese Bank Items","authors":"Bingyu Chi, Youbin Chen","doi":"10.1109/ICFHR.2012.260","DOIUrl":"https://doi.org/10.1109/ICFHR.2012.260","url":null,"abstract":"Because of the existence of possible carbon and seals, it's quite often that images of financial documents such as Chinese bank checks are suffered from bleed-through effects, which will affect the performance of automatic financial document processing such as seal verification and OCR. This paper presents an effective algorithm to deal with bleed-through effects existing in the images of financial documents. Double-sided images scanned simultaneously are used as in-puts, and the bleed-through effect is detected and removed after the registration of the recto and verso side images. There are two major aspects of contribution in our work. First, our algorithm can deal with images with complex background from real-life financial documents while most other algorithms only deal with images with simple background. Second, we combine the fast ICA algorithm with Gatos' local adaptive thresholding algorithm [1] to deal with the bleed-through effects. Experiments show that our proposed algorithm is very promising.","PeriodicalId":291062,"journal":{"name":"2012 International Conference on Frontiers in Handwriting Recognition","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124080173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When performing handwriting recognition on natural-language text, the use of a word-level language model (LM) is known to significantly improve recognition accuracy. The most common type of language model, the n-gram model, decomposes sentences into short, overlapping chunks. In this paper, we propose a new type of language model which we use in addition to the standard n-gram LM. Our new model uses the likelihood score from a statistical machine translation system as a reranking feature. In general terms, we automatically translate each OCR hypothesis into another language and then create a feature score based on how "difficult" the translation was. Intuitively, the difficulty of translation correlates with how well-formed the input sentence is. On an Arabic handwriting recognition task, we obtain a 0.4% absolute improvement in word error rate (WER) on top of a powerful 5-gram LM.
{"title":"Statistical Machine Translation as a Language Model for Handwriting Recognition","authors":"Jacob Devlin, M. Kamali, Krishna Subramanian, R. Prasad, P. Natarajan","doi":"10.1109/ICFHR.2012.273","DOIUrl":"https://doi.org/10.1109/ICFHR.2012.273","url":null,"abstract":"When performing handwriting recognition on natural language text, the use of a word-level language model (LM) is known to significantly improve recognition accuracy. The most common type of language model, the n-gram model, decomposes sentences into short, overlapping chunks. In this paper, we propose a new type of language model which we use in addition to the standard n-gram LM. Our new model uses the likelihood score from a statistical machine translation system as a reranking feature. In general terms, we automatically translate each OCR hypothesis into another language, and then create a feature score based on how \"difficult\" it was to perform the translation. Intuitively, the difficulty of translation correlates with how well-formed the input sentence is. In an Arabic handwriting recognition task, we were able to obtain an 0.4% absolute improvement to word error rate (WER) on top of a powerful 5-gram LM.","PeriodicalId":291062,"journal":{"name":"2012 International Conference on Frontiers in Handwriting Recognition","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129340686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fisher linear discriminant analysis (FDA) is the best-known supervised dimensionality-reduction model. However, when the number of classes is much larger than the reduced dimensionality, FDA suffers from the class separation problem: it preserves the distances of already well-separated classes and causes a large overlap of neighboring classes. To cope with this problem, we propose a new model called confused distance maximization (CDM). The objective of CDM is to maximize the distance between the most confusable classes, according to a confusion matrix estimated from the training data with a pre-learned classifier. Compared with FDA, which maximizes the sum of the distances of all class pairs, CDM is more relevant to classification accuracy because it weights each pairwise distance according to the confusion matrix. Furthermore, CDM is computationally inexpensive, which makes it efficient and effective for large-category problems. Experiments on two large-scale 3,755-class Chinese handwriting databases (offline and online) demonstrate that CDM achieves the best performance compared with FDA and other competitive weighting-based criteria.
{"title":"Confused Distance Maximization for Large Category Dimensionality Reduction","authors":"Xu-Yao Zhang, Cheng-Lin Liu","doi":"10.1109/ICFHR.2012.196","DOIUrl":"https://doi.org/10.1109/ICFHR.2012.196","url":null,"abstract":"The Fisher linear discriminant analysis (FDA) is the most well-known supervised dimensionality reduction model. However, when the number of classes is much larger than the reduced dimensionality, FDA suffers from the class separation problem in that it will preserve the distances of the already well-separated classes and cause a large overlap of neighboring classes. To cope with this problem, we propose a new model called confused distance maximization (CDM). The objective of CDM is to maximize the distance of the most confusable classes, according to the confusion matrix estimated from the training data with a pre-learned classifier. Compared with FDA that maximizes the sum of the distances of all class pairs, CDM is more relevant to the classification accuracy by weighting the pairwise distance according to the confusion matrix. Furthermore, CDM is computationally inexpensive which makes it indeed efficient and effective for large category problems. Experiments on two large-scale 3,755-class Chinese handwriting databases (offline and online) demonstrate that CDM can achieve the best performance compared with FDA and other competitive weighting based criteria.","PeriodicalId":291062,"journal":{"name":"2012 International Conference on Frontiers in Handwriting Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130457974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Page segmentation and classification are important stages of a document layout analysis system, preceding OCR and any other subsequent processing steps. In this paper, we propose an accurate, purpose-designed system for segmenting complex documents, based on the steerable pyramid transform. Features extracted from the pyramid sub-bands serve to locate regions and classify them as text (machine-printed or handwritten) or non-text (images, graphics, drawings, or paintings) in noisy, deformed, multilingual, multi-script document images. These documents contain tabular structures, logos, stamps, handwritten script blocks, photos, etc. Encouraging and promising results obtained on a dataset of 1,000 official complex document images are presented.
{"title":"Page Segmentation Based on Steerable Pyramid Features","authors":"Mohamed Benjelil, R. Mullot, A. Alimi","doi":"10.1109/ICFHR.2012.253","DOIUrl":"https://doi.org/10.1109/ICFHR.2012.253","url":null,"abstract":"Page segmentation and classification is very important in document layout analysis system before it is presented to an OCR system or for any other subsequent processing steps. In this paper, we propose an accurate and suitably designed system for complex documents segmentation. This system is based on steerable pyramid transform. The features extracted from pyramid sub-bands serve to locate and classify regions into text (either machine printed or handwritten) and non-text (images, graphics, drawings or paintings) in some noise-infected, deformed, multilingual, multi-script document images. These documents contain tabular structures, logos, stamps, handwritten script blocks, photos etc. The encouraging and promising results obtained on 1,000 official complex document images data set are presented in this research paper.","PeriodicalId":291062,"journal":{"name":"2012 International Conference on Frontiers in Handwriting Recognition","volume":"29 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125795496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}