Title: A scanning n-tuple classifier for online recognition of handwritten digits
Authors: E. Ratzlaff
Pub Date: 2001-09-10 | DOI: 10.1109/ICDAR.2001.953747
Venue: Proceedings of Sixth International Conference on Document Analysis and Recognition

A scanning n-tuple classifier is applied to the task of recognizing online handwritten isolated digits. Various aspects of preprocessing, feature extraction, training and application of the scanning n-tuple method are examined, including: distortion transformations of training data, test data perturbations, variations in bitmap generation and scaling, chain code extraction and concatenation, various static and dynamic features, and scanning n-tuple combinations. Results are reported for both the UNIPEN Train-R01/V07 and DevTest-R01/V02 subset 1a (isolated digits) databases.
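The core scanning n-tuple idea — slide an n-tuple of fixed offsets along a chain-code string and accumulate per-class tuple statistics — can be sketched as follows. This is a minimal illustration with hypothetical names, offsets, and Laplace smoothing, not Ratzlaff's implementation:

```python
import math
from collections import defaultdict

class ScanningNTuple:
    """Toy scanning n-tuple classifier over chain-code sequences.

    An n-tuple of fixed offsets is scanned along the chain code; each
    sampled tuple indexes a per-class frequency table, and classification
    sums smoothed log-probabilities over all scan positions.
    """

    def __init__(self, offsets=(0, 2, 4), n_codes=8, alpha=1.0):
        self.offsets = offsets      # sampling offsets of the n-tuple
        self.n_codes = n_codes      # chain-code alphabet size (8 directions)
        self.alpha = alpha          # Laplace smoothing constant
        self.counts = defaultdict(lambda: defaultdict(float))
        self.totals = defaultdict(float)

    def _tuples(self, chain):
        span = max(self.offsets)
        for i in range(len(chain) - span):
            yield tuple(chain[i + o] for o in self.offsets)

    def train(self, chain, label):
        for t in self._tuples(chain):
            self.counts[label][t] += 1.0
            self.totals[label] += 1.0

    def classify(self, chain):
        V = self.n_codes ** len(self.offsets)   # size of the tuple alphabet
        best, best_score = None, -math.inf
        for label in self.counts:
            denom = self.totals[label] + self.alpha * V
            score = sum(
                math.log((self.counts[label][t] + self.alpha) / denom)
                for t in self._tuples(chain))
            if score > best_score:
                best, best_score = label, score
        return best
```

In practice the classifier is trained on chain codes extracted from normalized digit trajectories; the offsets and smoothing above are illustrative defaults only.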
Title: Multi-branch and two-pass HMM modeling approaches for off-line cursive handwriting recognition
Authors: Wenwei Wang, A. Brakensiek, A. Kosmala, G. Rigoll
Pub Date: 2001-09-10 | DOI: 10.1109/ICDAR.2001.953789
Venue: Proceedings of Sixth International Conference on Document Analysis and Recognition

Because of large shape variations in human handwriting, cursive handwriting recognition remains a challenging task. Usually, the recognition performance depends crucially on the pre-processing steps, e.g. the word baseline detection and segmentation process. Hidden Markov models (HMMs) have the ability to model similarities and variations among samples of a class. In this paper, we present a multi-branch HMM modeling method and an HMM-based two-pass modeling approach. Whereas the multi-branch HMM method makes the resulting system more robust with respect to word baseline detection, the two-pass recognition approach exploits the segmentation ability of the Viterbi algorithm, creates a second HMM set and carries out a second recognition pass. The total performance is enhanced by combining the two recognition passes. Experiments recognizing cursive handwritten words with a 30,000-word lexicon have been carried out. The results demonstrate that our novel approaches achieve better recognition performance and reduce the relative error rate significantly.
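The two-pass approach hinges on the Viterbi algorithm yielding both a best state sequence and, through its backpointers, a segmentation of the input. A generic log-domain Viterbi sketch with toy data structures (not the authors' HMM toolkit) looks like this:

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state path for an observation sequence.

    log_start[s], log_trans[p][s], log_emit[s][o] are log-probabilities;
    the backpointer table implicitly segments obs into state-aligned runs.
    """
    # initialization with start and emission scores
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # best predecessor state for s at time t
            best_prev = max(states, key=lambda p: V[t - 1][p] + log_trans[p][s])
            V[t][s] = V[t - 1][best_prev] + log_trans[best_prev][s] + log_emit[s][obs[t]]
            back[t][s] = best_prev
    # trace back from the best final state
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))
```

In the paper's setting the states belong to concatenated character models, so the decoded path directly induces the character segmentation used to build the second-pass HMM set.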
Title: Character extraction and recognition in natural scene images
Authors: Xuewen Wang, Xiaoqing Ding, Changsong Liu
Pub Date: 2001-09-10 | DOI: 10.1109/ICDAR.2001.953953
Venue: Proceedings of Sixth International Conference on Document Analysis and Recognition

With the emergence of the "smart camera" concept, character recognition in natural scene images has become an interesting but difficult task. In this paper, we propose an algorithm for extracting characters from text regions of natural scene images with complex backgrounds. Our method first clusters the color feature vectors of the text regions into a number of color classes by applying a modified coarse-fine fuzzy c-means algorithm. Then, different slices are constructed according to these color classes. Characters are eventually extracted from the images using both segmentation and recognition information. Our experiments show that this method is a promising starting point for such applications.
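The clustering step can be illustrated with plain fuzzy c-means on color vectors. This is a sketch of the textbook algorithm only; the paper's coarse-fine modification is not reproduced here:

```python
def fuzzy_cmeans(points, c, m=2.0, iters=50, init=None):
    """Standard fuzzy c-means on feature vectors (e.g. RGB colors).

    Alternates the membership update
        u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1))
    with the center update (weighted mean, weights u_ij ** m).
    """
    dim = len(points[0])
    centers = [list(p) for p in (init if init is not None else points[:c])]
    U = []
    for _ in range(iters):
        # membership update: distances to each center, then normalized weights
        U = []
        for p in points:
            d = [max(1e-12, sum((p[t] - centers[j][t]) ** 2
                                for t in range(dim)) ** 0.5)
                 for j in range(c)]
            U.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1))
                                for k in range(c))
                      for j in range(c)])
        # center update: fuzzily-weighted mean of all points
        for j in range(c):
            w = [U[i][j] ** m for i in range(len(points))]
            tot = sum(w)
            centers[j] = [sum(w[i] * points[i][t]
                              for i in range(len(points))) / tot
                          for t in range(dim)]
    return centers, U
```

Each resulting color class then defines one binary "slice" of the text region, from which character candidates are extracted.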
Title: PenCalc: a novel application of on-line mathematical expression recognition technology
Authors: Kam-Fai Chan, D. Yeung
Pub Date: 2001-09-10 | DOI: 10.1109/ICDAR.2001.953893
Venue: Proceedings of Sixth International Conference on Document Analysis and Recognition

Most of the calculator programs found in existing pen-based mobile computing devices, such as personal digital assistants (PDAs) and other handheld devices, do not take full advantage of the pen technology offered by these devices. Instead, input of expressions is still done through a virtual keypad shown on the screen, and the stylus (i.e., electronic pen) is simply used as a pointing device. In this paper we propose an intelligent handwriting-based calculator program with which the user can enter expressions simply by writing them on the screen using a stylus. In addition, variables can be defined to store intermediate results for subsequent calculations, as in ordinary algebraic calculations. The proposed software is a novel application of on-line mathematical expression recognition technology, which has mostly been used by others only in mathematical expression editor programs.
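Once the strokes are recognized into a text expression, the calculator back end reduces to expression evaluation with a variable store. A toy sketch of such a back end (hypothetical class name; Python's `ast` is used as a stand-in parser, not the paper's recognizer):

```python
import ast
import operator

# arithmetic operators supported by this toy evaluator
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.Pow: operator.pow, ast.USub: operator.neg}

class CalcStore:
    """Evaluates recognized expressions and keeps named intermediate
    results for later calculations, as in ordinary algebraic work."""

    def __init__(self):
        self.vars = {}

    def assign(self, name, expr):
        # e.g. assign("a", "2+3") stores a = 5 for use in later expressions
        self.vars[name] = self.eval(expr)
        return self.vars[name]

    def eval(self, expr):
        return self._ev(ast.parse(expr, mode="eval").body)

    def _ev(self, node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](self._ev(node.left), self._ev(node.right))
        if isinstance(node, ast.UnaryOp):
            return OPS[type(node.op)](self._ev(node.operand))
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return self.vars[node.id]       # look up a stored variable
        raise ValueError("unsupported expression node")
```

The hard part of PenCalc is of course the recognition front end; this sketch only shows why variable storage makes the recognized expressions composable.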
Title: A model guided document image analysis scheme
Authors: Gaurav Harit, S. Chaudhury, Puneet Gupta, Neeti Vohra, S. Joshi
Pub Date: 2001-09-10 | DOI: 10.1109/ICDAR.2001.953963
Venue: Proceedings of Sixth International Conference on Document Analysis and Recognition

This paper presents a new model-based document image segmentation scheme that uses XML DTDs (eXtensible Markup Language Document Type Definitions). Given a document image, the algorithm has the ability to select the appropriate model. A new wavelet-based tool has been designed for distinguishing text from non-text regions and for characterizing font sizes. Our model-based analysis scheme makes use of this tool for identifying the logical components of a document image.
Title: Why table ground-truthing is hard
Authors: Jianying Hu, R. Kashi, D. Lopresti, G. Wilfong, G. Nagy
Pub Date: 2001-09-10 | DOI: 10.1109/ICDAR.2001.953768
Venue: Proceedings of Sixth International Conference on Document Analysis and Recognition

The principle that for every document analysis task there exists a mechanism for creating well-defined ground-truth is a widely held tenet. Past experience with standard datasets providing ground-truth for character recognition and page segmentation tasks supports this belief. In the process of attempting to evaluate several table recognition algorithms we have been developing, however, we have uncovered a number of serious hurdles connected with the ground-truthing of tables. This problem may, in fact, be much more difficult than it appears. We present a detailed analysis of why table ground-truthing is so hard, including the notions that there may exist more than one acceptable "truth" and/or incomplete or partial "truths".
Title: Constructing Web-based legacy index card archives - architectural design issues and initial data acquisition
Authors: A. Downton, A. Tams, G. Wells, A. C. Holmes, S. Lucas, G. Beccaloni, M. Scoble, G. S. Robinson
Pub Date: 2001-09-10 | DOI: 10.1109/ICDAR.2001.953908
Venue: Proceedings of Sixth International Conference on Document Analysis and Recognition

Presents a progress report (after 1 year of a 3-year project) on the overall design of a flexible archive conversion system, intended eventually for widespread use as a tool to convert legacy typescript and handwritten archive card indexes into Internet-accessible and searchable databases. The VIADOCS system is being developed and evaluated on a demonstrator archive of 30,000 pyraloid moth cards at the UK Natural History Museum, and has already demonstrated a successful and efficient mechanism for image acquisition using a modified bank cheque scanner. Document image processing and analysis techniques, defined by an XML validating document type definition (DTD), are being used to correct defects in the acquired images and parse card sequences to match the hierarchical taxonomy of pyraloid moth species. Parsed data is processed by offline OCR engines augmented by field-specific subject dictionaries to produce a 'draft' online archive. This archive will then be validated interactively via a Web browser as it is used. It is hoped eventually to provide an efficient and configurable legacy archive document conversion system not only for the Natural History Museum, but also for all museums, libraries and archives where there is a need to interrogate legacy documents via computer.
Title: An investigation on MPEG audio segmentation by evolutionary algorithms
Authors: C. Stefano, A. D. Cioppa, A. Marcelli
Pub Date: 2001-09-10 | DOI: 10.1109/ICDAR.2001.953926
Venue: Proceedings of Sixth International Conference on Document Analysis and Recognition

The recent research efforts in the field of video parsing and analysis have recognized that the soundtrack represents an important supplementary source of content information. In this framework, one of the most relevant topics is that of detecting homogeneous segments within the audio stream, in that changes in the audio very often coincide with scene changes. We present some preliminary results obtained by using different evolutionary algorithms for detecting music and speech audio segments. The experiments have been carried out directly on MPEG-encoded sequences to avoid the computational cost of the decoding procedures.
Title: Character-like region verification for extracting text in scene images
Authors: Hao Wang, J. Kangas
Pub Date: 2001-09-10 | DOI: 10.1109/ICDAR.2001.953927
Venue: Proceedings of Sixth International Conference on Document Analysis and Recognition

This paper proposes a method of identifying character-like regions in order to extract and recognize characters in natural color scene images automatically. After connected component extraction based on a multi-group decomposition scheme, alignment analysis is used to check the block candidates, namely, the character-like regions in each binary image layer and in the final composed image. Priority adaptive segmentation (PAS) is implemented to obtain accurate foreground pixels of the character in each block. Then several heuristic measures, such as statistical features, recognition confidence, and alignment properties, are employed to validate the segmented characters. The algorithms are robust for a wide range of character fonts, shooting conditions, and color backgrounds. The results of our experiments are promising for real applications.
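Alignment analysis of this kind can be illustrated with a toy grouping of connected-component bounding boxes: components whose vertical centers and heights roughly agree are kept together as character candidates. This is a hypothetical stand-in, not the paper's algorithm:

```python
def aligned_groups(boxes, tol=0.4):
    """Group (x, y, w, h) boxes whose vertical centers and heights agree.

    Boxes are scanned left to right; a box joins a group when its center-y
    is within tol * height of the group's last member and heights are
    within a (1 - tol) ratio, else it starts a new group.
    """
    groups = []
    for box in sorted(boxes, key=lambda b: b[0]):     # left-to-right scan
        x, y, w, h = box
        cy = y + h / 2.0
        placed = False
        for g in groups:
            gx, gy, gw, gh = g[-1]
            gcy = gy + gh / 2.0
            ref = max(h, gh)
            # vertical centers close and heights comparable -> same text line
            if abs(cy - gcy) <= tol * ref and min(h, gh) / ref >= 1 - tol:
                g.append(box)
                placed = True
                break
        if not placed:
            groups.append([box])
    return groups
```

Isolated boxes that align with nothing (single-member groups) would be rejected as non-character clutter in a scheme like the paper's.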
Title: On the influence of vocabulary size and language models in unconstrained handwritten text recognition
Authors: Urs-Viktor Marti, H. Bunke
Pub Date: 2001-09-10 | DOI: 10.1109/ICDAR.2001.953795
Venue: Proceedings of Sixth International Conference on Document Analysis and Recognition

In this paper we present a system for unconstrained handwritten text recognition. The system consists of three components: preprocessing, feature extraction and recognition. In the preprocessing phase, a page of handwritten text is divided into its lines and the writing is normalized by means of skew and slant correction, positioning and scaling. From a normalized text line image, features are extracted using a sliding window technique: at each position of the window, nine geometrical features are computed. The core of the system, the recognizer, is based on hidden Markov models. A model is provided for each individual character. The character models are concatenated into words using a vocabulary, and the word models are in turn concatenated into models that represent full lines of text. Thus the difficult problem of segmenting a line of text into its individual words can be overcome. To enhance the recognition capabilities of the system, a statistical language model is integrated into the hidden Markov model framework. To preselect useful language models and compare them, perplexity is used; both perplexity as originally proposed and normalized perplexity are considered.