A character recognizer for Turkish language
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227855
Sait Ulas Korkmaz, G. Kirçiçegi, Y. Akinci, V. Atalay
This paper presents, in particular, a contextual postprocessing subsystem for a Turkish machine-printed character recognition system. The subsystem is based on positional binary 3-gram statistics for the Turkish language, an error-corrector parser, and a lexicon containing root words and their inflected forms. The error-corrector parser corrects character recognition (CR) alternatives using Turkish morphology.
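For illustration only, the sketch below shows one way positional binary 3-gram filtering of recognizer alternatives can be realized, assuming nothing more than a plain word list as the language sample; the paper additionally relies on Turkish-specific statistics and the morphological error-corrector parser, which are not reproduced here.

```python
# Minimal sketch of positional binary 3-gram filtering of recognizer
# candidates. The word list below is a toy stand-in for real language data.
from collections import defaultdict

def build_positional_trigrams(words):
    """Record, for each word position, which 3-grams occur at all (binary)."""
    table = defaultdict(set)
    for w in words:
        padded = f"^{w}$"                       # mark word boundaries
        for i in range(len(padded) - 2):
            table[i].add(padded[i:i + 3])
    return table

def is_plausible(candidate, table):
    """Accept a candidate only if every positional 3-gram has been seen."""
    padded = f"^{candidate}$"
    return all(padded[i:i + 3] in table.get(i, set())
               for i in range(len(padded) - 2))

# Toy usage: filter alternatives produced by a character recognizer.
lexicon = ["ev", "evler", "evlerde", "kitap", "kitaplar"]
trigrams = build_positional_trigrams(lexicon)
for alt in ["evlerde", "evlerdc", "k1tap"]:
    print(alt, is_plausible(alt, trigrams))
```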
{"title":"A character recognizer for Turkish language","authors":"Sait Ulas Korkmaz, G. Kirçiçegi, Y. Akinci, V. Atalay","doi":"10.1109/ICDAR.2003.1227855","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227855","url":null,"abstract":"This paper presents particularly a contextual postprocessing subsystem for a Turkish machine printedcharacter recognition system. The contextual postprocessing subsystem is based on positional binary 3-gram statistics for Turkish language, an error correctorparser and a lexicon, which contains root words and theinflected forms of the root words. Error corrector parseris used for correcting CR alternatives using TurkishMorphology.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128204457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Model length adaptation of an HMM based cursive word recognition system
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227642
M. Schambach
On the basis of a well-accepted, HMM-based cursive script recognition system, an algorithm is proposed that automatically adapts the length of the models representing letter writing variants. An average improvement in recognition performance of about 2.72 percent was obtained. Two initialization methods for the algorithm have been tested; they show quite different behaviors, and both prove useful in different application areas. To gain deeper insight into the functioning of the algorithm, a method for visualizing letter HMMs is developed. It confirms the plausibility of most results, but also reveals the limitations of the proposed method; these, however, are mostly due to restrictions imposed by the training and recognition method of the underlying system.
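The paper's adaptation algorithm itself is not reproduced here; purely as a point of reference, the sketch below shows a simpler, commonly used heuristic for choosing the number of states of each letter model from the average length of the feature sequences aligned to that letter.

```python
# Reference heuristic only (not the paper's algorithm): set the number of
# HMM states per letter proportional to the mean aligned sequence length.
import numpy as np

def states_per_letter(aligned_lengths, frames_per_state=3,
                      min_states=1, max_states=12):
    """aligned_lengths: dict mapping a letter to a list of frame counts."""
    return {letter: int(np.clip(round(np.mean(lengths) / frames_per_state),
                                min_states, max_states))
            for letter, lengths in aligned_lengths.items()}

# Toy usage with made-up alignment statistics.
print(states_per_letter({"a": [9, 12, 10], "i": [4, 5, 3]}))
```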
{"title":"Model length adaptation of an HMM based cursive word recognition system","authors":"M. Schambach","doi":"10.1109/ICDAR.2003.1227642","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227642","url":null,"abstract":"On the basis of a well accepted, HMM-based cursive script recognition system, an algorithm which automatically adapts the length of the models representing the letter writing variants is proposed. An average improvement in recognition performance of about 2.72 percent could be obtained. Two initialization methods for the algorithm have been tested, which show quite different behaviors; both prove to be useful in different application areas. To get a deeper insight into the functioning of the algorithm a method for the visualization of letter HMMs is developed. It shows the plausibility of most results, but also the limitations of the proposed method. However, these are mostly due to given restrictions of the training and recognition method of the underlying system.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131995947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generation of synthetic training data for an HMM-based handwriting recognition system
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227736
Tamás Varga, H. Bunke
A perturbation model for generating synthetic text lines from existing cursively handwritten lines of text produced by human writers is presented. Our purpose is to improve the performance of an HMM-based off-line cursive handwriting recognition system by providing it with additional synthetic training data. Two kinds of perturbations are applied, geometrical transformations and thinning/thickening operations. The proposed perturbation model is evaluated under different experimental conditions.
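A minimal sketch of the two perturbation families named above, assuming a grayscale text-line image with dark ink on a light background (OpenCV is used here for convenience); the transformation parameters are illustrative and not the paper's values.

```python
# Sketch: geometric transformation (shear + vertical rescaling) followed by
# thickening or thinning of the strokes via grayscale morphology.
import numpy as np
import cv2

def perturb_line(img, shear=0.2, scale_y=1.05, thicken=True):
    h, w = img.shape
    # Geometric part: horizontal shear plus a slight vertical rescaling.
    M = np.float32([[1.0, shear, 0.0],
                    [0.0, scale_y, 0.0]])
    out_size = (int(w + abs(shear) * h), int(h * scale_y) + 1)
    warped = cv2.warpAffine(img, M, out_size,
                            flags=cv2.INTER_LINEAR, borderValue=255)
    # Morphological part: on dark-on-light images, erosion thickens the ink
    # strokes and dilation thins them.
    kernel = np.ones((2, 2), np.uint8)
    return cv2.erode(warped, kernel) if thicken else cv2.dilate(warped, kernel)

# Usage: generate several synthetic variants of one training line
# ("line.png" is a placeholder path).
line = cv2.imread("line.png", cv2.IMREAD_GRAYSCALE)
variants = [perturb_line(line, s, sy, t)
            for s in (-0.2, 0.0, 0.2)
            for sy in (0.95, 1.05)
            for t in (True, False)]
```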
{"title":"Generation of synthetic training data for an HMM-based handwriting recognition system","authors":"Tamás Varga, H. Bunke","doi":"10.1109/ICDAR.2003.1227736","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227736","url":null,"abstract":"A perturbation model for generating synthetic text lines from existing cursively handwritten lines of text produced by human writers is presented. Our purpose is to improve the performance of an HMM-based off-line cursive handwriting recognition system by providing it with additional synthetic training data. Two kinds of perturbations are applied, geometrical transformations and thinning/thickening operations. The proposed perturbation model is evaluated under different experimental conditions.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"342 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124228756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
User-assisted archive document image analysis for digital library construction
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227715
Jingyu He, A. Downton
A configurable archive document image analysis system for digital library construction has been designed using rapid prototyping and top-down iterative development methods. This approach has proved essential for capturing the curators' expertise about existing card archive structures, content and databases. The design currently achieves about 93% correct segmentation of the required archive card fields overall, with 81.3% of all archive cards in a test set of 2000 images having all fields correctly segmented and labeled. Analysis of errors in the test set indicates that heavily annotated cards and non-standard card formats make up 5-10% of the overall archive, and that a significant proportion of these are unlikely to be resolvable without curatorial intervention.
{"title":"User-assisted archive document image analysis for digital library construction","authors":"Jingyu He, A. Downton","doi":"10.1109/ICDAR.2003.1227715","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227715","url":null,"abstract":"A configurable archive document image analysis system for digital library construction has been designed using rapid prototyping and top-down iterative development methods. This approach has been found to be essential in order to capture the curators' expertise about existing card archive structures, content and databases. The design currently achieves about 93% correct segmentation of the required archive card fields overall, with 81.3% of all archive cards in a testset of 2000 images having all fields correctly segmented and labeled. Analysis of errors in the testset indicates that heavily-annotated cards and non-standard card formats comprise 5-10% of the overall archive, and a significant proportion of these are unlikely to be resolvable without curatorial intervention.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127797774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Texture feature characterization for logical pre-labeling
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227728
B. Allier, J. Duong, Antoine Gagneux, Pierre Mallet, H. Emptoz
In this article we present a study based on the use of texture features for logical pre-labeling. The aim of our work is to compute a large number of texture features over three sets of machine-printed document images and to study their joint discriminant power using SVM classifiers. The three corpora we use are the Archives of Savoie (AoS), composed of strongly structured documents; a subset of the UW3 database; and a third, composed of Web site images, that is not structured at all. The originality of our contribution is to bring together various methods that have been used for many years in our domain and to test them on documents with very different characteristics.
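For illustration only, the sketch below computes a handful of simple block-level texture descriptors and trains an SVM on them; the paper evaluates a far larger feature set, and the block extraction step and the corpora are assumed to be available.

```python
# Toy texture descriptors per block plus an SVM classifier for pre-labeling.
import numpy as np
from sklearn.svm import SVC

def texture_features(block):
    """block: 2-D array with 1 for ink pixels and 0 for background."""
    b = block.astype(np.float32)
    density = b.mean()                              # ink coverage
    h_trans = np.abs(np.diff(b, axis=1)).mean()     # horizontal transitions
    v_trans = np.abs(np.diff(b, axis=0)).mean()     # vertical transitions
    row_var = b.mean(axis=1).var()                  # text-line structure cue
    return np.array([density, h_trans, v_trans, row_var])

def train_prelabeler(blocks, labels):
    """blocks: list of binary arrays; labels: e.g. 'text', 'title', 'image'."""
    X = np.stack([texture_features(b) for b in blocks])
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")
    clf.fit(X, labels)
    return clf
```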
{"title":"Texture feature characterization for logical pre-labeling","authors":"B. Allier, J. Duong, Antoine Gagneux, Pierre Mallet, H. Emptoz","doi":"10.1109/ICDAR.2003.1227728","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227728","url":null,"abstract":"In this article we present a study based on the use of texture features for logical pre-labeling. The aim of our work is to calculate a great number of texture features over three sets of machine-printed document images and to study their joint discriminant power using SVM classifiers. The three corpuses we use are: the Archives of Savoie (AoS), composed of strongly structured documents, a subset of the UW3 database, and a third that is not structured at all, since it is composed of Web site images. The originality of our contribution is to sum up various methods that have been used for many years in our domain, and to test them on documents having very different specificities.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"18 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129116163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accelerating large character set recognition using pivots
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227670
Yiping Yang, Ondrej Velek, M. Nakagawa
This paper proposes a method to accelerate character recognition over a large character set by introducing pivots into the search space. We divide the feature space of character categories into smaller clusters and take the centroid of each cluster as a pivot. A given input pattern is compared with all the pivots, and only a limited number of clusters whose pivots have higher similarities (or smaller distances) to the input pattern are searched, thereby accelerating recognition. This rests on the assumption that the search space is a distance space. The method has been applied to pre-classification in a practical off-line Japanese character recognizer: pre-classification time is reduced to 61% while the pre-classification recognition rate within the top 40 candidates stays at the original 99.6%, and total recognition time is reduced to 70% of the original without sacrificing the recognition rate at all. If we let the pre-classification rate drop from 99.6% to 97.7%, pre-classification time is reduced to 35% and total recognition time to 51.5%, with the recognition rate falling from 98.3% to 96.3%.
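The sketch below illustrates the pivot idea, under the assumption that each character category is represented by a single mean feature vector; the cluster count, Euclidean distance, and candidate limit of 40 are illustrative choices.

```python
# Pivot-based pre-classification sketch: cluster the class means, keep the
# cluster centroids as pivots, and at run time search only the classes in
# the clusters whose pivots are nearest to the input.
import numpy as np
from sklearn.cluster import KMeans

class PivotPreclassifier:
    def __init__(self, class_means, class_labels, n_clusters=64):
        self.means = np.asarray(class_means)
        self.labels = np.asarray(class_labels)
        self.km = KMeans(n_clusters=n_clusters, n_init=10).fit(self.means)
        self.pivots = self.km.cluster_centers_

    def candidates(self, x, n_search=4, top_k=40):
        d_pivot = np.linalg.norm(self.pivots - x, axis=1)   # compare with pivots only
        nearest = np.argsort(d_pivot)[:n_search]            # clusters to search
        mask = np.isin(self.km.labels_, nearest)
        d = np.linalg.norm(self.means[mask] - x, axis=1)
        return self.labels[mask][np.argsort(d)[:top_k]]

# Toy usage with random "class mean" vectors for 3000 categories.
rng = np.random.default_rng(0)
means = rng.normal(size=(3000, 64))
pre = PivotPreclassifier(means, np.arange(3000))
print(pre.candidates(means[42])[:5])
```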
{"title":"Accelerating large character set recognition using pivots","authors":"Yiping Yang, Ondrej Velek, M. Nakagawa","doi":"10.1109/ICDAR.2003.1227670","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227670","url":null,"abstract":"This paper proposes a method to accelerate character recognition of a large character set by employing pivots into the search space. We divide the feature space of character categories into smaller clusters and derive the centroid of each cluster as a pivot. Given an input pattern, it is compared with all the pivots and only a limited number of clusters whose pivots have higher similarities (or smaller distances) to the input pattern are searched for with the result that we can accelerate the recognition speed. This is based on the assumption that the search space is a distance space. The method has been applied to pre-classification of a practical off-line Japanese character recognizer with the result that the pre-classification time is reduced to 61 % while keeping its pre-classification recognition rate up to 40 candidates as the same as the original 99.6% and the total recognition time is reduced to 70% of the original time without sacrificing the recognition rate at all. If we sacrifice the pre-classification rate from 99.6% to 97.7%, then its time is reduced to 35% and the total recognition time is reduced to 51.5% with recognition rate as 96.3% from 98.3%.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"538 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116336252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Visualizing multimedia content on paper documents: components of key frame selection for Video Paper
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227695
J. Hull, B. Erol, J. Graham, Dar-Shyang Lee
The components of a key frame selection algorithm for a paper-based multimedia browsing interface called Video Paper are described. Analysis of video image frames is combined with the results of processing the closed caption to select key frames that are printed on a paper document together with the closed caption. Bar codes positioned near the key frames allow a user to play the video from the corresponding times. This paper describes several component techniques that are being investigated for key frame selection in the Video Paper system, including face detection and text recognition. The Video Paper system implementation is also discussed.
{"title":"Visualizing multimedia content on paper documents: components of key frame selection for Video Paper","authors":"J. Hull, B. Erol, J. Graham, Dar-Shyang Lee","doi":"10.1109/ICDAR.2003.1227695","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227695","url":null,"abstract":"The components of a key frame selection algorithm for a paper-based multimedia browsing interface called Video Paper are described. Analysis of video image frames is combined with the results of processing the closed caption to select key frames that are printed on a paper document together with the closed caption. Bar codes positioned near the key frames allow a user to play the video from the corresponding times. This paper describes several component techniques that are being investigated for key frame selection in the Video Paper system, including face detection and text recognition. The Video Paper system implementation is also discussed.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116852883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Proper names extraction from fax images combining textual and image features
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227724
Laurence Likforman-Sulem, Pascal Vaillant, François Yvon
Within a unified messaging system, a crucial task is to provide the user with key information about every message received, such as keywords reflecting the subject of the message or the name of the sender. In the case of facsimiles, however, this information is not as easy to detect as in e-mails, since no standard headers are defined. The aim of the presented work is to identify and extract specific information (the name of the sender) from a fax cover page. For this purpose, document image analysis methods (OCR, physical block selection) and text analysis methods (optimized dictionary lookup, local grammar rules) are implemented to work in parallel. The fusion of their results yields a more accurate guess than any of the methods would achieve separately.
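Only the fusion step is sketched below, and only schematically: each detector is assumed to return scored sender-name hypotheses, which are combined by a weighted sum; the detectors themselves and the weights are placeholders, not the paper's.

```python
# Schematic fusion of two hypothesis lists (image-based and text-based).
def fuse(image_hyps, text_hyps, w_image=0.4, w_text=0.6):
    scores = {}
    for name, s in image_hyps:
        scores[name] = scores.get(name, 0.0) + w_image * s
    for name, s in text_hyps:
        scores[name] = scores.get(name, 0.0) + w_text * s
    return max(scores, key=scores.get) if scores else None

# Toy usage: the name supported by both sources wins.
print(fuse([("Dupont", 0.7), ("Durand", 0.5)], [("Dupont", 0.6)]))
```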
{"title":"Proper names extraction from fax images combining textual and image features","authors":"Laurence Likforman-Sulem, Pascal Vaillant, François Yvon","doi":"10.1109/ICDAR.2003.1227724","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227724","url":null,"abstract":"In the frame of a unified messaging system, a crucial task of the system is to provide the user with key information on every message received, like keywords reflecting the object of the message, or the name of the sender. However, in the case of facsimiles, this information is not as easy to detect as in the case of e-mails, since no standard headers are defined. The aim of the presented work is to identify and extract specific information (the name of the sender) from a fax cover page. For this purpose, methods based on image document analysis (OCR recognition, physical blocks selection), and text analysis methods (optimized dictionary lookup, local grammar rules), are implemented to work in parallel. The fusion of their results brings a more accurate guess than any of the methods would achieve separately.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117180630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Discerning structure from freeform handwritten notes
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227628
Michael Shilman, Zile Wei, Sashi Raghupathy, P. Simard, David Jones
This paper presents an integrated approach to parsing textual structure in freeform handwritten notes. Text-graphics classification and text layout analysis are classical problems in printed document analysis, but the irregularity in handwriting and content in freeform notes reveals limitations in existing approaches. We advocate an integrated technique that solves the layout analysis and classification problems simultaneously: the problems are so tightly coupled that it is not possible to solve one without the other for real user notes. We tune and evaluate our approach on a large corpus of unscripted user files and reflect on the difficult recognition scenarios that we have encountered in practice.
{"title":"Discerning structure from freeform handwritten notes","authors":"Michael Shilman, Zile Wei, Sashi Raghupathy, P. Simard, David Jones","doi":"10.1109/ICDAR.2003.1227628","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227628","url":null,"abstract":"This paper presents an integrated approach to parsing textual structure in freeform handwritten notes. Text-graphics classification and text layout analysis are classical problems in printed document analysis, but the irregularity in handwriting and content in freeform notes reveals limitations in existing approaches. We advocate an integrated technique that solves the layout analysis and classification problems simultaneously: the problems are so tightly coupled that it is not possible to solve one without the other for real user notes. We tune and evaluate our approach on a large corpus of unscripted user files and reflect on the difficult recognition scenarios that we have encountered in practice.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128032035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Handwriting recognition using position sensitive letter n-gram matching
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227730
A. El-Nasan, S. Veeramachaneni, G. Nagy
We propose a further improvement of a handwriting recognition method that avoids segmentation while being able to recognize words never before seen in handwritten form. The method is based on the fact that few pairs of English words share exactly the same set of letter bigrams, and even fewer share longer n-grams. The lexical n-gram matches between every word in a lexicon and a set of reference words can be precomputed. A position-based match function then detects the matches between the handwritten signal of a query word and each reference word. We show that, with a reasonable set of reference words, recognition of lexicon words exceeds 90%.
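A toy sketch of the lexical half of the idea: for every lexicon word we precompute which reference words it shares at least one letter bigram with, and the resulting profile serves as a lookup key. Positions and longer n-grams, as well as the signal-level matching against the handwriting, are omitted.

```python
# Precompute bigram-sharing profiles between lexicon and reference words.
def bigrams(word):
    return {word[i:i + 2] for i in range(len(word) - 1)}

def match_profile(word, references):
    """True/False per reference: does `word` share any bigram with it?"""
    wb = bigrams(word)
    return tuple(bool(wb & bigrams(r)) for r in references)

def build_index(lexicon, references):
    index = {}
    for w in lexicon:
        index.setdefault(match_profile(w, references), []).append(w)
    return index

# Toy usage: few words share the same profile, so it acts as a near-unique key.
refs = ["north", "water", "light", "sound"]
print(build_index(["northern", "lighting", "soundly", "watered"], refs))
```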
{"title":"Handwriting recognition using position sensitive letter n-gram matching","authors":"A. El-Nasan, S. Veeramachaneni, G. Nagy","doi":"10.1109/ICDAR.2003.1227730","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227730","url":null,"abstract":"We propose further improvement of a handwriting recognition method that avoids segmentation while able to recognize words that were never seen before in handwritten form. This method is based on the fact that few pairs of English words share exactly the same set of letter bigrams and even fewer share longer n-grams. The lexical n-gram matches between every word in a lexicon and a set of reference words can be precomputed. A position-based match function then detects the matches between the handwritten signal of a query word and each reference word. We show that with a reasonable set of reference words, the recognition of lexicon words exceeds 90%.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"261 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124272716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}