Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227735
D. Doermann, Jian Liang, Huiping Li
The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or as standalone still or video devices are highly mobile and easy to use; they can capture images of any kind of document, including very thick books, historical pages too fragile to touch, and text in scenes; and they are much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there is clearly a demand from many domains. Traditional scanner-based document analysis techniques provide us with a good reference and starting point, but they cannot be used directly on camera-captured images. Camera-captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction of the content and background. In this paper we present a survey of application domains, technical challenges, and solutions for recognizing documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We then discuss document analysis from a single camera-captured image as well as from multiple frames, and highlight some sample applications under development and feasible ideas for future development.
Title: Progress in camera-based document image analysis. Published in: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227764
Hongwei Hao, Cheng-Lin Liu, H. Sako
For combining classifiers at the measurement level, the diverse outputs of the classifiers should be transformed to uniform measures that represent the confidence of a decision, ideally the class probability or likelihood. This paper presents our experimental results on classifier combination using confidence evaluation. We test three types of confidences: log-likelihood, exponential, and sigmoid. For re-scaling the classifier outputs, we use three scaling functions based on global normalization and Gaussian density estimation. Experimental results in handwritten digit recognition show that, via confidence evaluation, superior classification performance can be obtained using simple combination rules.
Title: Confidence evaluation for combining diverse classifiers. Published in: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
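As a rough illustration of the idea in the abstract above, the sketch below maps raw per-class scores to sigmoid confidences and combines two classifiers with a simple sum rule. The scale and bias parameters and the sum rule itself are illustrative assumptions, not the paper's exact scaling functions.

```python
import math

def sigmoid_confidence(scores, scale=1.0, bias=0.0):
    """Map raw per-class classifier scores to (0, 1) confidences via a sigmoid.
    scale/bias are placeholder re-scaling parameters."""
    return [1.0 / (1.0 + math.exp(-(scale * s - bias))) for s in scores]

def sum_rule(confidences_per_classifier):
    """Combine per-class confidences from several classifiers by summing,
    then return the index of the winning class."""
    n_classes = len(confidences_per_classifier[0])
    combined = [0.0] * n_classes
    for conf in confidences_per_classifier:
        for k in range(n_classes):
            combined[k] += conf[k]
    return combined.index(max(combined))
```

Because every classifier's output lands in the same (0, 1) range after the transform, even this naive sum rule becomes a meaningful combination, which is the point the abstract makes.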
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227679
Molly L. Boose, D. B. Shema, Lawrence S. Baum
This paper discusses a scalable solution for integrating legacy illustrated parts drawings into a Class IV Interactive Electronic Technical Manual (IETM) (1995). An IETM is an interactive electronic version of a system's technical manuals, such as for a commercial airplane or a military helicopter. It contains the information a technician needs to do her job, including troubleshooting, vehicle maintenance, and repair procedures. A Class IV IETM is an IETM that is authored and managed directly via a database. The end-user system optimizes viewing and navigation, minimizing the need for users to browse and search through large volumes of data. The Boeing Company has hundreds of thousands of illustrated parts drawings for both commercial and military vehicles. As Boeing migrates to Class IV IETM systems, it is necessary to incorporate existing illustrated parts drawings into the new systems. Manually re-authoring the drawings to bring them up to the level of a Class IV IETM is prohibitively expensive. Our solution is a batch-processing system that performs the required modifications to the raster images and automatically updates the IETM database.
Title: A scalable solution for integrating illustrated parts drawings into a Class IV Interactive Electronic Technical Manual. Published in: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227855
Sait Ulas Korkmaz, G. Kirçiçegi, Y. Akinci, V. Atalay
This paper presents a contextual postprocessing subsystem for a Turkish machine-printed character recognition system. The subsystem is based on positional binary 3-gram statistics for the Turkish language, an error-corrector parser, and a lexicon that contains root words and their inflected forms. The error-corrector parser corrects CR alternatives using Turkish morphology.
Title: A character recognizer for Turkish language. Published in: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
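The positional binary 3-gram idea above can be sketched as follows: a lexicon is scanned once to record which character trigrams occur at which word positions (binary: seen or not seen), and a recognition candidate is accepted only if every one of its positional trigrams was observed. The boundary markers and the plausibility test are illustrative assumptions, not the paper's exact statistics.

```python
from collections import defaultdict

def build_positional_trigrams(lexicon):
    """Record which character trigrams occur at which positions,
    using hypothetical ^/$ word-boundary markers."""
    table = defaultdict(set)  # trigram -> set of positions where it was seen
    for word in lexicon:
        padded = "^" + word + "$"
        for i in range(len(padded) - 2):
            table[padded[i:i + 3]].add(i)
    return table

def is_plausible(word, table):
    """Accept a candidate only if all its trigrams were seen at their positions."""
    padded = "^" + word + "$"
    return all(i in table[padded[i:i + 3]] for i in range(len(padded) - 2))
```

A postprocessor of this kind would flag implausible CR alternatives for the error-corrector parser rather than reject them outright.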
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227823
Dihua Xi, Seong-Whan Lee
Form document analysis is one of the most essential tasks in document analysis and recognition. Among its most fundamental and crucial subtasks is the extraction of the reference lines contained in almost all form documents. This paper presents an efficient methodology for processing complicated grey-level form images. We construct a non-orthogonal wavelet with adjustable rectangular supports and offer algorithms for extracting the reference lines based on a strip-growth method applied to the multiresolution wavelet subimages. We compared this system with the popular Hough transform (HT) based method and a novel orthogonal-wavelet-based method. As the experiments show, the proposed algorithm demonstrates high performance and fast speed on complicated form images. The system is also effective for form images with slight skew.
Title: Reference line extraction from form documents with complicated backgrounds. Published in: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227675
Yi Li, Zhiyan Wang, Haizan Zeng
A novel technique is presented in this paper to extract strings from color images of both business settlement plan (BSP) and non-BSP airline coupons. The essential concept is to remove non-text pixels from complex coupon images rather than extract strings directly. First we transform the color images from RGB to HSV space, which is approximately uniform, and then remove the black component of the images using the properties of HSV space. A statistical approach, principal components analysis (PCA), is applied to extract strings by removing the background decorative pattern based on a priori knowledge of the environment. Finally, a method to validate and improve performance is presented.
Title: String extraction from color airline coupon image using statistical approach. Published in: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
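The RGB-to-HSV step above can be sketched with the standard library: HSV separates brightness (value) from color, so dark printed strokes can be flagged by thresholding the V channel alone. The threshold below is an illustrative assumption, and the paper's subsequent PCA stage is not reproduced here.

```python
import colorsys

def dark_pixel_mask(pixels, v_threshold=0.3):
    """Convert RGB pixels (0..255) to HSV and flag dark (low-value) pixels,
    which in coupon images are typically the printed strings.
    v_threshold is a placeholder, not a value from the paper."""
    mask = []
    for r, g, b in pixels:
        _, _, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        mask.append(v < v_threshold)
    return mask
```

Working in HSV rather than RGB means a colored decorative background and black text that happen to have similar RGB channel values are still separable by brightness.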
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227734
Yefeng Zheng, Huiping Li, D. Doermann
In this paper we address the problem of identifying text in noisy documents. We segment and identify handwriting from machine-printed text because 1) handwriting in a document often indicates corrections, additions, or other supplemental information that should be treated differently from the main body content, and 2) the segmentation and recognition techniques for machine-printed text and handwriting are significantly different. Our novelty is that we treat noise as a separate class and model it based on selected features. Trained Fisher classifiers are used to distinguish machine-printed text and handwriting from noise. We further exploit context to refine the classification: a Markov random field (MRF) based approach is used to model the geometric structure of the printed text, handwriting, and noise, and to rectify misclassifications. Experimental results show our approach is promising and robust, and can significantly improve page segmentation results on noisy documents.
Title: Text identification in noisy document images using Markov random model. Published in: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
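A minimal two-class Fisher-style discriminant of the kind mentioned above can be sketched as follows, with a diagonal within-class scatter approximation so the sketch stays dependency-free; the feature vectors and classes here are placeholders, not the paper's actual feature set or its exact (full-covariance) Fisher classifier.

```python
def fisher_weights(class_a, class_b):
    """Per-dimension Fisher-style weights: mean difference divided by the
    pooled within-class scatter (diagonal approximation of Fisher LDA)."""
    dims = len(class_a[0])
    w = []
    for i in range(dims):
        ma = sum(p[i] for p in class_a) / len(class_a)
        mb = sum(p[i] for p in class_b) / len(class_b)
        sa = sum((p[i] - ma) ** 2 for p in class_a)
        sb = sum((p[i] - mb) ** 2 for p in class_b)
        w.append((ma - mb) / (sa + sb + 1e-9))  # small constant avoids /0
    return w

def classify(point, w, threshold):
    """Project the point onto w; scores above the threshold are class A."""
    return sum(wi * xi for wi, xi in zip(w, point)) > threshold
```

A natural threshold is the midpoint of the two projected class means; anything trickier (the paper's noise class, the MRF refinement) is beyond this sketch.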
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227696
R. Kasturi
Vehicle text marks are unique features that are useful for identifying vehicles in video surveillance applications. We propose a method for finding such text marks. An existing text detection algorithm is modified so that detection is improved and made more robust to outdoor conditions. False alarms are reduced by introducing a binary image test that removes detections unlikely to be caused by text. The method is tested on captured video of a typical street scene.
Title: Detection of text marks on moving vehicles. Published in: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227826
S. Srihari, C. Tomai, Bin Zhang, Sangjik Lee
The analysis of handwritten documents from the viewpoint of determining their writership has great bearing on the criminal justice system. In many cases, only a limited amount of handwriting is available, and sometimes it consists of only numerals. Using a large number of handwritten numeral images extracted from about 3000 samples written by 1000 writers, a study of the individuality of numerals for identification/verification purposes was conducted. The individuality of numerals was studied using cluster analysis, and numeral discriminability was measured for writer verification. The study shows that some numerals present a higher discriminatory power and that their performances on the verification/identification tasks are very different.
Title: Individuality of numerals. Published in: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227746
M. Morita, R. Sabourin, F. Bortolozzi, C. Y. Suen
In this paper a methodology for feature selection in unsupervised learning is proposed. It makes use of a multi-objective genetic algorithm in which the minimization of the number of features and a validity index that measures the quality of clusters guide the search towards the most discriminant features and the best number of clusters. The proposed strategy is evaluated using two synthetic data sets and then applied to handwritten month-word recognition. Comprehensive experiments demonstrate the feasibility and efficiency of the proposed methodology.
Title: Unsupervised feature selection using multi-objective genetic algorithms for handwritten word recognition. Published in: Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.
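The cluster-validity objective mentioned above can be illustrated with the Davies-Bouldin index, where lower values mean more compact, better-separated clusters. The abstract does not name the paper's exact index, so DB here is an assumed stand-in for the objective that the genetic algorithm would minimize alongside the feature count.

```python
import math

def _centroid(pts):
    """Component-wise mean of a list of equal-length points."""
    n = len(pts)
    return [sum(p[i] for p in pts) / n for i in range(len(pts[0]))]

def _dist(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def davies_bouldin(points, labels):
    """Davies-Bouldin index: average, over clusters, of the worst ratio of
    summed within-cluster scatter to between-centroid distance."""
    clusters = sorted(set(labels))
    cents, scatter = {}, {}
    for c in clusters:
        pts = [p for p, l in zip(points, labels) if l == c]
        cents[c] = _centroid(pts)
        scatter[c] = sum(_dist(p, cents[c]) for p in pts) / len(pts)
    total = 0.0
    for c in clusters:
        total += max((scatter[c] + scatter[d]) / _dist(cents[c], cents[d])
                     for d in clusters if d != c)
    return total / len(clusters)
```

In the multi-objective setting, each chromosome (a feature subset plus a clustering) would be scored by the pair (number of selected features, validity index), and the GA would search the Pareto front of that pair.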