Automated segmentation of math-zones from document images
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227763
S. Chowdhury, Sekhar Mandal, A. Das, B. Chanda
High-level understanding of the mathematical content in a document image clearly requires techniques for math-zone extraction and recognition. In this paper we present a fully automatic segmentation of displayed-math zones from the document image, using only the spatial layout information of math formulas and equations, so as to help commercial OCR systems that cannot discern math zones, and to support the identification and arrangement of math symbols by other systems.
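The segmentation described above relies only on spatial layout cues. As a rough illustration of that idea (not the authors' actual algorithm), the following sketch flags text lines as displayed-math candidates when they are roughly centered, narrower than the column, and isolated by larger-than-usual vertical gaps; the `Line` structure and all thresholds are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Line:
    """Bounding box of one text line, in pixels (hypothetical input structure)."""
    x0: int
    x1: int
    y0: int
    y1: int

def displayed_math_candidates(lines: List[Line], page_x0: int, page_x1: int,
                              centre_tol: float = 0.05, width_ratio: float = 0.8,
                              gap_factor: float = 1.5) -> List[int]:
    """Indices of lines whose layout suggests displayed math: roughly centered,
    narrower than the text column, and separated by extra vertical whitespace.
    Thresholds are illustrative, not taken from the paper."""
    page_w = page_x1 - page_x0
    page_cx = (page_x0 + page_x1) / 2
    gaps = [lines[i + 1].y0 - lines[i].y1 for i in range(len(lines) - 1)]
    median_gap = sorted(gaps)[len(gaps) // 2] if gaps else 0
    candidates = []
    for i, ln in enumerate(lines):
        centered = abs((ln.x0 + ln.x1) / 2 - page_cx) < centre_tol * page_w
        narrow = (ln.x1 - ln.x0) < width_ratio * page_w
        gap_above = lines[i].y0 - lines[i - 1].y1 if i > 0 else median_gap + 1
        gap_below = lines[i + 1].y0 - lines[i].y1 if i + 1 < len(lines) else median_gap + 1
        isolated = gap_above > gap_factor * median_gap and gap_below > gap_factor * median_gap
        if centered and narrow and isolated:
            candidates.append(i)
    return candidates
```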
{"title":"Automated segmentation of math-zones from document images","authors":"S. Chowdhury, Sekhar Mandal, A. Das, B. Chanda","doi":"10.1109/ICDAR.2003.1227763","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227763","url":null,"abstract":"With an aim to high-level understanding of the mathematicalcontents in a document image the requirement ofmath-zone extraction and recognition technique is obvious.In this paper we present fully auotmatic segmentation ofdisplayed-math zones from the document image, using onlythe spatial layout information of math-formulas and equations,so as to help commercial OCR systems which cannotdiscern math-zones and also for the identification and arrangementof math symbols by others.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131304825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Best practices for convolutional neural networks applied to visual document analysis
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227801
P. Simard, David Steinkraus, John C. Platt
Neural networks are a powerful technology for classification of visual inputs arising from documents. However, there is a confusing plethora of different neural network methods that are used in the literature and in industry. This paper describes a set of concrete best practices that document analysis researchers can use to get good results with neural networks. The most important practice is getting a training set as large as possible: we expand the training set by adding a new form of distorted data. The next most important practice is that convolutional neural networks are better suited for visual document tasks than fully connected networks. We propose that a simple "do-it-yourself" implementation of convolution with a flexible architecture is suitable for many visual document problems. This simple convolutional neural network does not require complex methods, such as momentum, weight decay, structure-dependent learning rates, averaging layers, tangent prop, or even fine-tuning the architecture. The end result is a very simple yet general architecture which can yield state-of-the-art performance for document analysis. We illustrate our claims on the MNIST set of English digit images.
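The "new form of distorted data" referred to above is the elastic distortion this paper is best known for. A minimal sketch of elastic distortion as it is commonly implemented, with illustrative alpha/sigma defaults rather than the paper's exact parameters, might look like this:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_distort(image, alpha=8.0, sigma=4.0, rng=None):
    """Apply a random elastic distortion to a 2-D grayscale image.

    A random displacement field is smoothed with a Gaussian (sigma) and
    scaled (alpha); pixels are then resampled at the displaced positions.
    alpha and sigma here are illustrative defaults, not values from the paper.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma, mode="constant") * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma, mode="constant") * alpha
    y, x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.array([y + dy, x + dx])
    return map_coordinates(image, coords, order=1, mode="reflect")

# Usage: expand a training set by adding distorted copies of each sample.
# distorted = [elastic_distort(img) for img in mnist_images for _ in range(4)]
```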
{"title":"Best practices for convolutional neural networks applied to visual document analysis","authors":"P. Simard, David Steinkraus, John C. Platt","doi":"10.1109/ICDAR.2003.1227801","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227801","url":null,"abstract":"Neural networks are a powerful technology forclassification of visual inputs arising from documents.However, there is a confusing plethora of different neuralnetwork methods that are used in the literature and inindustry. This paper describes a set of concrete bestpractices that document analysis researchers can use toget good results with neural networks. The mostimportant practice is getting a training set as large aspossible: we expand the training set by adding a newform of distorted data. The next most important practiceis that convolutional neural networks are better suited forvisual document tasks than fully connected networks. Wepropose that a simple \"do-it-yourself\" implementation ofconvolution with a flexible architecture is suitable formany visual document problems. This simpleconvolutional neural network does not require complexmethods, such as momentum, weight decay, structure-dependentlearning rates, averaging layers, tangent prop,or even finely-tuning the architecture. The end result is avery simple yet general architecture which can yieldstate-of-the-art performance for document analysis. Weillustrate our claims on the MNIST set of English digitimages.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"4 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131505750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unsupervised feature selection using multi-objective genetic algorithms for handwritten word recognition
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227746
M. Morita, R. Sabourin, F. Bortolozzi, C. Y. Suen
In this paper a methodology for feature selection in unsupervised learning is proposed. It makes use of a multi-objective genetic algorithm where the minimization of the number of features and a validity index that measures the quality of clusters have been used to guide the search towards the more discriminant features and the best number of clusters. The proposed strategy is evaluated using two synthetic data sets and then it is applied to handwritten month word recognition. Comprehensive experiments demonstrate the feasibility and efficiency of the proposed methodology.
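To make the two objectives concrete (number of selected features and a cluster-validity index), the toy sketch below scores random feature subsets with k-means plus the Davies-Bouldin index and keeps the Pareto-optimal ones. It is a simplified stand-in for the authors' multi-objective genetic algorithm, and the particular validity index is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def evaluate(X, mask, k):
    """Objectives for one candidate subset: (number of features, cluster validity).
    Lower is better for both (Davies-Bouldin: lower means more compact clusters)."""
    Xs = X[:, mask]
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Xs)
    return int(mask.sum()), davies_bouldin_score(Xs, labels)

def pareto_front(scores):
    """Indices of non-dominated candidates (minimization of both objectives)."""
    front = []
    for i, (f_i, v_i) in enumerate(scores):
        dominated = any(f_j <= f_i and v_j <= v_i and (f_j < f_i or v_j < v_i)
                        for j, (f_j, v_j) in enumerate(scores) if j != i)
        if not dominated:
            front.append(i)
    return front

# Toy search: random subsets instead of GA crossover/mutation, placeholder data.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
candidates = [m for m in (rng.random(20) < 0.5 for _ in range(30)) if m.sum() >= 2]
scores = [evaluate(X, m, k=3) for m in candidates]
print("Pareto-optimal subsets:", pareto_front(scores))
```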
{"title":"Unsupervised feature selection using multi-objective genetic algorithms for handwritten word recognition","authors":"M. Morita, R. Sabourin, F. Bortolozzi, C. Y. Suen","doi":"10.1109/ICDAR.2003.1227746","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227746","url":null,"abstract":"In this paper a methodology for feature selection in unsupervisedlearning is proposed. It makes use of a multi-objectivegenetic algorithm where the minimization of thenumber of features and a validity index that measures thequality of clusters have been used to guide the search towardsthe more discriminant features and the best numberof clusters. The proposed strategy is evaluated usingtwo synthetic data sets and then it is applied to handwrittenmonth word recognition. Comprehensive experimentsdemonstrate the feasibility and efficiency of the proposedmethodology.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130088080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recognition of on-line handwritten mathematical formulas in the E-chalk system
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227805
E. Tapia, R. Rojas
In this article, we present a system for the recognition of on-line handwritten mathematical formulas which is used in the electronic chalkboard (E-chalk), a multimedia system for distance teaching. We discuss the classification of symbols and the construction of the tree of spatial relationships among them. The classification is based on support vector machines and the construction of formulas is based on baseline structure analysis.
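Symbol classification with support vector machines can be sketched with scikit-learn. The feature vector used below (resampled, normalized stroke coordinates) is a common on-line representation chosen purely for illustration, not necessarily the one used in the E-chalk recognizer; the baseline structure analysis step is a separate stage not shown here.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def stroke_features(points, n=16):
    """Resample an on-line stroke (sequence of (x, y) pen points) to n points
    and flatten it into a fixed-length feature vector. Illustrative choice only."""
    pts = np.asarray(points, dtype=float)
    idx = np.linspace(0, len(pts) - 1, n)
    resampled = np.column_stack([np.interp(idx, np.arange(len(pts)), pts[:, d])
                                 for d in range(2)])
    resampled -= resampled.mean(axis=0)          # translation invariance
    scale = np.abs(resampled).max() or 1.0
    return (resampled / scale).ravel()           # rough scale invariance

# RBF-kernel SVM over standardized stroke features (hyperparameters illustrative).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
# clf.fit(np.stack([stroke_features(s) for s in train_strokes]), train_labels)
# pred = clf.predict(stroke_features(new_stroke).reshape(1, -1))
```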
{"title":"Recognition of on-line handwritten mathematical formulas in the E-chalk system","authors":"E. Tapia, R. Rojas","doi":"10.1109/ICDAR.2003.1227805","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227805","url":null,"abstract":"In this article, we present a system for the recognition ofon-line handwritten mathematical formulas which is usedin the electronic chalkboard (E-chalk), a multimedia systemfor distance-teaching. We discuss the classification of symbolsand the construction of the tree of spatial relationshipsamong them. The classification is based on support vectormachines and the construction of formulas is based onbaseline structure analysis.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116525650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reference line extraction from form documents with complicated backgrounds
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227823
Dihua Xi, Seong-Whan Lee
Form document analysis is one of the most essential tasks in document analysis and recognition. One of the most fundamental and crucial tasks is the extraction of the reference lines which are contained in almost all form documents. This paper presents an efficient methodology for processing complicated grey-level form images. We construct a non-orthogonal wavelet with adjustable rectangular supports and present algorithms for extracting the reference lines based on the strip-growth method, using the multiresolution wavelet sub-images. We have compared this system with the popular Hough transform (HT) based method and a novel orthogonal-wavelet-based method. As shown in the experiments, the proposed algorithm demonstrates high performance and fast speed on complicated form images. The system is also effective for form images with slight skew.
{"title":"Reference line extraction from form documents with complicated backgrounds","authors":"Dihua Xi, Seong-Whan Lee","doi":"10.1109/ICDAR.2003.1227823","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227823","url":null,"abstract":"Form document analysis is one of the most essential tasksin document analysis and recognition. One of the most fundamentaland crucial tasks is the extraction of the referencelines which are contained in almost all form documents.This paper presents an efficient methodology for the complicatedgrey-level form image processing. We construct anon-orthogonal wavelet with adjustable rectangle supportsand offer algorithms for the extraction of the reference linesbased on the strip growth method using the multiresolutionwavelet sub images. We have compared this system with thepopular Hough transform (HT) based and the novel orthogonalwavelet based methods. As shown in the experiments,the proposed algorithmdemonstrates high performance andfast speed for the complicated form images. This system isalso effective for the form images with slight skew.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127173645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combination of pruned Kohonen maps for on-line Arabic characters recognition
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227790
N. Mezghani, M. Cheriet, A. Mitiche
The purpose of this study is to investigate a method for high-performance on-line Arabic character recognition. The method is based on the use of Kohonen maps and their corresponding confusion matrices, which serve to prune the maps of error-causing nodes and then to combine them. We use two Kohonen maps obtained from two distinct character representations, namely, Fourier descriptors and tangents extracted along the characters' on-line signals. The two Kohonen maps are then combined using a majority-vote decision rule that, for each character, favors the most reliable map. This combination, without adding in any significant way to the process complexity, affords a much better recognition rate.
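A minimal sketch of the pruning-and-combination stage, assuming each trained Kohonen map already assigns a class label to every node: nodes whose confusion-matrix precision falls below a threshold are pruned, and disagreements between the two maps are resolved in favor of the more reliable one. The threshold and the reliability measure are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

def node_precision(confusion):
    """Per-node precision from a (nodes x classes) confusion matrix built on
    validation data: fraction of samples falling on a node that carry the
    node's majority label."""
    totals = np.maximum(confusion.sum(axis=1), 1)
    return confusion.max(axis=1) / totals

def prune(node_labels, confusion, min_precision=0.6):
    """Drop error-causing nodes; the 0.6 threshold is an illustrative choice."""
    keep = node_precision(confusion) >= min_precision
    return {i: node_labels[i] for i in np.flatnonzero(keep)}

def combine(pred_fourier, pred_tangent, reliability_fourier, reliability_tangent):
    """Resolve disagreements by favoring the map that is historically more
    reliable for the class it predicts (a simple stand-in for the paper's
    majority-vote rule)."""
    out = []
    for pf, pt in zip(pred_fourier, pred_tangent):
        if pf == pt:
            out.append(pf)
        elif reliability_fourier.get(pf, 0.0) >= reliability_tangent.get(pt, 0.0):
            out.append(pf)
        else:
            out.append(pt)
    return out
```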
{"title":"Combination of pruned Kohonen maps for on-line arabic characters recognition","authors":"N. Mezghani, M. Cheriet, A. Mitiche","doi":"10.1109/ICDAR.2003.1227790","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227790","url":null,"abstract":"The purpose of this study is to investigate a methodfor high performance on-line Arabic characters recognition.This method is based on the use of Kohonen mapsand their corresponding confusion matrices which serve toprune them of error-causing nodes, and to combine themconsequently. We use two Kohonen maps obtained usingtwo distinct character representations, namely, Fourier descriptorsand tangents extracted along the characters on-linesignals. The two Kohonen maps are then combined usinga majority vote decision rule that, for each character,favors the most reliable map. This combination, withoutadding in any significant way to the process complexity, affordsa much better recognition rate.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127763630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Document identity, authentication and ownership: the future of biometric verification
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227828
M. Fairhurst
Document security is an increasingly important element in the multi-faceted discipline of document processing, and authentication of individual identity will play an increasingly important future role in relation to questions of document ownership, identity and confidentiality. Biometrics-based techniques are emerging as key elements in the drive to address security and confidentiality in an effective way, yet past experience suggests that there are many practical issues yet to be resolved if biometric technologies are to fulfill their potential in the document processing field. This paper addresses some aspects of biometric processing which are becoming increasing priorities, and suggests how a greater engagement of the document processing community can help to bring about refinements to existing approaches to biometric identity checking.
{"title":"Document identity, authentication and ownership: the future of biometric verification","authors":"M. Fairhurst","doi":"10.1109/ICDAR.2003.1227828","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227828","url":null,"abstract":"Document security is an increasingly importantelement in the multi-faceted discipline ofdocument processing, and authentication ofindividual identity will play an increasinglyimportant future role in relation to questions ofdocument ownership, identity andconfidentiality. Biometrics-based techniques areemerging as key elements in the drive to addresssecurity and confidentiality in an effective way,yet past experience suggests that there are manypractical issues yet to be resolved if biometrictechnologies are to fulfill their potential in thedocument processing field. This paper addressessome aspects of biometric processing which arebecoming increasing priorities, and suggestshow a greater engagement of the documentprocessing community can help to bring aboutrefinements to existing approaches to biometricidentity checking.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133550053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Progress in camera-based document image analysis
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227735
D. Doermann, Jian Liang, Huiping Li
The increasing availability of high-performance, low-priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or as standalone still or video devices are highly mobile and easy to use; they can capture images of any kind of document, including very thick books, historical pages too fragile to touch, and text in scenes; and they are much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there is clearly a demand from many domains. Traditional scanner-based document analysis techniques provide us with a good reference and starting point, but they cannot be used directly on camera-captured images. Camera-captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction of the content and background. In this paper we present a survey of application domains, technical challenges and solutions for recognizing documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We discuss document analysis from a single camera-captured image as well as multiple frames and highlight some sample applications under development and feasible ideas for future development.
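One of the camera-specific challenges listed above, perspective distortion, is commonly corrected by estimating a homography from the page's four corners. A minimal OpenCV sketch follows; the corner coordinates are placeholders that would normally come from a page-boundary detector, and the output size is arbitrary.

```python
import cv2
import numpy as np

def rectify_page(image, corners, out_w=1000, out_h=1400):
    """Warp a perspective-distorted page to a fronto-parallel view.

    `corners` are the page's four corners in the camera image, ordered
    top-left, top-right, bottom-right, bottom-left; here they are assumed
    given rather than detected.
    """
    src = np.asarray(corners, dtype=np.float32)
    dst = np.float32([[0, 0], [out_w - 1, 0], [out_w - 1, out_h - 1], [0, out_h - 1]])
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, H, (out_w, out_h))

# Example with placeholder corner coordinates:
# flat = rectify_page(cv2.imread("page.jpg"),
#                     [(120, 80), (900, 60), (960, 1300), (90, 1320)])
```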
{"title":"Progress in camera-based document image analysis","authors":"D. Doermann, Jian Liang, Huiping Li","doi":"10.1109/ICDAR.2003.1227735","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227735","url":null,"abstract":"The increasing availability of high performance, low priced, portable digital imaging devices has created a tremendous opportunity for supplementing traditional scanning for document image acquisition. Digital cameras attached to cellular phones, PDAs, or as standalone still or video devices are highly mobile and easy to use; they can capture images of any kind of document including very thick books, historical pages too fragile to touch, and text in scenes; and they are much more versatile than desktop scanners. Should robust solutions to the analysis of documents captured with such devices become available, there is clearly a demand from many domains. Traditional scanner-based document analysis techniques provide us with a good reference and starting point, but they cannot be used directly on camera-captured images. Camera captured images can suffer from low resolution, blur, and perspective distortion, as well as complex layout and interaction of the content and background. In this paper we present a survey of application domains, technical challenges and solutions for recognizing documents captured by digital cameras. We begin by describing typical imaging devices and the imaging process. We discuss document analysis from a single camera-captured image as well as multiple frames and highlight some sample applications under development and feasible ideas for future development.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131056141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Confidence evaluation for combining diverse classifiers
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227764
Hongwei Hao, Cheng-Lin Liu, H. Sako
For combining classifiers at measurement level, the diverse outputs of classifiers should be transformed to uniform measures that represent the confidence of decision, hopefully, the class probability or likelihood. This paper presents our experimental results of classifier combination using confidence evaluation. We test three types of confidences: log-likelihood, exponential and sigmoid. For re-scaling the classifier outputs, we use three scaling functions based on global normalization and Gaussian density estimation. Experimental results in handwritten digit recognition show that via confidence evaluation, superior classification performance can be obtained using simple combination rules.
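To illustrate the general idea of mapping raw classifier scores onto comparable confidences before applying a simple combination rule, the sketch below globally normalizes each classifier's outputs and applies sigmoid or exponential transforms; the scaling constants are illustrative, not the values estimated in the paper.

```python
import numpy as np

def global_normalize(scores):
    """Re-scale a classifier's raw output matrix (samples x classes) to zero
    mean and unit variance using statistics pooled over all outputs."""
    return (scores - scores.mean()) / (scores.std() + 1e-12)

def sigmoid_confidence(scores, a=1.0, b=0.0):
    """Sigmoid confidence; a and b are illustrative scaling constants."""
    return 1.0 / (1.0 + np.exp(-(a * scores + b)))

def exponential_confidence(scores, a=1.0):
    """Exponential (softmax-like) confidence per sample."""
    e = np.exp(a * (scores - scores.max(axis=1, keepdims=True)))
    return e / e.sum(axis=1, keepdims=True)

def combine_sum_rule(confidence_list):
    """Simple sum rule over the classifiers' confidence matrices; the class
    with the largest summed confidence wins."""
    return np.argmax(sum(confidence_list), axis=1)

# Usage with two classifiers' raw score matrices s1, s2 (samples x classes):
# c1 = sigmoid_confidence(global_normalize(s1))
# c2 = sigmoid_confidence(global_normalize(s2))
# predictions = combine_sum_rule([c1, c2])
```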
{"title":"Confidence evaluation for combining diverse classifiers","authors":"Hongwei Hao, Cheng-Lin Liu, H. Sako","doi":"10.1109/ICDAR.2003.1227764","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227764","url":null,"abstract":"For combining classifiers at measurement level, thediverse outputs of classifiers should be transformed touniform measures that represent the confidence ofdecision, hopefully, the class probability or likelihood.This paper presents our experimental results of classifiercombination using confidence evaluation. We test threetypes of confidences: log-likelihood, exponential andsigmoid. For re-scaling the classifier outputs, we usethree scaling functions based on global normalizationand Gaussian density estimation. Experimental results inhandwritten digit recognition show that via confidenceevaluation, superior classification performance can beobtained using simple combination rules.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131160839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Directional wavelet approach to remove document image interference
Pub Date: 2003-08-03 | DOI: 10.1109/ICDAR.2003.1227759
Qian Wang, Tao Xia, C. Tan, Lida Li
In this paper, we propose a directional wavelet approach to remove images of interfering strokes coming from the back of a historical handwritten document, caused by ink seeping through during long periods of storage. Our previous work required mapping of both sides of the document in order to identify the interfering strokes to be eliminated. Perfect mapping, however, is difficult due to document skew, differing resolutions, non-availability of the reverse side and warped pages during scanning. The new approach does not require double-sided mapping but instead uses a directional wavelet transform to distinguish the foreground and reverse-side strokes. Experiments have shown that the directional wavelet operation effectively removes the interfering strokes.
{"title":"Directional wavelet approach to remove document image interference","authors":"Qian Wang, Tao Xia, C. Tan, Lida Li","doi":"10.1109/ICDAR.2003.1227759","DOIUrl":"https://doi.org/10.1109/ICDAR.2003.1227759","url":null,"abstract":"In this paper, we propose a directional wavelet approachto remove images of interfering strokes coming from theback of a historical handwritten document due to seepingof ink during long period of storage. Our previous workrequired mapping of both sides of the document in orderto identify the interfering strokes to be eliminated. Perfectmapping, however, is difficult due to document skews,differing resolutions, non-availability of the reverseside and warped pages during scanning. The newapproach does not require double-sided mapping butinstead uses a directional wavelet transformto distinguish the foreground and reverse side strokes.Experiments have shown that the directional waveletoperation effectively removes the interfering strokes.","PeriodicalId":249193,"journal":{"name":"Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings.","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132558670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}