A Weighted Finite-State Framework for Correcting Errors in Natural Scene OCR
Richard Beaufort, C. Mancas-Thillou
doi: 10.1109/ICDAR.2007.41
With the growing market for cheap cameras, natural scene text has to be handled efficiently. Some works deal with text detection in the image, while more recent ones point out the challenges of text extraction and recognition. We propose an OCR correction system that handles both traditional recognizer errors and errors specific to natural scene images: cut characters, artistic display, incomplete sentences (common in advertisements), and out-of-vocabulary (OOV) words such as acronyms. The main algorithm is based on finite-state machines (FSMs) that deal with learned OCR confusions, capital/accented letters, and lexicon look-up. Moreover, since the OCR is not treated as a black box, several of its outputs are taken into account to intermingle the recognition and correction steps. Detailed results on a public database of natural scene words are presented, along with directions for future work.
{"title":"A Weighted Finite-State Framework for Correcting Errors in Natural Scene OCR","authors":"Richard Beaufort, C. Mancas-Thillou","doi":"10.1109/ICDAR.2007.41","DOIUrl":"https://doi.org/10.1109/ICDAR.2007.41","url":null,"abstract":"With the increasing market of cheap cameras, natural scene text has to be handled in an efficient way. Some works deal with text detection in the image while more recent ones point out the challenge of text extraction and recognition. We propose here an OCR correction system to handle traditional issues of recognizer errors but also the ones due to natural scene images, i.e. cut characters, artistic display, incomplete sentences (present in advertisements) and out- of-vocabulary (OOV) words such as acronyms and so on. The main algorithm bases on finite-state machines (FSMs) to deal with learned OCR confusions, capital/accented letters and lexicon look-up. Moreover, as OCR is not considered as a black box, several outputs are taken into account to intermingle recognition and correction steps. Based on a public database of natural scene words, detailed results are also presented along with future works.","PeriodicalId":279268,"journal":{"name":"Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126581409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Decompose Document Image Using Integer Linear Programming
Dashan Gao, Yizhou Wang, Haitham A. Hindi, Minh Do
doi: 10.1109/ICDAR.2007.97
Document decomposition is a basic but crucial step for many document-related applications. This paper proposes a novel approach to decomposing document images into zones. It first generates overlapping zone hypotheses based on generic visual features; each candidate zone is then evaluated quantitatively by a learned generative zone model. We formulate zone inference as a constrained optimization problem, so as to select an optimal set of non-overlapping zones that cover a given document image. Experimental results demonstrate that the proposed method is robust to document structure variation and noise.
{"title":"Decompose Document Image Using Integer Linear Programming","authors":"Dashan Gao, Yizhou Wang, Haitham A. Hindi, Minh Do","doi":"10.1109/ICDAR.2007.97","DOIUrl":"https://doi.org/10.1109/ICDAR.2007.97","url":null,"abstract":"Document decomposition is a basic but crucial step for many document related applications. This paper proposes a novel approach to decompose document images into zones. It first generates overlapping zone hypotheses based on generic visual features. Then, each candidate zone is evaluated quantitatively by a learned generative zone model. We formulate the zone inference problem into a constrained optimization problem, so as to select an optimal set of non-overlapping zones that cover a given document image. The experimental results demonstrate that the proposed method is very robust to document structure variation and noise.","PeriodicalId":279268,"journal":{"name":"Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131415135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images
Marc'Aurelio Ranzato, Yann LeCun
doi: 10.1109/ICDAR.2007.35
We describe an unsupervised learning algorithm for extracting sparse and locally shift-invariant features, and devise a principled procedure for learning hierarchies of such features. Each feature detector is composed of a set of trainable convolutional filters, a max-pooling layer over non-overlapping windows, and a point-wise sigmoid non-linearity. A second stage of more invariant features is fed with patches produced by the first-stage feature extractor and is trained in the same way. The method is used to pre-train the first four layers of a deep convolutional network, which achieves state-of-the-art performance on the MNIST dataset of handwritten digits with a final test error rate of 0.42%. Preliminary experiments on compression of bitonal document images show very promising results in terms of compression ratio and reconstruction error.
{"title":"A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images","authors":"Marc'Aurelio Ranzato, Yann LeCun","doi":"10.1109/ICDAR.2007.35","DOIUrl":"https://doi.org/10.1109/ICDAR.2007.35","url":null,"abstract":"We describe an unsupervised learning algorithm for extracting sparse and locally shift-invariant features. We also devise a principled procedure for learning hierarchies of invariant features. Each feature detector is composed of a set of trainable convolutional filters followed by a max-pooling layer over non-overlapping windows, and a point-wise sigmoid non-linearity. A second stage of more invariant features is fed with patches provided by the first stage feature extractor, and is trained in the same way. The method is used to pre-train the first four layers of a deep convolutional network which achieves state-of-the-art performance on the MNIST dataset of handwritten digits. The final testing error rate is equal to 0.42%. Preliminary experiments on compression of bitonal document images show very promising results in terms of compression ratio and reconstruction error.","PeriodicalId":279268,"journal":{"name":"Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)","volume":"2014 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127428249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Circular Noises Removal from Scanned Document Images
Gaofeng Meng, N. Zheng, Yuanlin Zhang, Yonghong Song
doi: 10.1109/ICDAR.2007.80
Defect inspection and correction is an important topic in scanned-document preprocessing. In this paper, a fast and robust algorithm is proposed for locating and removing a particular kind of circular noise caused by scanning documents with punched holes. First, the original image is downscaled by a carefully selected ratio, after which the punched holes leave small, distinctive regions. By examining these regions, hole noise can be detected and located quickly. To reduce false detections, a Hough transform is applied to the roughly located regions to confirm the holes. Finally, the circular noise is removed by fitting a bilinearly blended Coons surface that interpolates along the four edges of the noisy region. Experiments on a variety of scanned documents with punched holes demonstrate the feasibility and efficiency of the proposed algorithm.
{"title":"Circular Noises Removal from Scanned Document Images","authors":"Gaofeng Meng, N. Zheng, Yuanlin Zhang, Yonghong Song","doi":"10.1109/ICDAR.2007.80","DOIUrl":"https://doi.org/10.1109/ICDAR.2007.80","url":null,"abstract":"Defects inspection and correction is an important topic in the fields of scanned documents preprocessing. In this paper, a very fast and robust algorithm is proposed for locating and removing a special kind of circular noises caused by scanning documents with punched holes. Firstly, original image is reduced according to an elaborately selected ratio. Punched holes after reduction will leave some distinctive small regions. By examining such small regions, holes noises can be fast detected and located. To diminish false detections, Hough transformation is applied to the roughly located regions to further confirm the located holes. Finally, circular noise is eliminated by fitting a bi-linear blending Coons surface which interpolates along the four edges of noisy region. Experiments on a variety of scanned documents with punched holes demonstrate the feasibility and efficiency of the proposed algorithm.","PeriodicalId":279268,"journal":{"name":"Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115466849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generating Copybooks from Consistent Handwriting Styles
R. Niels, L. Vuurpijl
doi: 10.1109/ICDAR.2007.123
The automatic extraction of handwriting styles is an important process with various applications in handwriting processing. We propose a novel method that employs hierarchical clustering to explore prominent clusters of handwriting. So-called membership vectors are introduced to describe a writer's handwriting: each membership vector records the frequency of occurrence of prototypical characters in that writer's handwriting. By clustering these vectors, consistent handwriting styles can be extracted, similar to the exemplar handwritings documented in copybooks. The results presented here are encouraging: the most prominent handwriting styles detected correspond to the broad style categories cursive, mixed, and print.
{"title":"Generating Copybooks from Consistent Handwriting Styles","authors":"R. Niels, L. Vuurpijl","doi":"10.1109/ICDAR.2007.123","DOIUrl":"https://doi.org/10.1109/ICDAR.2007.123","url":null,"abstract":"The automatic extraction of handwriting styles is an important process that can be used for various applications in the processing of handwriting. We propose a novel method that employs hierarchical clustering to explore prominent clusters of handwriting. So-called membership vectors are introduced to describe the handwriting of a writer. Each membership vector reveals the frequency of occurrence of prototypical characters in a writer's handwriting. By clustering these vectors, consistent handwriting styles can be extracted, similar to the exemplar handwritings documented in copybooks. The results presented here are challenging. The most prominent handwriting styles detected correspond to the broad style categories cursive, mixed, and print.","PeriodicalId":279268,"journal":{"name":"Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115579193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Multi-Agent System for Hand-drawn Diagram Recognition
G. Casella, V. Deufemia, V. Mascardi
doi: 10.1109/ICDAR.2007.21
In this paper we present AgentSketch, an agent-based system for the on-line recognition of hand-drawn diagrams. Agents manage the activity of the symbol recognizers and provide the user with efficient interpretations of the sketch, exploiting contextual information to resolve ambiguities. The system can be applied to a variety of domains by supplying recognizers for the symbols of each domain. A first experimental evaluation has been performed on UML use case diagrams to verify the effectiveness of the proposed approach.
{"title":"A Multi-Agent System for Hand-drawn Diagram Recognition","authors":"G. Casella, V. Deufemia, V. Mascardi","doi":"10.1109/ICDAR.2007.21","DOIUrl":"https://doi.org/10.1109/ICDAR.2007.21","url":null,"abstract":"In this paper we present AgentSketch, an agent- based system for on-line recognition of hand-drawn diagrams. Agents are used for managing the activity of symbol recognizers and for providing efficient interpretations of the sketch to the user thanks to the use of contextual information for ambiguity resolution. The system can be applied to a variety of domains by providing recognizers of the symbols in that domain. A first experimental evaluation has been performed on the domain of UML use case diagrams to verify the effectiveness of the proposed approach.","PeriodicalId":279268,"journal":{"name":"Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115927093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Multi-Stage Strategy to Perspective Rectification for Mobile Phone Camera-Based Document Images
Xu-Cheng Yin, Jun Sun, S. Naoi, K. Fujimoto, Y. Fujii, Koji Kurokawa, Hiroaki Takebe
doi: 10.1109/ICDAR.2007.22
Document images captured by a mobile phone camera often suffer from perspective distortion. Efficiency and accuracy are two key issues in designing a rectification system for such documents. In this paper, we propose a new perspective rectification system based on vanishing point detection. The system achieves both the desired efficiency and accuracy through a multi-stage strategy: in the first stage, document boundaries and straight lines are used to compute vanishing points; in the second, text baselines and block alignments are utilized; and in the last, character tilt orientations vote for the vertical vanishing point. A profit function evaluates the reliability of the vanishing points detected at each stage: if they are reliable, rectification ends at that stage; otherwise, the method continues to seek more reliable vanishing points in the next stage. We have tested the method on more than 400 images, including paper documents, signboards, and posters. The image acceptance rate is above 98.5%, with an average processing time of only about 60 ms.
{"title":"A Multi-Stage Strategy to Perspective Rectification for Mobile Phone Camera-Based Document Images","authors":"Xu-Cheng Yin, Jun Sun, S. Naoi, K. Fujimoto, Y. Fujii, Koji Kurokawa, Hiroaki Takebe","doi":"10.1109/ICDAR.2007.22","DOIUrl":"https://doi.org/10.1109/ICDAR.2007.22","url":null,"abstract":"Document images captured by a mobile phone camera often have perspective distortions. Efficiency and accuracy are two important issues in designing a rectification system for such perspective documents. In this paper, we propose a new perspective rectification system based on vanishing point detection. This system achieves both the desired efficiency and accuracy using a multi-stage strategy: at the first stage, document boundaries and straight lines are used to compute vanishing points; at the second stage, text baselines and block aligns are utilized; and at the last stage, character tilt orientations are voted for the vertical vanishing point. A profit function is introduced to evaluate the reliability of detected vanishing points at each stage. If vanishing points at one stage are reliable, then rectification is ended at that stage. Otherwise, our method continues to seek more reliable vanishing points in the next stage. We have tested this method with more than 400 images including paper documents, signboards and posters. The image acceptance rate is more than 98.5% with an average speed of only about 60 ms.","PeriodicalId":279268,"journal":{"name":"Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)","volume":"395 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114865217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mathematical Formulae Recognition Using 2D Grammars
D. Prusa, Václav Hlaváč
doi: 10.1109/ICDAR.2007.165
We present a method for the off-line recognition of mathematical formulae based on the structural construction paradigm and two-dimensional grammars. In general, this approach can be applied successfully to the analysis of images containing objects that exhibit rich structural relations. An important benefit of structural construction is that symbol segmentation and structural analysis of the image are treated as a single intertwined process, which allows the system to avoid the errors that usually arise in a separate segmentation phase. We have developed and tested a pilot implementation, showing that the method is computationally efficient, practical, and able to cope with noise.
{"title":"Mathematical Formulae Recognition Using 2D Grammars","authors":"D. Prusa, Václav Hlaváč","doi":"10.1109/ICDAR.2007.165","DOIUrl":"https://doi.org/10.1109/ICDAR.2007.165","url":null,"abstract":"We present a method for off-line mathematical formulae recognition based on the structural construction paradigm and two-dimensional grammars. In general, this approach can be successfully used in the analysis of images containing objects that exhibit rich structural relations. An important benefit of the structural construction is in treating the symbol segmentation in the image and its structural analysis as a single intertwined process. This allows the system to avoid errors usually appearing during the segmentation phase. We have developed and tested a pilot study proving that the method is computationally efficient, practical and able to cope with noise.","PeriodicalId":279268,"journal":{"name":"Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121914514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On Using Classical Poetry Structure for Indian Language Post-Processing
A. Namboodiri, P J Narayanan, C. V. Jawahar
doi: 10.1109/ICDAR.2007.199
Post-processors are critical to the performance of language recognizers such as OCR and speech recognition systems. Dictionary-based post-processing commonly employs either an algorithmic or a statistical approach; other linguistic features are not exploited for this purpose, and the language analysis is largely limited to prose. This paper proposes a framework that uses the rich metric and formal structure of classical poetic forms in Indian languages to post-process a recognizer such as an OCR engine. We show that the structure present in the vrtta and prasa can be used efficiently to disambiguate cases that may be difficult for an OCR. The approach is efficient, complementary to other post-processing approaches, and can be used in conjunction with them.
{"title":"On Using Classical Poetry Structure for Indian Language Post-Processing","authors":"A. Namboodiri, P J Narayanan, C. V. Jawahar","doi":"10.1109/ICDAR.2007.199","DOIUrl":"https://doi.org/10.1109/ICDAR.2007.199","url":null,"abstract":"Post-processors are critical to the performance of language recognizers like OCRs, speech recognizers, etc. Dictionary-based post-processing commonly employ either an algorithmic approach or a statistical approach. Other linguistic features are not exploited for this purpose. The language analysis is also largely limited to the prose form. This paper proposes a framework to use the rich metric and formal structure of classical poetic forms in Indian languages for post-processing a recognizer like an OCR engine. We show that the structure present in the form of the vrtta and prasa can be efficiently used to disambiguate some cases that may be difficult for an OCR. The approach is efficient, and complementary to other post-processing approaches and can be used in conjunction with them.","PeriodicalId":279268,"journal":{"name":"Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117207054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
XML Data Representation in Document Image Analysis
A. Belaïd, Ingrid Falk, Yves Rangoni
doi: 10.1109/ICDAR.2007.272
This paper presents the XML-based formats ALTO, TEI, and METS used in digital libraries, and their value for data representation in a document image analysis and recognition (DIAR) process. The first part briefly presents these formats, focusing on their adequacy for the structural representation and modeling of DIAR data. The second part shows how they can be used in a reverse-engineering process and presents their implementation as a data representation framework.
{"title":"XML Data Representation in Document Image Analysis","authors":"A. Belaïd, Ingrid Falk, Yves Rangoni","doi":"10.1109/ICDAR.2007.272","DOIUrl":"https://doi.org/10.1109/ICDAR.2007.272","url":null,"abstract":"This paper presents the XML-based formats ALTO, TEI, METS used for digital libraries and their interest for data representation in a document image analysis and recognition (DIAR) process. In the first part we briefly present these formats with focus on their adequacy for structural representation and modeling of DIAR data. The second part shows how these formats can be used in a reverse engineering process. Their implementation as a data representation framework will be shown.","PeriodicalId":279268,"journal":{"name":"Ninth International Conference on Document Analysis and Recognition (ICDAR 2007)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129507616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}