Text extraction from video documents, an important research field within content-based information indexing and retrieval, has been developing rapidly since the 1990s. This has led to considerable progress in text extraction, performance evaluation, and related applications. By reviewing the approaches proposed during the past five years, this paper surveys the progress made in this area and discusses promising directions for future research.
"Extraction of Text Objects in Video Documents: Recent Progress", by Jing Zhang, R. Kasturi. 2008 The Eighth IAPR International Workshop on Document Analysis Systems (DAS 2008), 16 September 2008. doi:10.1109/DAS.2008.49
Image-based mail piece identification is a new technology for optimizing the postal sorting process. Exploiting the uniqueness of the mail piece surface, characteristic features are used to identify each mail piece within a large volume of mail. This process facilitates the storage of mail-piece-relevant data and its assignment in different sorting steps without manipulating the mail piece surface. Mail piece transportation and the mechanical sorting process may cause address and label movements, image rotations, and further application-specific modifications. In contrast to other document identification systems, image-based mail piece identification requires high robustness to the above-mentioned surface modifications for stable identification. This paper introduces four identification models for these scenarios. Based on the proposed models, text-based feature extraction methods, feature representations, and in particular appropriate distance metrics are presented which guarantee robust mail piece identification. The applicability of the proposed procedure is shown in different experiments.
"Handling of Surface Modifications for Robust Image Based Mail Piece Comparison", by K. Worm, B. Meffert. DAS 2008. doi:10.1109/DAS.2008.39
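The abstract does not specify which distance metrics the authors use, but for text-based features a normalized edit distance is one natural candidate. The sketch below is an illustrative assumption, not the paper's method: the function names and the 0.2 threshold are hypothetical, and it only shows how such a metric can tolerate small label movements or recognition differences between two scans of the same mail piece.

```python
def edit_distance(a, b):
    """Dynamic-programming Levenshtein distance between two feature strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # insertion, deletion, or substitution (free if characters match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def same_mail_piece(f1, f2, max_rel_dist=0.2):
    """Treat two feature strings as the same piece if their normalized
    edit distance stays below a (hypothetical) threshold."""
    denom = max(len(f1), len(f2)) or 1
    return edit_distance(f1, f2) / denom <= max_rel_dist
```

A threshold-based decision like this trades false accepts against false rejects; the paper's four identification models presumably tune that trade-off per scenario.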
This paper presents a new adaptive approach for the binarization and enhancement of historical and degraded documents. The proposed method is based on (i) efficient pre-processing; (ii) the combination of the results of several state-of-the-art binarization methodologies; (iii) the incorporation of edge information and (iv) the application of efficient image post-processing based on mathematical morphology for the enhancement of the final result. The proposed method demonstrated superior performance against six well-known techniques on numerous historical handwritten and machine-printed documents mainly from the Library of Congress of the United States archive. The performance evaluation was based on a consistent and concrete methodology.
"Efficient Binarization of Historical and Degraded Document Images", by B. Gatos, I. Pratikakis, S. Perantonis. DAS 2008. doi:10.1109/DAS.2008.66
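The combination step (ii) merges the outputs of several binarization methods. As a rough illustration of that idea only, and not the authors' actual algorithm, the sketch below computes a global Otsu threshold and fuses multiple binary maps with a per-pixel majority vote; all function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Exhaustive search for the grey level maximizing between-class variance."""
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        fg = [p for p in pixels if p < t]
        bg = [p for p in pixels if p >= t]
        if not fg or not bg:
            continue
        w0, w1 = len(fg) / len(pixels), len(bg) / len(pixels)
        m0, m1 = sum(fg) / len(fg), sum(bg) / len(bg)
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def majority_vote(maps):
    """Combine binary maps (lists of rows of 0/1) pixel-wise:
    a pixel is foreground if most methods agree."""
    h, w = len(maps[0]), len(maps[0][0])
    need = len(maps) // 2 + 1
    return [[1 if sum(m[y][x] for m in maps) >= need else 0
             for x in range(w)] for y in range(h)]
```

Voting suppresses artifacts that only one method produces, which is plausibly why combining binarizers helps on degraded pages.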
Book documents usually have consistent typography throughout the whole book, including headers, footers, columns, text line directions, and the fonts used at each level of heading. Such document-level typography information is of great value for downstream document processing applications. This paper presents a document analysis system that can extract a comprehensive set of typographic attributes from book documents. The system consists of several components: recognition of fonts used in the body text and chapter headings; detection of the page body area, headers, and footers; and detection of columns, text line direction, and line spacing of the body text. Page association is employed throughout the system. Preliminary experimental results demonstrate the effectiveness of the system.
"Comprehensive Global Typography Extraction System for Electronic Book Documents", by Liangcai Gao, Zhi Tang, Xiaofan Lin, Ruiheng Qiu. DAS 2008. doi:10.1109/DAS.2008.30
We discuss problems in developing policies for ground-truthing document images for pixel-accurate segmentation. First, we describe ground-truthing policies that apply at four different scales: (1) paragraph, (2) text line, (3) character, and (4) pixel. We then analyze difficult and/or ambiguous cases that will challenge any policy, e.g., blank space and overlapping content. Experiments have shown the benefit of using "tighter" zones that capture more detail (e.g., at the text line level instead of the paragraph level). We show that tighter ground truth significantly improves classification results, by 45% in recent experiments. It is important to face the fact that a pixel-accurate segmentation can be better than manually obtained ground truth. Perfectly accurate pixel-level ground truth may not be achievable in practice, of course, but we believe it is important to explore methods to semi-automatically improve existing ground truth.
"Truthing for Pixel-Accurate Segmentation", by Michael A. Moll, H. Baird, Chang An. DAS 2008. doi:10.1109/DAS.2008.47
One approach to annotation extraction from printed documents extracts annotations by comparing the image of an annotated document with its original document image. In one previous method, the original document is actually printed and scanned in order to reproduce the image degradations present in the annotated document image. However, such a method is inconvenient, since users have to use the same printer and scanner to obtain the images of the annotated document and its original. In this paper, we propose an improved annotation extraction method in which the image degradations are compensated for by image processing. In the proposed method, the difference between the original and annotated document images due to image degradations is reduced not only by removing the degradations from the annotated document image but also by reproducing them in the original document image. The proposed method consists of three processing steps, addressing dithering, color change, and local displacement. We also propose an objective evaluation of extracted annotations so that experimental results can be compared accurately. Experimental results show that the proposed method achieves a recall of 80.94% and a precision of 85.59% for extracted annotations.
"Accuracy Improvement and Objective Evaluation of Annotation Extraction from Printed Documents", by T. Nakai, K. Iwata, K. Kise. DAS 2008. doi:10.1109/DAS.2008.80
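The reported recall (80.94%) and precision (85.59%) follow the standard definitions. As a minimal sketch, with hypothetical names and assuming annotations are represented as sets of pixel coordinates, they can be computed like this:

```python
def precision_recall(extracted, ground_truth):
    """Standard precision/recall over sets of annotation pixels
    (any hashable coordinates work)."""
    tp = len(extracted & ground_truth)                    # correctly extracted
    precision = tp / len(extracted) if extracted else 0.0 # fraction of output that is correct
    recall = tp / len(ground_truth) if ground_truth else 0.0  # fraction of truth recovered
    return precision, recall
```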
Detecting both graphic text and scene text in video images with complex backgrounds and low resolution is still a challenging and interesting problem for researchers in image processing and computer vision. In this paper, we present a novel technique for detecting both graphic text and scene text in video images: we first find segments containing text in an input image, and then use statistical features, such as vertical and horizontal bars of edges within the segments, to detect true text blocks efficiently. To identify a segment containing text, heuristic rules are formed based on a combination of filters and edge analysis. Furthermore, the same rules are extended to grow the boundaries of a candidate segment in order to include the complete text in the input image. Experimental results show that the proposed technique performs better than existing methods on a number of metrics.
"An Efficient Edge Based Technique for Text Detection in Video Frames", by P. Shivakumara, Weihua Huang, C. Tan. DAS 2008. doi:10.1109/DAS.2008.17
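The "vertical and horizontal bars for edges" suggest counting short edge runs in each direction, since text regions contain many parallel character strokes. The sketch below is one plausible reading of that feature; the run length, thresholds, and names are assumptions, not the authors' exact definitions.

```python
def bar_features(edges):
    """Count vertical and horizontal edge 'bars' (runs of at least min_len
    connected edge pixels) in a binary edge map, a rough proxy for
    stroke statistics."""
    h, w = len(edges), len(edges[0])

    def runs(lines, min_len=2):
        count = 0
        for line in lines:
            run = 0
            for v in line:
                run = run + 1 if v else 0
                if run == min_len:   # count each run once, when it reaches min_len
                    count += 1
        return count

    horizontal = runs(edges)                                          # row-wise
    vertical = runs([[edges[y][x] for y in range(h)] for x in range(w)])  # column-wise
    return vertical, horizontal

def looks_like_text(edges, min_bars=2):
    """Crude heuristic: a text block should show several vertical strokes
    and at least one horizontal one."""
    v, hbars = bar_features(edges)
    return v >= min_bars and hbars >= 1
```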
In this paper, a complete OCR methodology for recognizing historical documents, either printed or handwritten, without any knowledge of the font is presented. This methodology consists of three steps: the first two create a database for training from a set of documents, while the third recognizes new document images. First, a pre-processing step that includes image binarization and enhancement takes place. Second, a top-down segmentation approach is used to detect text lines, words, and characters. A clustering scheme is then adopted to group characters of similar shape. This is a semi-automatic procedure, since the user can interact at any time to correct clustering errors and assign an ASCII label to each cluster. After this step, a character database is available for recognition. Finally, in the third step, the same segmentation approach is applied to every new document image, and recognition is based on the character database produced in the previous step.
"A Complete Optical Character Recognition Methodology for Historical Documents", by G. Vamvakas, B. Gatos, N. Stamatopoulos, S. Perantonis. DAS 2008. doi:10.1109/DAS.2008.73
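The clustering step groups characters of similar shape. A minimal sketch of such grouping, assuming glyphs are flattened binary grids and using greedy assignment by Hamming distance (the paper's actual shape features and clustering scheme may well differ):

```python
def hamming(a, b):
    """Number of differing cells between two equal-size flat bit tuples."""
    return sum(x != y for x, y in zip(a, b))

def cluster_glyphs(glyphs, max_dist):
    """Greedy clustering: assign each glyph to the first cluster whose
    representative is within max_dist, else start a new cluster.
    Each cluster would then receive one ASCII label from the user."""
    clusters = []  # list of (representative, members)
    for g in glyphs:
        for rep, members in clusters:
            if hamming(rep, g) <= max_dist:
                members.append(g)
                break
        else:
            clusters.append((g, [g]))
    return clusters
```

Labeling one representative per cluster instead of every character is what makes the semi-automatic database construction cheap.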
The use of gradients in text images is nowadays quite frequent. Existing segmentation methods encounter serious problems with modern text images, where gradients may appear in the background, the foreground, or both at the same time. This paper presents an approach for detecting lightness gradient areas based on the Hough transform. The issues arising are discussed, and results are presented on a dataset comprising Web images, logos, and scanned documents.
"Detecting Gradients in Text Images Using the Hough Transform", by Dimosthenis Karatzas. DAS 2008. doi:10.1109/DAS.2008.55
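A linear lightness gradient along a scanline is a straight line in (position, lightness) space, so it can be found by Hough-style voting over a slope/intercept grid. The sketch below illustrates that idea only; the grid resolution, tolerance, and names are assumptions, not the paper's formulation.

```python
def hough_gradient(points, slopes, intercepts, tol=0.5):
    """Vote for lines v = m*x + c over a coarse (m, c) accumulator grid;
    a dominant cell suggests a linear lightness gradient along the scanline.
    points: iterable of (position, lightness) pairs."""
    best_votes, best_cell = 0, None
    for m in slopes:
        for c in intercepts:
            votes = sum(1 for x, v in points if abs(m * x + c - v) <= tol)
            if votes > best_votes:
                best_votes, best_cell = votes, (m, c)
    return best_cell, best_votes
```

Voting makes the detection robust to outlier pixels (e.g. text strokes crossing the gradient), which is the usual reason to prefer the Hough transform over a least-squares fit here.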
This paper presents an edge-directed super-resolution algorithm for document images that requires no training set. The technique creates an image with smooth regions in both the foreground and the background, while allowing sharp discontinuities across, and smoothness along, the edges. Our method preserves sharp corners in text images by using the local edge direction, computed by first evaluating the gradient field and then taking its tangent. Super-resolution of document images is characterized by bimodality, smoothness along the edges, and subsampling consistency. These characteristics are enforced in a Markov random field (MRF) framework by defining an appropriate energy function. In our method, subsampling the super-resolution image returns the original low-resolution one, confirming the consistency of the method. The super-resolution image is generated by iteratively reducing this energy function. Experimental results on a variety of input images demonstrate the effectiveness of our method for document image super-resolution.
"Super-Resolution of Text Images Using Edge-Directed Tangent Field", by Jyotirmoy Banerjee, C. V. Jawahar. DAS 2008. doi:10.1109/DAS.2008.26
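The energy function combines a subsampling-consistency data term with a smoothness term. The 1-D sketch below shows the structure of such an energy and a naive descent on it; the paper works on 2-D images with edge-directed smoothness, so the weights, step sizes, and names here are illustrative assumptions only.

```python
def subsample(hr, factor=2):
    """Average pooling models the low-resolution observation."""
    return [sum(hr[i:i + factor]) / factor for i in range(0, len(hr), factor)]

def energy(hr, lr, lam=1.0, factor=2):
    """Data term: subsampling the estimate must reproduce the observation.
    Smoothness term: penalize differences between neighbouring samples."""
    data = sum((a - b) ** 2 for a, b in zip(subsample(hr, factor), lr))
    smooth = sum((hr[i + 1] - hr[i]) ** 2 for i in range(len(hr) - 1))
    return data + lam * smooth

def refine(hr, lr, lam=0.1, factor=2, iters=200, step=0.01):
    """Naive coordinate descent: nudge each sample if it lowers the energy."""
    hr = list(hr)
    for _ in range(iters):
        for i in range(len(hr)):
            for delta in (step, -step):
                trial = hr[:]
                trial[i] += delta
                if energy(trial, lr, lam, factor) < energy(hr, lr, lam, factor):
                    hr = trial
    return hr
```

The subsampling-consistency term is exactly what guarantees the property stated in the abstract: downsampling the refined estimate reproduces the low-resolution input.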