This paper proposes a hybrid system for text detection in video frames. The system consists of two main stages. In the first stage, text regions are detected based on the edge map of the image, leading to a high recall rate with minimal computational requirements. Subsequently, a refinement stage uses an SVM classifier trained on features obtained by a new local binary pattern based operator, which reduces false alarms. Experimental results demonstrate the overall performance of the system and the discriminating ability of the proposed feature set.
{"title":"A Hybrid System for Text Detection in Video Frames","authors":"M. Anthimopoulos, B. Gatos, I. Pratikakis","doi":"10.1109/DAS.2008.72","DOIUrl":"https://doi.org/10.1109/DAS.2008.72","url":null,"abstract":"This paper proposes a hybrid system for text detection in video frames. The system consists of two main stages. In the first stage text regions are detected based on the edge map of the image leading in a high recall rate with minimum computation requirements. In the sequel, a refinement stage uses an SVM classifier trained on features obtained by a new local binary pattern based operator which results in diminishing false alarms. Experimental results show the overall performance of the system that proves the discriminating ability of the proposed feature set.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131192132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Character recognition in complex real-scene images is a very challenging undertaking. The most popular approach is to segment the text area using extra prior knowledge, such as "characters are in a signboard"; while such methods can be effective, they tend to be very time-consuming, and generality remains a problem. In this paper, we propose a more general method that relies only on character features. Our algorithm consists of five steps: pre-processing to extract connected components, initial classification using primitive rules, strong classification using AdaBoost, Markov random field (MRF) clustering to merge connected components with similar properties, and post-processing using optical character recognition (OCR) results. Experiments on 11 images containing 1691 characters (including characters in poor condition) indicate the effectiveness of the proposed system: 52.9% of characters were extracted correctly, with 625 noise components extracted as characters.
{"title":"Kanji Character Detection from Complex Real Scene Images based on Character Properties","authors":"Lianli Xu, H. Nagayoshi, H. Sako","doi":"10.1109/DAS.2008.34","DOIUrl":"https://doi.org/10.1109/DAS.2008.34","url":null,"abstract":"Character recognition in complex real scene images is a very challenging undertaking. The most popular approach is to segment the text area using some extra pre-knowledge, such as \"characters are in a signboard'', etc. This approach makes it possible to construct a very time-consuming method, but generality is still a problem. In this paper, we propose a more general method by utilizing only character features. Our algorithm consists of five steps: pre-processing to extract connected components, initial classification using primitive rules, strong classification using AdaBoost, Markov random field (MRF) clustering to combine connected components with similar properties, and post-processing using optical character recognition (OCR) results. The results of experiments using 11 images containing 1691 characters (including characters in bad condition) indicated the effectiveness of the proposed system, namely, that 52.9% of characters were extracted correctly with 625 noise components extracted as characters.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134533071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we present an end-to-end trainable optical character recognition (OCR) system for recognizing machine-printed text in Thai documents. The system is based on a script-independent methodology using hidden Markov models. It provides an integrated workflow, beginning with the annotation and transcription of training images and ending with OCR on new images using models trained on the transcribed data. The efficacy of our end-to-end OCR system is demonstrated by rapidly configuring our OCR engine for the Thai script. We present experimental results on Thai documents that highlight the specific challenges posed by the Thai script, and we analyze recognition performance as a function of the amount of training data.
{"title":"End-to-End Trainable Thai OCR System Using Hidden Markov Models","authors":"K. Krstovski, Ehry MacRostie, R. Prasad, P. Natarajan","doi":"10.1109/DAS.2008.76","DOIUrl":"https://doi.org/10.1109/DAS.2008.76","url":null,"abstract":"In this paper we present an end-to-end trainable optical character recognition (OCR) system for recognizing machine-printed text in Thai documents. The end-to-end OCR system is based on a script-independent methodology using hidden Markov models. Our system provides an integrated workflow beginning with annotation and transcription of training images to performing OCR on new images with models trained on transcribed training images. The efficacy of our end-to-end OCR system is demonstrated by rapidly configuring our OCR engine for the Thai script. We present experimental results on Thai documents to highlight the specific challenges posed by the Thai script and analyze the recognition performance as a function of amount of training data.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132395641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present a robust system to accurately detect and localize text in natural scene images. For text detection, a region-based method utilizing multiple features and a cascade AdaBoost classifier is adopted. For text localization, a window-grouping method integrating text line competition analysis is used to generate text lines. Within each text line, local binarization is then used to extract candidate connected components (CCs), and non-text CCs are filtered out by a Markov random field (MRF) model, so that each text line can be localized accurately. Experiments on the public benchmark ICDAR 2003 Robust Reading and Text Locating Dataset show that our system is comparable to the best existing methods in both accuracy and speed.
{"title":"A Robust System to Detect and Localize Texts in Natural Scene Images","authors":"Yi-Feng Pan, Xinwen Hou, Cheng-Lin Liu","doi":"10.1109/DAS.2008.42","DOIUrl":"https://doi.org/10.1109/DAS.2008.42","url":null,"abstract":"In this paper, we present a robust system to accurately detect and localize texts in natural scene images. For text detection, a region-based method utilizing multiple features and cascade AdaBoost classifier is adopted. For text localization, a window grouping method integrating text line competition analysis is used to generate text lines. Then within each text line, local binarization is used to extract candidate connected components (CCs) and non-text CCs are filtered out by Markov Random Fields (MRF) model, through which text line can be localized accurately. Experiments on the public benchmark ICDAR 2003 Robust Reading and Text Locating Dataset show that our system is comparable to the best existing methods both in accuracy and speed.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122160738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We describe experimental results for unsupervised recognition of the textual contents of book images using fully automatic mutual-entropy-based model adaptation. Each experiment starts with approximate iconic and linguistic models, derived from (generally errorful) OCR results and (generally incomplete) dictionaries, and then runs a fully automatic adaptation algorithm which, guided entirely by evidence internal to the test set, attempts to correct the models for improved accuracy. The iconic model describes image formation and determines the behavior of a character-image classifier. The linguistic model describes word-occurrence probabilities. Our adaptation algorithm detects disagreements between the models by analyzing the mutual entropy between (1) the a posteriori probability distribution of character classes (the recognition results from image classification alone) and (2) the a posteriori probability distribution of word classes (the recognition results from image classification combined with linguistic constraints). Disagreements identify candidates for automatic model corrections. We report experiments on 40 text lines in which word error rates fall monotonically with increasing passage length. We also report experiments on an enhanced algorithm which can cope with character-segmentation errors (a single split, or a single merge, per word). To scale the experiments up to whole book images, we have revised our data structures and implemented speed enhancements. For this algorithm, we report results on three increasingly long passages: (a) one full page, (b) five pages, and (c) ten pages. We observe that error rates on long words fall monotonically with passage length.
{"title":"Towards Whole-Book Recognition","authors":"Pingping Xiu, H. Baird","doi":"10.1109/DAS.2008.50","DOIUrl":"https://doi.org/10.1109/DAS.2008.50","url":null,"abstract":"We describe experimental results for unsupervised recognition of the textual contents of book-images using fully automatic mutual-entropy-based model adaptation. Each experiment starts with approximate iconic and linguistic models---derived from (generally errorful) OCR results and (generally incomplete) dictionaries---and then runs a fully automatic adaptation algorithm which, guided entirely by evidence internal to the test set, attempts to correct the models for improved accuracy. The iconic model describes image formation and determines the behavior of a character-image classifier. The linguistic model describes word-occurrence probabilities. Our adaptation algorithm detects disagreements between the models by analyzing mutual entropy between (1) the a posteriori probability distribution of character classes (the recognition results from image classification alone), and (2) the a posteriori probability distribution of word classes (the recognition results from image classification combined with linguistic constraints). Disagreements identify candidates for automatic model corrections. We report experiments on 40 textlines in which word error rates fall monotonicaly with passage lengths. We also report experiments on an enhanced algorithm which can cope with character-segmentation errors (a single split, or a single merge, per word). In order to scale up experiments, soon, to whole book images, we have revised data structures and implemented speed enhancements. For this algorithm, we report results on three increasingly long passage lengths: (a) one full page, (b) five pages, and (b) ten pages. We observe that error rates on long words fall monotonically with passage lengths.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122355633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, an object extraction method for ancient colour maps is proposed. It consists of localizing the quarters inside a given cadastral map. The colour information is exploited through a colour restoration algorithm and the selection of a relevant hybrid colour model. The objects composing the map are located using a multi-component gradient. To identify quarters, an onion-peeling approach is adopted: this selective method starts by separating text from graphics. On the graphic layer, a connected-component analysis is carried out using a neighbourhood graph, which is pruned so that only significant areas are considered. The quarter boundaries are then found using a snake, a computer-generated curve that moves within an image to fit a given object. The performance of our method is evaluated in two steps: first, the colour-space selection is assessed according to its capacity to distinguish colours while remaining robust to variations and noise; second, the automatic extraction approach is compared to user-provided ground truth. The results show the good behaviour of the whole system.
{"title":"Object Extraction from Colour Cadastral Maps","authors":"R. Raveaux, J. Burie, J. Ogier","doi":"10.1109/DAS.2008.9","DOIUrl":"https://doi.org/10.1109/DAS.2008.9","url":null,"abstract":"In this paper, an object extraction method from ancient colour maps is proposed. It consists on the localization of quarters inside a given cadastral map. The colour aspect is exploited thanks to a colour restoration algorithm and the selection of a relevant hybrid colour model. Objects composing the map are located using a multi-components gradient. To identify quarters, a peeling the onion method is adopted. This selective method starts by separated text and graphics. On the graphic layer, a connected component analysis is carried out through the use of a neighbourhood graph. This graph is smartly pruned to consider only significant areas. Consequently, the quarter boundaries are found using a snake which is a computer-generated curve that moves within an image to fit a given object. The performance of our method is measured up in two steps: Firstly, the colour space selection is assessed according to the colour distinction capacity while being robust to variations/noise then the automatic extraction approach is compared to the user ground truth. Results show the good behaviour of the whole system.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125156422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper we present a writer-dependent handwriting recognition system based on hidden Markov models (HMMs). This system, which has been developed in the context of research on smart meeting rooms, operates in two stages. First, a Gaussian mixture model (GMM)-based writer identification system developed for smart meeting rooms identifies the person writing on the whiteboard. Then a recognition system adapted to the individual writer is applied. Two different methods for obtaining writer-dependent recognizers are proposed. The first method uses the available writer-specific data to train an individual recognition system for each writer from scratch, while the second method takes a writer-independent recognizer and adapts it with the data from the writer in question. The experiments were performed on the IAM-OnDB. In the first stage, the writer identification system achieves a perfect identification rate. In the second stage, the writer-specific recognition system achieves significantly better recognition results than the writer-independent recognizer. The final word recognition rate on the IAM-OnDB-t1 benchmark task is close to 80%.
{"title":"Writer-Dependent Recognition of Handwritten Whiteboard Notes in Smart Meeting Room Environments","authors":"M. Liwicki, A. Schlapbach, H. Bunke","doi":"10.1109/DAS.2008.8","DOIUrl":"https://doi.org/10.1109/DAS.2008.8","url":null,"abstract":"In this paper we present a writer-dependent handwriting recognition system based on hidden Markov models (HMMs). This system, which has been developed in the context of research on smart meeting rooms, operates in two stages. First, a Gaussian mixture model (GMM)-based writer identification system developed for smart meeting rooms identifies the person writing on the whiteboard. Then a recognition system adapted to the individual writer is applied. Two different methods for obtaining writer-dependent recognizers are proposed. The first method uses the available writer-specific data to train an individual recognition system for each writer from scratch, while the second method takes a writer-independent recognizer and adapts it with the data from the considered writer. The experiments have been performed on the IAM-OnDB. In the first stage,the writer identification system produces a perfect identification rate. In the second stage, the writer-specific recognition system gets significantly better recognition results, compared to the writer-independent recognizer. The final word recognition rate on the IAM-OnDB-t1 benchmark task is close to 80 %.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"190 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124225029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Expanding on an earlier study that objectively validated the hypothesis that handwriting is individualistic, we extend the study to include handwriting in the Arabic script. Handwriting samples were obtained from twelve native speakers of Arabic. Differences in handwriting were analyzed using computer algorithms for extracting features from scanned images of handwriting. Attributes characteristic of the handwriting were obtained, e.g., line separation, slant, and character shapes. These attributes, which are a subset of the attributes used by forensic document examiners (FDEs), were used to quantitatively establish individuality using machine learning approaches. Using global attributes of handwriting, the ability to determine the writer with a high degree of confidence was established. This work is a step towards providing scientific support for admitting handwriting evidence in court.
{"title":"Writer Verification of Arabic Handwriting","authors":"S. Srihari, G. R. Ball","doi":"10.1109/DAS.2008.81","DOIUrl":"https://doi.org/10.1109/DAS.2008.81","url":null,"abstract":"Expanding on an earlier study to objectively validate the hypothesis that handwriting is individualistic, we extend the study to include handwriting in the Arabic script. Handwriting samples from twelve native speakers of Arabic were obtained. Analyzing differences in handwriting was done by using computer algorithms for extracting features from scanned images of handwriting. Attributes characteristic of the handwriting were obtained, e.g., line separation, slant, character shapes, etc. These attributes, which are a subset of attributes used by forensic document examiners (FDEs), were used to quantitatively establish individuality by using machine learning approaches. Using global attributes of handwriting, the ability to determine the writer with a high degree of confidence was established. The work is a step towards providing scientific support for admitting handwriting evidence in court.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123648246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most optical character recognition (OCR) systems need to be trained and tested on the symbols that are to be recognized; therefore, ground truth data is needed. This data consists of character images together with their ASCII codes. Among the approaches for generating ground truth for real-world data, one promising technique is to use an electronic version of the scanned documents. Using an alignment method, the character bounding boxes extracted from the electronic document are matched to the scanned image. Current alignment methods are not robust to different similarity transforms, and they need calibration to deal with the non-linear local distortions introduced by the printing/scanning process. In this paper we present a significant improvement over existing methods that skips the calibration step and yields a more accurate alignment under all similarity transforms. Our method finds a robust, pixel-accurate, scanner-independent alignment of the scanned image with the electronic document, allowing the extraction of accurate ground-truth character information. The accuracy of the alignment is demonstrated using documents from the UW3 dataset. The results show that the mean distance between the estimated and the ground-truth character bounding-box positions is less than one pixel.
{"title":"Automated OCR Ground Truth Generation","authors":"J. V. Beusekom, F. Shafait, T. Breuel","doi":"10.1109/DAS.2008.59","DOIUrl":"https://doi.org/10.1109/DAS.2008.59","url":null,"abstract":"Most optical character recognition (OCR) systems need to be trained and tested on the symbols that are to be recognized. Therefore, ground truth data is needed. This data consists of character images together with their ASCII code. Among the approaches for generating ground truth of real world data, one promising technique is to use electronic version of the scanned documents. Using an alignment method, the character bounding boxes extracted from the electronic document are matched to the scanned image. Current alignment methods are not robust to different similarity transforms. They also need calibration to deal with non-linear local distortions introduced by the printing/scanning process. In this paper we present a significant improvement over existing methods, allowing to skip the calibration step and having a more accurate alignment, under all similarity transforms. Our method finds a robust and pixel accurate scanner independent alignment of the scanned image with the electronic document, allowing the extraction of accurate ground truth character information. The accuracy of the alignment is demonstrated using documents from the UW3 dataset. The results show that the mean distance between the estimated and the ground truth character bounding box position is less than one pixel.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"237 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123649377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patent document images maintained in the U.S. patent database have a specific format in which figures and text descriptions are separated into different sections. This makes it difficult for users to refer to a figure while reading the description, or vice versa. The system introduced in this paper prepares these patent images for a user-friendly browsing interface. The system extracts captions and labels from figures; once they are obtained, figures and the relevant descriptions are linked together. Users can then easily find the relevant figure by clicking captions or labels in the description, or vice versa.
{"title":"A Graphics Image Processing System","authors":"Linlin Li, C. Tan","doi":"10.1109/DAS.2008.84","DOIUrl":"https://doi.org/10.1109/DAS.2008.84","url":null,"abstract":"Patent document images maintained by the U.S. patent database have a specific format, in which figures and text descriptions are separated into different sections. This makes it difficult for users to refer to a figure while reading the description or vice versa. The system introduced in this paper is to prepare these patent images for a friendly user browsing interface. The system is able to extract captions and labels from figures. After obtaining captions and labels, figures and the relevant descriptions are linked together. Hence, users are able to easily find the relevant figure by clicking captions or labels in the description, or vice versa.","PeriodicalId":423207,"journal":{"name":"2008 The Eighth IAPR International Workshop on Document Analysis Systems","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114583608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}