Online Writer Identification using GMM Based Feature Representation and Writer-Specific Weights
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00124
V. Venugopal, S. Sundaram
This paper focuses on a method to ascertain the identity of the writer of an online handwritten document. The proposed methodology makes use of a set of descriptors derived from features obtained in a probabilistic sense. In this regard, we employ a GMM-based feature representation wherein each point-based feature vector in the online trace is represented by a vector whose elements quantify its membership to the individual Gaussians of the GMM. A distinguishing aspect is the proposal of a weighting scheme that measures the influence of each Gaussian of a writer in the probabilistic space. To derive these weights, we rely on information obtained from a histogram, by formulating a function of the sum-pooled posterior probabilities obtained across all the enrolled documents in the database. Identification is performed by an ensemble of SVMs, where each SVM is modelled for a given writer. Experiments performed on the publicly available IAM Online handwriting database show results that are competitive with prior works in the literature.
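A minimal sketch of the probabilistic encoding described above, assuming scikit-learn's GaussianMixture and hypothetical point-level trace features; the paper's exact histogram-based weighting function is not given in the abstract, so a normalized sum-pool of posteriors stands in for it:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a GMM on point-level features pooled from all enrolled documents.
# `train_points` is a hypothetical (n_points, n_dims) array of online-trace features.
train_points = np.random.rand(5000, 6)
gmm = GaussianMixture(n_components=32, covariance_type="diag").fit(train_points)

def membership_vectors(doc_points):
    """Encode each trace point by its posterior membership to every Gaussian."""
    return gmm.predict_proba(doc_points)  # shape: (n_points, n_components)

def writer_weights(enrolled_docs):
    """Stand-in for the histogram-based weighting: sum-pool the posteriors of
    a writer's enrolled documents and normalize, so Gaussians the writer uses
    often receive larger weights."""
    pooled = sum(membership_vectors(d).sum(axis=0) for d in enrolled_docs)
    return pooled / pooled.sum()
```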
{"title":"Online Writer Identification using GMM Based Feature Representation and Writer-Specific Weights","authors":"V. Venugopal, S. Sundaram","doi":"10.1109/ICDAR.2019.00124","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00124","url":null,"abstract":"This paper focuses on a method to ascertain the identity of an online handwritten document. The proposed methodology makes use of a set of descriptors that are derived from features obtained in a probabilistic sense. In this regard, we employ a GMM-based feature representation where in each point-based feature vector in the online trace is represented by a vector. Each element of the aforementioned vector quantify the membership to a particular Gaussian in the GMM. A differing aspect is in the proposal of a weighting scheme that measures the influence of each Gaussian of a writer in the probabilistic space. For deriving these weights, we rely on the information obtained from a histogram, by formulating a function of the sum-pooled posterior probabilities obtained across all the enrolled documents in the database. The identification is performed by an ensemble of SVMs where each SVM is modelled for a given writer. The experiments are performed on the publicly available IAM Online handwriting database and the results are competitive with respect to prior works in literature.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121315456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TH-GAN: Generative Adversarial Network Based Transfer Learning for Historical Chinese Character Recognition
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00037
Junyang Cai, Liangrui Peng, Yejun Tang, Changsong Liu, Pengchao Li
Historical Chinese character recognition faces problems including low image quality and a lack of labeled training samples. We propose a generative adversarial network (GAN) based transfer learning method, TH-GAN, to alleviate these problems. The TH-GAN architecture includes a discriminator and a generator. The discriminator is based on a convolutional neural network (CNN); inspired by Wasserstein GAN, its loss function measures the distance between the probability distributions of the generated images and the target images. The generator is a CNN-based encoder-decoder whose loss aims to minimize the distribution distance between the real samples and the generated samples. To preserve the complex glyph structure of historical Chinese characters, a weighted mean squared error (MSE) criterion that incorporates both edge and skeleton information from the ground-truth image is proposed as the weighted pixel loss in the generator. These loss functions are used for joint training of the discriminator and the generator. Two tasks evaluate the performance of the proposed TH-GAN: style transfer mapping for multi-font printed traditional Chinese character samples, and transfer learning for historical Chinese character samples augmented with samples generated by TH-GAN. Experimental results show that the proposed TH-GAN is effective.
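The weighted pixel loss can be pictured as below; a sketch assuming PyTorch, binary edge/skeleton masks derived from the ground-truth image, and illustrative weight values (the paper's actual weights are not stated in the abstract):

```python
import torch
import torch.nn.functional as F

def weighted_pixel_loss(generated, target, edge_mask, skeleton_mask,
                        edge_w=2.0, skel_w=4.0):
    """Weighted MSE in the spirit of TH-GAN: pixels on the glyph's edge or
    skeleton (binary masks from the ground-truth image) count more than
    background pixels. edge_w and skel_w are illustrative values."""
    weights = 1.0 + edge_w * edge_mask + skel_w * skeleton_mask
    per_pixel = F.mse_loss(generated, target, reduction="none")
    return (weights * per_pixel).mean()
```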
{"title":"TH-GAN: Generative Adversarial Network Based Transfer Learning for Historical Chinese Character Recognition","authors":"Junyang Cai, Liangrui Peng, Yejun Tang, Changsong Liu, Pengchao Li","doi":"10.1109/ICDAR.2019.00037","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00037","url":null,"abstract":"Historical Chinese character recognition faces problems including low image quality and lack of labeled training samples. We propose a generative adversarial network (GAN) based transfer learning method to ease these problems. The proposed TH-GAN architecture includes a discriminator and a generator. The network structure of the discriminator is based on a convolutional neural network (CNN). Inspired by Wasserstein GAN, the loss function of the discriminator aims to measure the probabilistic distribution distance of the generated images and the target images. The network structure of the generator is a CNN based encoder-decoder. The loss function of the generator aims to minimize the distribution distance between the real samples and the generated samples. In order to preserve the complex glyph structure of a historical Chinese character, a weighted mean squared error (MSE) criterion by incorporating both the edge and the skeleton information in the ground truth image is proposed as the weighted pixel loss in the generator. These loss functions are used for joint training of the discriminator and the generator. Experiments are conducted on two tasks to evaluate the performance of the proposed TH-GAN. The first task is carried out on style transfer mapping for multi-font printed traditional Chinese character samples. The second task is carried out on transfer learning for historical Chinese character samples by adding samples generated by TH-GAN. Experimental results show that the proposed TH-GAN is effective.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121331539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero Shot Learning Based Script Identification in the Wild
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00162
Prateek Keserwani, K. De, P. Roy, U. Pal
A text recognition system for natural images or video frames containing multilingual text must first identify the written script and then recognize words in the identified script. However, some scripts occur rarely compared to others, and with only a few samples of a rare script available, supervised training of deep neural networks is difficult. To overcome this problem, we propose a zero-shot learning based method for script identification. We also propose an architecture for script identification that fuses the global feature vector with a semantic embedding vector. The semantic embedding of the script is obtained from the spatial dependency of the stroke sequence via a recurrent neural network. The proposed architecture shows superior results compared to the baseline approaches.
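One plausible reading of the fusion step, sketched in PyTorch; the GRU choice, layer sizes, and classifier head are assumptions for illustration, not the paper's configuration:

```python
import torch
import torch.nn as nn

class ScriptFusion(nn.Module):
    """Illustrative fusion of a global image feature with a semantic script
    embedding produced by an RNN over stroke sequences."""
    def __init__(self, feat_dim=512, stroke_dim=8, embed_dim=128, n_scripts=10):
        super().__init__()
        self.rnn = nn.GRU(stroke_dim, embed_dim, batch_first=True)
        self.classifier = nn.Linear(feat_dim + embed_dim, n_scripts)

    def forward(self, global_feat, strokes):
        # strokes: (batch, seq_len, stroke_dim); h[-1] is the semantic embedding
        _, h = self.rnn(strokes)
        fused = torch.cat([global_feat, h[-1]], dim=1)
        return self.classifier(fused)
```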
{"title":"Zero Shot Learning Based Script Identification in the Wild","authors":"Prateek Keserwani, K. De, P. Roy, U. Pal","doi":"10.1109/ICDAR.2019.00162","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00162","url":null,"abstract":"The text recognition system for natural images or video frames containing multilingual text needs a method to first identify the written script and then recognize the word in the identified script. However, the occurrence of some scripts is rare as compared to others. Due to the availability of a few samples of the rare script, the supervised learning of the deep neural networks is difficult. To overcome this problem, we have proposed a zero-shot learning based method for script identification. We have also proposed architecture for script identification which fuses the global feature vector and the semantic embedding vector. The semantic embedding of the script is obtained by using the spatial dependency of the stroke's sequence via the recurrent neural network. The proposed architecture shows superior results as compared to the baseline approaches.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128883314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Interactive and Generative Approach for Chinese Shanshui Painting Document
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00136
Aven Le Zhou, Qiu-Feng Wang, Kaizhu Huang, C. Lo
Chinese Shanshui is a landscape painting genre, popular in Chinese culture, that mainly depicts mountains and water. However, creating such paintings is very challenging for untrained users. In this paper, we propose an interactive and generative approach that automatically generates Chinese Shanshui painting documents from users' input, where users only need to sketch simple lines representing their ideal landscape, without any professional Shanshui painting skills. This sketch-to-Shanshui translation is learned with a cycle Generative Adversarial Network (cycle-GAN). To evaluate the proposed approach, we collected a large set of both sketch data and Chinese Shanshui painting data to train the cycle-GAN, and developed an interactive system called Shanshui-DaDA (i.e., Design and Draw with AI) that generates Chinese Shanshui painting documents in real time. Experimental results show that the system generates Shanshui paintings that satisfy general users.
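The underlying cycle-GAN objective pairs two generators with a cycle-consistency term; a minimal sketch in PyTorch, with G (sketch to painting) and F_inv (painting to sketch) as hypothetical generator networks:

```python
import torch.nn.functional as F

def cycle_consistency_loss(G, F_inv, sketch, shanshui, lam=10.0):
    """Core cycle-GAN term used for unpaired sketch-to-Shanshui translation:
    mapping to the other domain and back should reproduce the input.
    lam is the usual cycle-loss weight (10.0 in the original cycle-GAN paper)."""
    loss = (F.l1_loss(F_inv(G(sketch)), sketch) +
            F.l1_loss(G(F_inv(shanshui)), shanshui))
    return lam * loss
```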
{"title":"An Interactive and Generative Approach for Chinese Shanshui Painting Document","authors":"Aven Le Zhou, Qiu-Feng Wang, Kaizhu Huang, C. Lo","doi":"10.1109/ICDAR.2019.00136","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00136","url":null,"abstract":"Chinese Shanshui is a landscape painting document mainly drawing mountain and water, which is popular in Chinese culture. However, it is very challenging to create this by general people. In this paper, we propose an interactive and generative approach to automatically generate the Chinese Shanshui painting documents based on users' input, where the users only need to sketch simple lines to represent their ideal landscape without any professional Shanshui painting skills. This sketch-to-Shanshui translation is optimized by the model of cycle Generative Adversarial Networks (GAN). To evaluate the proposed approach, we collected a large set of both sketch data and Chinese Shanshui painting data to train the model of cycle-GAN, and developed an interactive system called Shanshui-DaDA (i.e., Design and Draw with AI) to generate Chinese Shanshui painting documents in real-time. The experimental results show that this system can generate satisfied Chinese Shanshui painting documents by general users.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131636914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DeepText: Detecting Text from the Wild with Multi-ASPP-Assembled DeepLab
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00042
Qingqing Wang, W. Jia, Xiangjian He, Yue Lu, M. Blumenstein, Ye Huang, Shujing Lyu
In this paper, we address scene text detection via direct regression and successfully adapt an effective semantic segmentation model, DeepLab v3+ [1], to this application. To handle text of arbitrary orientations and sizes and to improve the recall of small text, we extract multi-scale features by inserting multiple Atrous Spatial Pyramid Pooling (ASPP) layers into DeepLab after feature maps of different resolutions. We then add multiple auxiliary IoU losses at the decoding stage and auxiliary connections from the intermediate encoding layers to the decoder to assist network training and enhance the discriminative ability of the lower encoding layers. Experiments on the benchmark scene text dataset ICDAR2015 demonstrate the superior performance of our proposed network, named DeepText, over state-of-the-art approaches.
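A simplified stand-alone ASPP block of the kind inserted after each feature resolution; a sketch in PyTorch with illustrative dilation rates, not the exact DeepLab v3+ module:

```python
import torch
import torch.nn as nn

class MiniASPP(nn.Module):
    """Simplified Atrous Spatial Pyramid Pooling: parallel dilated 3x3
    convolutions capture context at multiple scales; their outputs are
    concatenated and projected back with a 1x1 convolution."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        # padding == dilation keeps spatial size constant for 3x3 kernels
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```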
{"title":"DeepText: Detecting Text from the Wild with Multi-ASPP-Assembled DeepLab","authors":"Qingqing Wang, W. Jia, Xiangjian He, Yue Lu, M. Blumenstein, Ye Huang, Shujing Lyu","doi":"10.1109/ICDAR.2019.00042","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00042","url":null,"abstract":"In this paper, we address the issue of scene text detection in the way of direct regression and successfully adapt an effective semantic segmentation model, DeepLab v3+ [1], for this application. In order to handle texts with arbitrary orientations and sizes and improve the recall of small texts, we propose to extract features of multiple scales by inserting multiple Atrous Spatial Pyramid Pooling (ASPP) layers to the DeepLab after the feature maps with different resolutions. Then, we set multiple auxiliary IoU losses at the decoding stage and make auxiliary connections from the intermediate encoding layers to the decoder to assist network training and enhance the discrimination ability of lower encoding layers. Experiments conducted on the benchmark scene text dataset ICDAR2015 demonstrate the superior performance of our proposed network, named as DeepText, over the state-of-the-art approaches.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131079124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ICDAR 2019 Competition on Harvesting Raw Tables from Infographics (CHART-Infographics)
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00203
Kenny Davila, B. Kota, S. Setlur, V. Govindaraju, Chris Tensmeyer, Sumit Shekhar, Ritwick Chaudhry
This work summarizes the results of the first Competition on Harvesting Raw Tables from Infographics (ICDAR 2019 CHART-Infographics). For the purpose of this competition, the complex process of automatic chart recognition is divided into multiple tasks: Chart Image Classification (Task 1), Text Detection and Recognition (Task 2), Text Role Classification (Task 3), Axis Analysis (Task 4), Legend Analysis (Task 5), Plot Element Detection and Classification (Task 6.a), Data Extraction (Task 6.b), and End-to-End Data Extraction (Task 7). We provided a large synthetic training set and evaluated submitted systems using newly proposed metrics on both synthetic charts and manually annotated real charts taken from the scientific literature. A total of 8 groups registered for the competition, of which 5 submitted results for Tasks 1-5. The results show that some tasks can be performed with high accuracy on synthetic data, but no system performed as well on real-world charts. The data, annotation tools, and evaluation scripts have been publicly released for academic use.
{"title":"ICDAR 2019 Competition on Harvesting Raw Tables from Infographics (CHART-Infographics)","authors":"Kenny Davila, B. Kota, S. Setlur, V. Govindaraju, Chris Tensmeyer, Sumit Shekhar, Ritwick Chaudhry","doi":"10.1109/ICDAR.2019.00203","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00203","url":null,"abstract":"This work summarizes the results of the first Competition on Harvesting Raw Tables from Infographics (ICDAR 2019 CHART-Infographics). The complex process of automatic chart recognition is divided into multiple tasks for the purpose of this competition, including Chart Image Classification (Task 1), Text Detection and Recognition (Task 2), Text Role Classification (Task 3), Axis Analysis (Task 4), Legend Analysis (Task 5), Plot Element Detection and Classification (Task 6.a), Data Extraction (Task 6.b), and End-to-End Data Extraction (Task 7). We provided a large synthetic training set and evaluated submitted systems using newly proposed metrics on both synthetic charts and manually-annotated real charts taken from scientific literature. A total of 8 groups registered for the competition out of which 5 submitted results for tasks 1-5. The results show that some tasks can be performed highly accurately on synthetic data, but all systems did not perform as well on real world charts. The data, annotation tools, and evaluation scripts have been publicly released for academic use.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134010434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parameter-Free Table Detection Method
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00079
Laiphangbam Melinda, C. Bhagvati
In this paper, we propose two parameter-free table detection methods: one for closed tables and the other for open tables. The unifying idea is multigaussian analysis. Multigaussian analysis of text-height histograms classifies document content into text and non-text blocks. Closed tables are classified as non-text, and their identification among the non-text blocks is similar to many earlier methods that remove the separators; thanks to the multigaussian analysis, no parameters are needed to identify rows and columns and discriminate them from text blocks. Open tables are initially classified as text blocks and are detected by extending the multigaussian analysis to the heights and widths of text blocks. The text blocks are grouped into three categories by multigaussian analysis, and these groups are used to classify table cells and distinguish them from text blocks. Table blocks are merged to obtain the table region. Evaluation on various Indic-script newspapers and the ICDAR2013 table competition dataset shows that our methods achieve more than 90% table recognition. The strength of our algorithm is that it is parameter-free and requires no training dataset.
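The multigaussian idea on text heights can be sketched with scikit-learn's GaussianMixture; the two-component setup below is an illustration of the principle, not the paper's full pipeline:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def classify_by_height(block_heights, n_components=2):
    """Fit a mixture of Gaussians to connected-component heights and label
    each block by its most likely component (e.g., text vs. non-text);
    no thresholds are hand-tuned, which is what makes it parameter-free."""
    h = np.asarray(block_heights, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components).fit(h)
    return gmm.predict(h)  # component index per block
```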
{"title":"Parameter-Free Table Detection Method","authors":"Laiphangbam Melinda, C. Bhagvati","doi":"10.1109/ICDAR.2019.00079","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00079","url":null,"abstract":"In this paper, we propose two parameter-free table detection methods: one for the closed tables and other for open tables. The unifying idea is multigaussian analysis. Multigaussian analysis of text height histograms classifies the document content into text and non-text blocks. Closed tables are classified as non-text and their identification from the non-text blocks is similar to many earlier methods that remove the separators. We do not need any parameters to identify rows and columns and discriminate them from text blocks because of multigaussian analysis. Open tables are initially classified as text blocks and are detected by extending the multigaussian analysis to the heights and widths of text blocks. The text-blocks are grouped into three categories by multigaussian analysis. These groups are used to classify table cells and distinguish them from text blocks. Table blocks are merged to obtain the table region. Evaluation on various Indic script newspapers and ICDAR2013 table competition dataset shows that our methods achieve more than 90% in table recognition. The strength of our algorithm is that it is a parameter-free approach and requires no training dataset.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"17 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134105087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Two Stream Deep Network for Document Image Classification
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00227
M. Asim, Muhammad Usman Ghani Khan, M. I. Malik, K. Razzaque, A. Dengel, Sheraz Ahmed
This paper presents a novel two-stream approach for document image classification. The proposed approach leverages textual and visual modalities to classify document images into ten categories, including letter, memo, news article, etc. To alleviate the textual stream's dependency on the performance of the underlying OCR (which is the case with general content-based document image classifiers), we utilize a filter-based feature-ranking algorithm. This algorithm ranks the features of each class by their ability to discriminate document images and retains a set of top 'K' features for further processing. In parallel, the visual stream uses deep CNN models to extract structural features of document images. Finally, the textual and visual streams are combined using an average ensembling method. Experimental results reveal that the proposed approach outperforms the state-of-the-art system by a significant margin of 4.5% on the publicly available Tobacco-3482 dataset.
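A rough sketch of the two ingredients, assuming scikit-learn and chi-squared scoring as a stand-in for the paper's filter criterion (which the abstract does not name):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

def select_top_k_text_features(X_counts, y, k=500):
    """Filter-based ranking for the textual stream: score each term by its
    class-discriminating ability and keep only the top K features."""
    return SelectKBest(chi2, k=k).fit_transform(X_counts, y)

def average_ensemble(p_text, p_visual):
    """Late fusion of the two streams: average the per-class probabilities
    from the textual and visual classifiers, then take the argmax."""
    return np.argmax((p_text + p_visual) / 2.0, axis=1)
```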
{"title":"Two Stream Deep Network for Document Image Classification","authors":"M. Asim, Muhammad Usman Ghani Khan, M. I. Malik, K. Razzaque, A. Dengel, Sheraz Ahmed","doi":"10.1109/ICDAR.2019.00227","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00227","url":null,"abstract":"This paper presents a novel two-stream approach for document image classification. The proposed approach leverages textual and visual modalities to classify document images into ten categories, including letter, memo, news article, etc. In order to alleviate dependency of textual stream on performance of underlying OCR (which is the case with general content based document image classifiers), we utilize a filter based feature-ranking algorithm. This algorithm ranks the features of each class based on their ability to discriminate document images and selects a set of top 'K' features that are retained for further processing. In parallel, the visual stream uses deep CNN models to extract structural features of document images.Finally, textual and visual streams are concatenated together using an average ensembling method. Experimental results reveal that the proposed approach outperforms the state-of-the-art system with a significant margin of 4.5% on publicly available Tobacco-3482 dataset.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134063184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Article Segmentation in Digitised Newspapers with a 2D Markov Model
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00165
Andrew Naoum, J. Nothman, J. Curran
Document analysis and recognition is increasingly used to digitise collections of historical books, newspapers and other periodicals. In the digital humanities, a frequent goal is to apply information retrieval (IR) and natural language processing (NLP) techniques to help researchers analyse and navigate these digitised archives. The lack of article segmentation impairs many IR and NLP systems, which assume text is split into ordered, error-free documents. We define a document analysis and image processing task for segmenting digitised newspapers into articles and other content, e.g. adverts, and we automatically create a dataset of 11,602 articles. Using this dataset, we develop and evaluate an innovative 2D Markov model that encodes reading order and substantially outperforms the current state-of-the-art, reaching accuracy similar to that of human annotators.
{"title":"Article Segmentation in Digitised Newspapers with a 2D Markov Model","authors":"Andrew Naoum, J. Nothman, J. Curran","doi":"10.1109/ICDAR.2019.00165","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00165","url":null,"abstract":"Document analysis and recognition is increasingly used to digitise collections of historical books, newspapers and other periodicals. In the digital humanities, it is often the goal to apply information retrieval (IR) and natural language processing (NLP) techniques to help researchers analyse and navigate these digitised archives. The lack of article segmentation is impairing many IR and NLP systems, which assume text is split into ordered, error-free documents. We define a document analysis and image processing task for segmenting digitised newspapers into articles and other content, e.g. adverts, and we automatically create a dataset of 11602 articles. Using this dataset, we develop and evaluate an innovative 2D Markov model that encodes reading order and substantially outperforms the current state-of-the-art, reaching similar accuracy to human annotators.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"38 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114031343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unsupervised OCR Model Evaluation Using GAN
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00-42
Abhash Sinha, Martin Jenckel, S. S. Bukhari, A. Dengel
Optical Character Recognition (OCR) has achieved state-of-the-art performance through the use of Deep Learning for character recognition. Deep Learning techniques need a large amount of data along with ground truth, and a portion of the available data must also be set aside for validation. Preparing ground truth for historical documents is expensive, so data availability is a key concern. Jenckel et al. proposed using all the available data for training the OCR model and, for validation, regenerating the input image from the Softmax layer of the OCR model using a decoder setup; the regenerated image can then be compared with the original input image to validate the OCR model. In this paper, we explore the possibility of using Generative Adversarial Networks (GANs) to generate the image directly from the text output of the OCR model instead of from the Softmax layer, which is not accessible for all Deep Learning based OCR models. Generating the input image directly from the text allows this pipeline to be used with any OCR model, even one whose Softmax layer is inaccessible. In the results section, we show the current state of using GANs for unsupervised OCR model evaluation.
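The evaluation idea reduces to re-rendering the recognized text and comparing it with the input line; a sketch assuming a hypothetical pre-trained text-to-image generator and SSIM from scikit-image as the comparison metric:

```python
from skimage.metrics import structural_similarity as ssim

def validate_ocr_sample(generator, line_image, ocr_text):
    """Re-render the OCR output back into an image with a (pre-trained,
    hypothetical) GAN generator and compare it to the original line image;
    a low similarity score hints at a recognition error, with no ground
    truth required. `generator` maps text -> grayscale image of the same
    shape as `line_image`."""
    reconstruction = generator(ocr_text)
    return ssim(line_image, reconstruction,
                data_range=line_image.max() - line_image.min())
```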
{"title":"Unsupervised OCR Model Evaluation Using GAN","authors":"Abhash Sinha, Martin Jenckel, S. S. Bukhari, A. Dengel","doi":"10.1109/ICDAR.2019.00-42","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00-42","url":null,"abstract":"Optical Character Recognition (OCR) has achieved its state-of-the-art performance with the use of Deep Learning for character recognition. Deep Learning techniques need large amount of data along with ground truth. Out of the available data, small portion of it has to be used for validation purpose as well. Preparing ground truth for historical documents is expensive and hence availability of data is of utmost concern. Jenckel et al. jenckel came up with an idea of using all the available data for training the OCR model and for the purpose of validation, they generated the input image from Softmax layer of the OCR model; using the decoder setup which can be used to compare with the original input image to validate the OCR model. In this paper, we have explored the possibilities of using Generative Adversial Networks (GANs) gan for generating the image directly from the text obtained from OCR model instead of using the Softmax layer which is not always accessible for all the Deep Learning based OCR models. Using text directly to generate the input image back gives us the advantage to use this pipeline for any OCR models even whose Softmax layer is not accessible. In the results section, we have shown that the current state of using GANs for unsupervised OCR model evaluation.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114848202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}