
2019 International Conference on Document Analysis and Recognition (ICDAR): Latest Publications

Hybrid Training Data for Historical Text OCR
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00096
J. Martínek, Ladislav Lenc, P. Král, Anguelos Nicolaou, V. Christlein
Current optical character recognition (OCR) systems commonly make use of recurrent neural networks (RNN) that process whole text lines. Such systems avoid the character segmentation step required by character-based approaches. A disadvantage of this approach is the need for a large amount of annotated data. This can be addressed by using generated synthetic data instead of costly manually annotated data. Unfortunately, such data is often not suitable for historical documents, particularly for quality reasons. This work presents a hybrid approach for generating annotated data for OCR at a low cost. We first collect a small dataset of isolated characters from historical document images. Then, we generate historical-looking text lines from these characters. Another contribution lies in the design and implementation of an OCR system based on a convolutional-LSTM network. We first pre-train this system on hybrid data. Afterwards, the network is fine-tuned with real printed text lines. We demonstrate that this training strategy is effective for obtaining state-of-the-art results. We also show that the proposed system scores comparably to, or even better than, several state-of-the-art systems.
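To make the pre-train/fine-tune strategy concrete, here is a minimal sketch (PyTorch) of a convolutional-LSTM line recognizer trained with CTC, first on hybrid (generated) lines and then fine-tuned on real ones. The layer sizes, alphabet size, and the two data loaders are illustrative assumptions, not the authors' exact network.

```python
import torch
import torch.nn as nn

class ConvLSTMRecognizer(nn.Module):
    def __init__(self, num_classes, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(                       # collapse the line image into a feature sequence
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(128 * 8, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes + 1)   # +1 for the CTC blank symbol

    def forward(self, x):                               # x: (B, 1, 32, W) grayscale text line
        f = self.cnn(x)                                  # (B, 128, 8, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)             # (B, W/4, 128*8) sequence of column features
        out, _ = self.rnn(f)
        return self.fc(out).log_softmax(2)               # (B, T, C) log-probabilities for CTC

def train_stage(model, loader, epochs, lr):
    ctc = nn.CTCLoss(blank=model.fc.out_features - 1, zero_infinity=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, targets, target_lens in loader:
            logp = model(images).permute(1, 0, 2)        # CTCLoss expects (T, B, C)
            in_lens = torch.full((images.size(0),), logp.size(0), dtype=torch.long)
            loss = ctc(logp, targets, in_lens, target_lens)
            opt.zero_grad()
            loss.backward()
            opt.step()

# model = ConvLSTMRecognizer(num_classes=80)
# train_stage(model, hybrid_loader, epochs=20, lr=1e-3)   # pre-train on generated hybrid lines
# train_stage(model, real_loader,   epochs=5,  lr=1e-4)   # fine-tune on real printed lines
```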
Citations: 8
Cascaded Detail-Preserving Networks for Super-Resolution of Document Images
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00047
Zhichao Fu, Yu Kong, Yingbin Zheng, Hao Ye, Wenxin Hu, Jing Yang, Liang He
The accuracy of OCR is usually affected by the quality of the input document image, and various kinds of degraded document images hamper OCR results. Among these scenarios, low-resolution images are a common and challenging case. In this paper, we propose cascaded networks for document image super-resolution. Our model is composed of Detail-Preserving Networks with small magnification. A loss function with perceptual terms is designed to simultaneously preserve the original patterns and enhance the edges of the characters. These networks are trained with the same architecture but different parameters and then assembled into a pipeline model with a larger magnification. Low-resolution images are upscaled gradually by passing through each Detail-Preserving Network until the final high-resolution images are obtained. Through extensive experiments on two scanned document image datasets, we demonstrate that the proposed approach outperforms recent state-of-the-art image super-resolution methods, and that combining it with a standard OCR system leads to significant improvements in the recognition results.
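The following sketch (PyTorch) illustrates the cascading idea: each small-magnification stage upscales by 2x, and two stages are chained for a 4x pipeline. The block design and the edge-aware loss term are illustrative assumptions, not the paper's exact layers or perceptual loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DetailPreservingBlock(nn.Module):
    """One 2x stage: shallow feature extraction followed by PixelShuffle upsampling."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 4, 3, padding=1),        # 4 = 1 output channel * (2*2) upscale factor
        )
        self.up = nn.PixelShuffle(2)

    def forward(self, x):
        return torch.sigmoid(self.up(self.body(x)))

def cascade(stages, lr_image):
    """Pass the low-resolution image through each stage so it upscales gradually."""
    out = lr_image
    for stage in stages:
        out = stage(out)
    return out

def edge_aware_loss(sr, hr):
    """L1 reconstruction plus a gradient-magnitude term meant to sharpen character edges (assumed form)."""
    def grad(t):
        return (t[..., :, 1:] - t[..., :, :-1]).abs().mean() + \
               (t[..., 1:, :] - t[..., :-1, :]).abs().mean()
    return F.l1_loss(sr, hr) + 0.1 * (grad(sr) - grad(hr)).abs()

# stages = nn.ModuleList([DetailPreservingBlock(), DetailPreservingBlock()])  # 2x * 2x = 4x pipeline
# sr = cascade(stages, lr_batch); loss = edge_aware_loss(sr, hr_batch)
```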
Citations: 6
Detecting Named Entities in Unstructured Bengali Manuscript Images
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00040
Chandranath Adak, B. Chaudhuri, Chin-Teng Lin, M. Blumenstein
In this paper, we undertake the task of finding named entities directly from unstructured handwritten document images without any intermediate text/character recognition. Here, we do not receive any assistance from natural language processing, which makes detecting the named entities more challenging. We work on Bengali script, which brings additional hurdles due to its unique script characteristics. We propose a new deep neural network-based architecture to extract latent features from a text image. The embedding is then fed to a BLSTM (Bidirectional Long Short-Term Memory) layer. After that, an attention mechanism is adapted for named entity detection. We perform experiments on two publicly available offline handwriting repositories containing 420 Bengali handwritten pages in total. The experimental outcome of our system is quite impressive, as it attains 95.43% balanced accuracy on overall named entity detection.
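A minimal sketch (PyTorch) of this recognition-free pipeline: a small CNN encodes a word image, a BLSTM models the horizontal feature sequence, and an attention layer pools it before a named-entity / non-entity classifier. Dimensions and the additive-attention form are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class NEWordClassifier(nn.Module):
    def __init__(self, hidden=128, num_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(64 * 16, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)           # attention score per time step
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                              # x: (B, 1, 64, W) handwritten word image
        f = self.cnn(x)                                # (B, 64, 16, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)           # (B, T, 64*16) latent column features
        h, _ = self.rnn(f)                             # (B, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)         # (B, T, 1) attention weights over the sequence
        pooled = (w * h).sum(dim=1)                    # attention-weighted summary of the word
        return self.fc(pooled)                         # logits: named entity vs. other
```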
Citations: 0
Binarization of Degraded Document Images using Convolutional Neural Networks Based on Predicted Two-Channel Images
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00160
Y. Akbari, A. Britto, S. Al-Maadeed, Luiz Oliveira
Due to the poor condition of most historical documents, it is difficult for binarization to separate document image background pixels from foreground pixels. This paper proposes Convolutional Neural Networks (CNNs) based on predicted two-channel images, in which CNNs are trained to classify the foreground pixels. The promising results from the use of multispectral images for semantic segmentation inspired our efforts to create a novel prediction-based two-channel image. In our method, the original image is binarized by the structural symmetric pixels (SSPs) method, and the two-channel image is constructed from the original image and its binarized image. In order to explore the impact of the proposed two-channel images as network inputs, we use two popular CNN architectures, namely SegNet and U-net. The results presented in this work show that our approach fully outperforms SegNet and U-net when they are trained on the original images, and demonstrates competitiveness and robustness compared with state-of-the-art results on the DIBCO database.
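The two-channel construction is simple to illustrate: the grayscale page and a rough binarization of it are stacked as two channels before being fed to SegNet/U-net. In the sketch below (NumPy/OpenCV), Otsu thresholding stands in for the paper's structural symmetric pixels (SSP) method; that substitution is an assumption for illustration only.

```python
import cv2
import numpy as np

def two_channel_input(gray_page: np.ndarray) -> np.ndarray:
    """gray_page: (H, W) uint8 grayscale document image -> (H, W, 2) float32 network input."""
    _, rough_bin = cv2.threshold(gray_page, 0, 255,
                                 cv2.THRESH_BINARY + cv2.THRESH_OTSU)   # placeholder for the SSP binarization
    stacked = np.stack([gray_page, rough_bin], axis=-1)                 # channel 0: original, channel 1: prediction
    return stacked.astype(np.float32) / 255.0

# x = two_channel_input(cv2.imread("page.png", cv2.IMREAD_GRAYSCALE))
# x is then fed to a SegNet/U-net-style segmentation network that labels foreground pixels.
```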
Citations: 18
ReS2TIM: Reconstruct Syntactic Structures from Table Images
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00125
Wenyuan Xue, Qingyong Li, D. Tao
Tables often represent densely packed but structured data. Understanding table semantics is vital for effective information retrieval and data mining. Unlike web tables, whose semantics are readable directly from markup language and contents, the full analysis of tables published as images requires the conversion of discrete data into structured information. This paper presents a novel framework to convert a table image into its syntactic representation through the relationships between its cells. In order to reconstruct the syntactic structures of a table, we build a cell relationship network to predict the neighbors of each cell in four directions. During the training stage, a distance-based sample weight is proposed to handle the class imbalance problem. According to the detected relationships, the table is represented by a weighted graph that is then employed to infer the basic syntactic table structure. Experimental evaluation of the proposed framework using two datasets demonstrates the effectiveness of our model for cell relationship detection and table structure inference.
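To show how per-cell neighbor predictions become the weighted graph the abstract mentions, here is a minimal Python sketch: each cell predicts its left/right/up/down neighbor with a confidence, and the edges are then used to read off row adjacency. The data layout, helper names, and the row rule are illustrative assumptions.

```python
from collections import defaultdict

def build_cell_graph(neighbor_preds):
    """
    neighbor_preds: dict mapping cell_id -> {"left": (other_id, prob) or None,
                                             "right": ..., "up": ..., "down": ...}.
    Returns an adjacency dict: graph[a][b] = (direction_from_a, confidence).
    """
    graph = defaultdict(dict)
    for cell, relations in neighbor_preds.items():
        for direction, pred in relations.items():
            if pred is None:
                continue
            other, prob = pred
            # keep the most confident prediction if the pair is claimed from both sides
            if other not in graph[cell] or prob > graph[cell][other][1]:
                graph[cell][other] = (direction, prob)
    return graph

def row_adjacent(graph, a, b, threshold=0.5):
    """Assumed rule: a confident left/right edge means the two cells share a row."""
    direction, prob = graph.get(a, {}).get(b, (None, 0.0))
    return direction in ("left", "right") and prob >= threshold

# Example input for a 2x2 table (hypothetical cell ids c0..c3, confidences from the relationship network):
# preds = {"c0": {"left": None, "right": ("c1", 0.93), "up": None, "down": ("c2", 0.88)}, ...}
# g = build_cell_graph(preds); row_adjacent(g, "c0", "c1")  # -> True
```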
Citations: 27
A New Document Image Quality Assessment Method Based on Hast Derivations
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00201
Alireza Alaei
With the rapid emergence of new technologies, a voluminous number of images, including document images, are generated every day. Considering the volume of data and the complexity of the processes, manual analysis, annotation, recognition, classification, and retrieval of such document images is impossible. To deal with such processes automatically, many document image analysis applications exist in the literature, and many of them are currently in place in different organisations and institutes. The performance of those applications is directly affected by the quality of document images. Therefore, a document image quality assessment (DIQA) method is of primary need to allow users to capture, compress and forward good-quality (readable) document images to various information systems, such as online business and insurance, for further processing. To assess the quality of document images, this paper proposes a new full-reference DIQA method using first- followed by second-order Hast derivations. A similarity map is then created using the second-order Hast derivation maps obtained by employing Hast filters on both the reference and distorted images. Average pooling is then employed to obtain a quality score for the distorted document image. To evaluate the proposed method, two different datasets were used. Both datasets are composed of images with mean human opinion scores (MHOS) considered as ground truth. The results obtained from the proposed DIQA method are superior to the results reported in the literature.
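A minimal sketch (NumPy/OpenCV) of this full-reference scoring pipeline: compute a derivative map for the reference and the distorted image, build a per-pixel similarity map, then average-pool it into a single quality score. The Gaussian-then-Laplacian operator stands in for the Hast filters, and the SSIM-style similarity formula is an assumption.

```python
import cv2
import numpy as np

def derivative_map(gray: np.ndarray) -> np.ndarray:
    smoothed = cv2.GaussianBlur(gray.astype(np.float32), (3, 3), 0)   # first-order smoothing before differentiating
    return np.abs(cv2.Laplacian(smoothed, cv2.CV_32F))                # second-order response (placeholder for Hast filters)

def quality_score(reference: np.ndarray, distorted: np.ndarray, c: float = 1e-3) -> float:
    r, d = derivative_map(reference), derivative_map(distorted)
    similarity = (2 * r * d + c) / (r ** 2 + d ** 2 + c)              # per-pixel similarity map in [0, 1]
    return float(similarity.mean())                                   # average pooling -> scalar quality score

# score = quality_score(cv2.imread("ref.png", 0), cv2.imread("scan.png", 0))  # higher = closer to the reference
```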
Citations: 6
Layout Analysis on Challenging Historical Arabic Manuscripts using Siamese Network
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00123
Reem Alaasam, Berat Kurar, Jihad El-Sana
This paper presents layout analysis for historical Arabic documents using a siamese network. Given pages from different documents, we divide them into patches of similar sizes. We train a siamese network model that takes a pair of patches as input and outputs a distance corresponding to the similarity between the two patches. We use the trained model to calculate a distance matrix, which in turn is used to cluster the patches of a page as either main text, side text or background patches. We evaluate our method on a challenging historical Arabic manuscripts dataset and report the F-measure. We show the effectiveness of our method by comparing it with other works that use deep learning approaches, and show that we achieve state-of-the-art results.
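The patch-pair idea can be sketched as follows (PyTorch): a shared CNN embeds both patches and the model outputs the Euclidean distance between the embeddings, trained with a contrastive loss so that patches of the same region type sit close together. Layer sizes and the margin are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiamesePatchNet(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(                  # shared encoder applied to both patches
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, embed_dim),
        )

    def forward(self, patch_a, patch_b):
        ea, eb = self.encoder(patch_a), self.encoder(patch_b)
        return F.pairwise_distance(ea, eb)             # small distance = similar patches

def contrastive_loss(distance, same_label, margin=1.0):
    """same_label: float tensor, 1.0 when both patches share a region type (main text / side text / background)."""
    pos = same_label * distance.pow(2)
    neg = (1 - same_label) * F.relu(margin - distance).pow(2)
    return (pos + neg).mean()
```

The pairwise distances from the trained model fill the patch-by-patch distance matrix that is then clustered into main text, side text and background regions, as described above.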
Citations: 10
Cross-Modal Prototype Learning for Zero-Shot Handwriting Recognition
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00100
Xiang Ao, Xu-Yao Zhang, Hong-Ming Yang, Fei Yin, Cheng-Lin Liu
In contrast to machine recognizers that rely on training with large handwriting datasets, humans can recognize handwriting accurately after learning from only a few samples, and can even generalize to handwritten characters from printed samples. Simulating this ability in machine recognition is important for alleviating the burden of labeling large handwriting datasets, especially for large category sets such as Chinese text. In this paper, inspired by human learning, we propose a cross-modal prototype learning (CMPL) method for zero-shot online handwritten character recognition: for unseen categories, handwritten characters can be recognized without learning from handwritten samples, but instead from printed characters. In particular, the printed characters (one for each class) are embedded into a convolutional neural network (CNN) feature space to obtain prototypes representing each class, while the online handwriting trajectories are embedded with a recurrent neural network (RNN). Via cross-modal joint learning, handwritten characters can be recognized according to the printed prototypes. For unseen categories, handwritten characters can be recognized by feeding only one printed sample per category. Experiments on a benchmark Chinese handwriting database have shown the effectiveness and potential of the proposed method for zero-shot handwriting recognition.
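A minimal sketch (PyTorch) of the cross-modal matching idea: printed character images are embedded by a CNN into one prototype per class, online trajectories are embedded by an RNN into the same space, and recognition is nearest-prototype search. The encoders and the distance are illustrative assumptions, not the paper's exact CMPL formulation.

```python
import torch
import torch.nn as nn

class PrintedEncoder(nn.Module):                      # printed glyph image -> class prototype vector
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )

    def forward(self, x):
        return self.net(x)

class TrajectoryEncoder(nn.Module):                   # online (x, y, pen-state) sequence -> same embedding space
    def __init__(self, dim=128):
        super().__init__()
        self.rnn = nn.GRU(3, dim, batch_first=True)

    def forward(self, traj):                          # traj: (B, T, 3)
        _, h = self.rnn(traj)
        return h[-1]                                  # (B, dim) final hidden state as the embedding

def classify(traj_embedding, prototypes):
    """prototypes: (num_classes, dim), one per class from a single printed sample (zero-shot for unseen classes)."""
    dists = torch.cdist(traj_embedding, prototypes)   # (B, num_classes) distances in the shared space
    return dists.argmin(dim=1)                        # nearest printed prototype = predicted class
```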
Citations: 8
An End-to-End Trainable Framework for Joint Optimization of Document Enhancement and Recognition
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00019
Anupama Ray, Manoj Sharma, Avinash Upadhyay, Megh Makwana, S. Chaudhury, Akkshita Trivedi, Ajay Pratap Singh, Anil K. Saini
Recognizing text from degraded and low-resolution document images is still an open challenge in the vision community. Existing text recognition systems require a certain resolution and fail if the document is low-resolution, heavily degraded or noisy. This paper presents an end-to-end trainable deep-learning-based framework for joint optimization of document enhancement and recognition. We use a generative adversarial network (GAN) based framework to perform image denoising, followed by a deep back-projection network (DBPN) for super-resolution, and use these super-resolved features to train a bidirectional long short-term memory (BLSTM) network with Connectionist Temporal Classification (CTC) for recognition of textual sequences. The entire network is end-to-end trainable, and we obtain improved results over the state of the art for both the image enhancement and document recognition tasks. We demonstrate results on both printed and handwritten degraded document datasets to show the generalization capability of our proposed robust framework.
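The core of the joint optimization is a single objective that combines an enhancement term with the recognition loss so gradients flow through both stages. The sketch below (PyTorch) uses a plain L1 term against the clean target as a stand-in for the GAN/DBPN losses; the weighting and the stand-in terms are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

ctc = nn.CTCLoss(blank=0, zero_infinity=True)

def joint_loss(enhanced, clean_target, log_probs, targets, input_lens, target_lens,
               recognition_weight=1.0, enhancement_weight=0.5):
    """log_probs: (T, B, C) output of the recognizer run on the enhanced image."""
    enh = F.l1_loss(enhanced, clean_target)                       # enhancement / super-resolution term (stand-in)
    rec = ctc(log_probs, targets, input_lens, target_lens)        # sequence recognition term
    return enhancement_weight * enh + recognition_weight * rec    # one loss -> end-to-end gradients

# loss = joint_loss(G(noisy_lr), clean_hr, recognizer(G(noisy_lr)), y, in_lens, y_lens)
# loss.backward()   # gradients reach both the enhancement network G and the recognizer
```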
Citations: 10
Document Binarization via Multi-resolutional Attention Model with DRD Loss
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00017
Xujun Peng, Chao Wang, Huaigu Cao
Document binarization, which separates text from background, is a critical pre-processing step for many high-level document analysis tasks. Conventional document binarization approaches tend to use hand-crafted features and empirical rules to simulate the degradation process of the document image and accomplish the binarization task. In this paper, we propose a deep learning framework in which the probability of text areas is inferred through a multi-resolutional attention model and then fed into a convolutional conditional random field (ConvCRF) to obtain the final binarized document image. In the proposed approach, the features of the degraded document image are learned by neural networks and the relations between text areas and backgrounds are inferred by the ConvCRF, which avoids dependence on researchers' domain knowledge and has better generalization capabilities. Experimental results on public datasets show that the proposed method achieves superior binarization performance compared to existing state-of-the-art approaches.
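The multi-resolution part can be sketched as follows (PyTorch): the page is processed at several scales, each scale yields a per-pixel text probability, and the maps are upsampled and fused before the ConvCRF refinement mentioned above. The shared scorer and the averaging fusion are illustrative assumptions, not the paper's attention model or its DRD loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResolutionTextAttention(nn.Module):
    def __init__(self, scales=(1.0, 0.5, 0.25)):
        super().__init__()
        self.scales = scales
        self.scorer = nn.Sequential(                  # shared per-scale text/background scorer
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x):                             # x: (B, 1, H, W) grayscale page
        h, w = x.shape[-2:]
        maps = []
        for s in self.scales:
            xs = x if s == 1.0 else F.interpolate(x, scale_factor=s, mode="bilinear", align_corners=False)
            p = torch.sigmoid(self.scorer(xs))        # text probability at this resolution
            maps.append(F.interpolate(p, size=(h, w), mode="bilinear", align_corners=False))
        return torch.stack(maps, dim=0).mean(dim=0)   # fused probability map handed to the ConvCRF stage

# prob = MultiResolutionTextAttention()(page_batch)   # (B, 1, H, W), refined and thresholded downstream
```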
Citations: 17