
Latest publications from the 2019 International Conference on Document Analysis and Recognition (ICDAR)

Amharic Text Image Recognition: Database, Algorithm, and Analysis
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00205
B. Belay, T. Habtegebrial, M. Liwicki, Gebeyehu Belay, D. Stricker
This paper introduces a dataset for an exotic but very interesting script, Amharic. Amharic follows a unique syllabic writing system which uses 33 consonant characters, each with 7 vowel variants. Some labialized characters are derived by adding diacritical marks to a consonant and/or removing part of it. These diacritics on consonant characters are relatively small, which makes the derived (vowel and labialized) characters challenging to distinguish. In this paper we tackle the problem of Amharic text-line image recognition. We propose a recurrent neural network based method that uses Long Short-Term Memory (LSTM) networks together with Connectionist Temporal Classification (CTC). Furthermore, to overcome the lack of annotated data, we introduce a new dataset containing 337,332 Amharic text-line images, made freely available at http://www.dfki.uni-kl.de/~belay/. The performance of the proposed Amharic OCR model is tested on both printed and synthetically generated datasets, and promising results are obtained.
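As an illustration of the pipeline the abstract describes, here is a minimal sketch of an LSTM + CTC text-line recognizer in PyTorch. Layer sizes, the alphabet size, and the input height are illustrative assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 281  # assumption: Amharic character set plus the CTC blank

class TextLineRecognizer(nn.Module):
    def __init__(self, img_height=48, hidden=128, num_classes=NUM_CLASSES):
        super().__init__()
        # A bidirectional LSTM reads the line image column by column.
        self.lstm = nn.LSTM(img_height, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        # x: (batch, width, height) -- each pixel column is one time step.
        out, _ = self.lstm(x)
        return self.fc(out)  # (batch, width, num_classes)

model = TextLineRecognizer()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

images = torch.rand(4, 200, 48)            # 4 line images, width 200, height 48
logits = model(images).log_softmax(2)      # CTC expects log-probabilities
logits = logits.permute(1, 0, 2)           # (time, batch, classes)
targets = torch.randint(1, NUM_CLASSES, (4, 30))
input_lens = torch.full((4,), 200, dtype=torch.long)
target_lens = torch.full((4,), 30, dtype=torch.long)
loss = ctc(logits, targets, input_lens, target_lens)
loss.backward()
print(float(loss))
```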
Citations: 14
ICDAR 2019 Time-Quality Binarization Competition
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00248
R. Lins, E. Kavallieratou, E. B. Smith, R. Bernardino, D. Jesus
The ICDAR 2019 Time-Quality Binarization Competition assessed the performance of seventeen new binarization algorithms together with thirty previously published ones. Both the quality of the resulting two-tone image and the execution time were assessed. Comparisons were made on both "real-world" and synthetic scanned images, as well as on documents photographed with four models of widely used portable phones. Most of the submitted algorithms employed machine learning techniques and performed best on the most complex images. Traditional algorithms provided very good results at a fraction of the time.
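A small sketch of the kind of time-quality comparison the competition ran, using two classical binarization algorithms from scikit-image. The sample image and the Sauvola window size are illustrative choices, not the competition setup.

```python
import time
from skimage import data
from skimage.filters import threshold_otsu, threshold_sauvola

image = data.page()  # a grayscale scanned-document sample shipped with skimage

for name, binarize in [
    ("otsu", lambda img: img > threshold_otsu(img)),                      # global
    ("sauvola", lambda img: img > threshold_sauvola(img, window_size=25)),  # local
]:
    start = time.perf_counter()
    binary = binarize(image)
    elapsed = time.perf_counter() - start
    # True = background, so the dark (ink) fraction is 1 - mean.
    print(f"{name:8s} {elapsed * 1000:7.2f} ms  ink={1 - binary.mean():.2%}")
```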
Citations: 14
DECO: A Dataset of Annotated Spreadsheets for Layout and Table Recognition
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00207
Elvis Koci, Maik Thiele, Josephine Rehak, Oscar Romero, Wolfgang Lehner
This paper presents DECO (Dresden Enron COrpus), a dataset of spreadsheet files annotated on the basis of layout and contents. It comprises 1,165 files extracted from the Enron corpus. Three different annotators (judges) assigned layout roles (e.g., Header, Data, and Notes) to non-empty cells and marked the borders of tables. Files that do not contain tables were flagged with categories such as Template, Form, and Report. Subsequently, a thorough analysis is performed to uncover the characteristics of the overall dataset and of specific annotations. The results are discussed in this paper, providing several takeaways for future work. Furthermore, this work describes the annotation methodology in detail, going through the individual steps. The dataset, methodology, and tools are made publicly available so that they can be adopted for further studies. DECO is available at: https://wwwdb.inf.tu-dresden.de/research-projects/deexcelarator/
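A hypothetical sketch of how DECO-style annotations could be represented in code; the class names and fields below are placeholders for illustration, not the dataset's actual schema or file format.

```python
from dataclasses import dataclass, field

LAYOUT_ROLES = {"Header", "Data", "Notes"}        # roles named in the abstract
FILE_CATEGORIES = {"Template", "Form", "Report"}  # flags for table-free files

@dataclass
class CellAnnotation:
    row: int
    column: int
    role: str  # one of LAYOUT_ROLES, assigned by a judge

@dataclass
class TableAnnotation:
    first_row: int
    last_row: int
    first_column: int
    last_column: int  # table borders marked by the annotators

@dataclass
class SheetAnnotation:
    cells: list = field(default_factory=list)   # one CellAnnotation per non-empty cell
    tables: list = field(default_factory=list)  # one TableAnnotation per table

sheet = SheetAnnotation()
sheet.cells.append(CellAnnotation(row=0, column=0, role="Header"))
sheet.tables.append(TableAnnotation(0, 10, 0, 4))
print(len(sheet.cells), len(sheet.tables))
```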
Citations: 16
A Text-Context-Aware CNN Network for Multi-oriented and Multi-language Scene Text Detection
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00116
Yao Xiao, Minglong Xue, Tong Lu, Yirui Wu, P. Shivakumara
Existing deep learning based state-of-the-art scene text detection methods either treat scene texts as a type of general object or segment text regions directly. The latter category achieves remarkable detection results on arbitrary orientations and large aspect ratios of scene texts based on instance segmentation algorithms. However, lacking context information that accounts for the unique characteristics of scene text, directly applying instance segmentation to the text detection task is prone to low accuracy, especially false positive detections. To ease this problem, we propose a novel text-context-aware scene text detection CNN structure, which appropriately encodes channel and spatial attention information to construct a context-aware and discriminative feature map for multi-oriented and multi-language text detection tasks. With the high representation ability of the text-context-aware feature map, the proposed instance segmentation based method can not only robustly detect multi-oriented and multi-language text in natural scene images, but also produce better detection results by greatly reducing false positives. Experiments on the ICDAR2015 and ICDAR2017-MLT datasets show that the proposed method achieves superior precision, recall and F-measure compared with most existing studies.
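A minimal sketch of one way to encode channel and spatial attention over a CNN feature map, in the spirit of the module described above; this CBAM-style layout is an assumption, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight each channel.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention: pool over channels, re-weight each location.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_mlp(x)
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.amax(1, keepdim=True)], dim=1)
        return x * self.spatial_conv(pooled)

features = torch.rand(2, 64, 32, 128)  # (batch, channels, H, W) feature map
attended = ChannelSpatialAttention(64)(features)
print(attended.shape)  # torch.Size([2, 64, 32, 128])
```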
Citations: 5
A Novel Procedure to Speed up the Transcription of Historical Handwritten Documents by Interleaving Keyword Spotting and user Validation
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00198
Adolfo Santoro, A. Marcelli
We propose a novel procedure to speed up the content transcription of handwritten documents in digital historical archives when a keyword spotting system is used for the purpose. Instead of validating the system outputs in a single step, as is customary, the proposed methodology embeds a multi-step validation process into a human-in-the-loop approach. At each step, the system outputs are validated and, whenever the system mistakenly returns an image word that does not correspond to any entry of the keyword list, its correct transcription is entered and used to query the system in the next step. The performance of our approach has been evaluated experimentally in terms of the total time needed to achieve the complete transcription of a subset of documents from the Bentham dataset. The results confirm that interleaving keyword spotting by the system with validation by the user significantly reduces the time required to transcribe the document content, compared with both manual transcription and the traditional end-of-the-loop validation process.
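A schematic sketch of the interleaved spot-and-validate loop the abstract describes; spot() and ask_user() are hypothetical stand-ins for the keyword spotting system and the human validator, not the authors' actual interfaces.

```python
def transcribe(word_images, initial_keywords, spot, ask_user):
    """Iteratively transcribe word images with user validation in the loop."""
    keywords = set(initial_keywords)
    transcription = {}
    pending = set(word_images)
    while pending:
        # Step 1: the spotting system proposes matches for the current lexicon.
        proposals = spot(pending, keywords)  # {image: hypothesized word}
        if not proposals:
            break
        for image, word in proposals.items():
            # Step 2: the user validates each proposal.
            accepted, correction = ask_user(image, word)
            transcription[image] = word if accepted else correction
            pending.discard(image)
            # Step 3: corrections become new queries for the next round.
            if not accepted:
                keywords.add(correction)
    return transcription
```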
Citations: 3
Hiding Security Feature Into Text Content for Securing Documents Using Generated Font
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00196
Vinh Loc Cu, J. Burie, J. Ogier, Cheng-Lin Liu
Motivated by the increasing possibility of tampering with genuine documents during transmission over digital channels, we focus on developing a watermarking framework for determining whether a given document is genuine or falsified. The proposed framework hides a security feature or secret information within the document. To hide the security feature, we replace appropriate characters of the legal document with equivalent characters from generated fonts, hereafter called character variations. These variations are produced by training generative adversarial networks (GAN) on features of each character's skeleton and normal shape. For detecting the hidden information, we make use of fully convolutional networks (FCN) to produce salient regions from the watermarked document. The salient regions mark the positions in the document where characters have been substituted by their variations, and these positions serve as a reference for extracting the hidden information. Lastly, we demonstrate that our approach gives high precision of data detection and competitive performance compared to state-of-the-art approaches.
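A toy sketch of the underlying steganographic idea: hide one bit per character occurrence by choosing between a character's normal glyph and a generated variant. The glyph tables here are hypothetical placeholders for the GAN-generated fonts used in the paper.

```python
NORMAL = {"a": "a.normal", "b": "b.normal"}    # placeholder glyph identifiers
VARIANT = {"a": "a.variant", "b": "b.variant"}

def embed(text, bits):
    """Render text, picking the variant glyph wherever the next bit is 1."""
    glyphs, i = [], 0
    for ch in text:
        if ch in VARIANT and i < len(bits):
            glyphs.append(VARIANT[ch] if bits[i] else NORMAL[ch])
            i += 1
        else:
            glyphs.append(NORMAL.get(ch, ch))
    return glyphs

def extract(glyphs):
    """Recover the bit stream from which glyphs are variants."""
    known = set(NORMAL.values()) | set(VARIANT.values())
    return [1 if g in set(VARIANT.values()) else 0 for g in glyphs if g in known]

hidden = embed("abba", [1, 0, 0, 1])
print(hidden, extract(hidden))  # the bits round-trip through the glyph choice
```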
Citations: 4
ICDAR 2019 Competition on Large-Scale Street View Text with Partial Labeling - RRC-LSVT
Pub Date : 2019-09-01 DOI: 10.1109/icdar.2019.00250
Yipeng Sun, Zihan Ni, Chee-Kheng Chng, Yuliang Liu, Canjie Luo, Chun Chet Ng, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin
Robust text reading from street view images provides valuable information for various applications. In such a challenging scenario, the performance improvement of existing methods heavily relies on the amount of fully annotated training data, which is costly and inefficient to obtain. To scale up the amount of training data while keeping the labeling procedure cost-effective, this competition introduces a new challenge on Large-scale Street View Text with Partial Labeling (LSVT), providing 50,000 images with full annotations and 400,000 images with weak annotations. The competition aims to explore the abilities of state-of-the-art methods to detect and recognize text instances in large-scale street view images, closing the gap between research benchmarks and real applications. During the competition period, a total of 41 teams participated in the two tasks, text detection and end-to-end text spotting, with 132 valid submissions. This paper includes the dataset description, task definitions, evaluation protocols and result summaries of the ICDAR 2019-LSVT challenge.
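A small sketch of IoU-based matching of the kind text detection benchmarks typically use to score submissions; the 0.5 threshold and the greedy one-to-one matching are common conventions, assumed here rather than taken from the LSVT protocol itself.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def detection_fscore(predictions, ground_truth, threshold=0.5):
    matched, tp = set(), 0
    for pred in predictions:
        # Greedily match each prediction to its best unmatched ground truth.
        best = max(((iou(pred, gt), i) for i, gt in enumerate(ground_truth)
                    if i not in matched), default=(0.0, -1))
        if best[0] >= threshold:
            matched.add(best[1])
            tp += 1
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

print(detection_fscore([(0, 0, 10, 10), (50, 50, 60, 60)],
                       [(1, 1, 10, 10), (100, 100, 110, 110)]))
```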
Citations: 79
Welcome Message from the Program Chairs
Pub Date : 2019-09-01 DOI: 10.1109/icdar.2019.00007
G. Feigin
Welcome to Vancouver and the 20th Annual Conference of the Production and Operations Management Society! We received 988 abstracts for this conference. These submissions have been clustered into 19 tracks across the entire spectrum of Operations Management. In keeping with the theme of the conference, one of these tracks is titled "Operations in Emerging Economies" and features presentations from researchers in various countries. We would like to thank all the track chairs for their hard work in soliciting speakers and helping to put the program together. The track chairs are:
Citations: 0
Handwriting Recognition Based on Temporal Order Restored by the End-to-End System
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00199
Besma Rabhi, A. Elbaati, Y. Hamdi, A. Alimi
In this paper, we present an original framework for offline handwriting recognition. Our recognition system is based on a sequence-to-sequence model employing an encoder-decoder LSTM to recover the temporal order from offline handwriting. Handwriting temporal recovery consists of two parts: extracting features using a Convolutional Neural Network (CNN) followed by an LSTM layer, and decoding the encoded vectors with a BLSTM to generate the temporal information. To produce human-like velocity, we perform a sampling operation that takes trajectory curvature into consideration. Our work is validated by an LSTM recognition system based on the Beta Elliptic model, applied to an Arabic and Latin on/off dual handwriting character database.
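A rough sketch of the encoder-decoder idea described above: a small CNN plus LSTM encodes the offline image, and a bidirectional LSTM decodes a pen trajectory as (x, y, pen-state) steps. All layer sizes and the output parameterization are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TemporalRecovery(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                  # feature extractor
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 64)),         # collapse height, keep 64 columns
        )
        self.encoder = nn.LSTM(64, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True,
                               bidirectional=True)  # BLSTM decoder
        self.head = nn.Linear(2 * hidden, 3)        # (x, y, pen-down) per step

    def forward(self, image):
        feats = self.cnn(image)                    # (batch, 64, 1, 64)
        feats = feats.squeeze(2).permute(0, 2, 1)  # (batch, 64 steps, 64 channels)
        encoded, _ = self.encoder(feats)
        decoded, _ = self.decoder(encoded)
        return self.head(decoded)                  # (batch, 64, 3) trajectory

trajectory = TemporalRecovery()(torch.rand(2, 1, 64, 256))
print(trajectory.shape)  # torch.Size([2, 64, 3])
```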
Citations: 12
WiSe — Slide Segmentation in the Wild
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00062
Monica Haurilet, Alina Roitberg, Manuel Martínez, R. Stiefelhagen
We address the task of segmenting presentation slides, where the examined page is captured as a live photo during a lecture. Slides are an important document type, used as visual accompaniments to presentations in fields ranging from education to business. However, automatic analysis of presentation slides has not been researched sufficiently; so far, only preprocessed images of already digitized slide documents have been considered. We introduce the task of analyzing unconstrained photos of slides taken during lectures and present a novel dataset for page segmentation with slides captured in the wild (WiSe). Our dataset covers pixel-wise annotations of 25 classes on 1,300 pages, allowing overlapping regions (i.e., multi-class assignments). To evaluate performance, we define multiple benchmark metrics and baseline methods for the dataset. We further implement two different deep neural network approaches previously used for segmenting natural images and adapt them to the task. Our evaluation results demonstrate the effectiveness of the deep learning based methods, which surpass the baseline methods by over 30%. To foster further research on slide analysis in unconstrained photos, we make the WiSe dataset publicly available to the community.
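A brief sketch of per-class IoU over multi-label masks, the sort of metric that suits pages where regions may overlap (multi-class assignments). The mask encoding, one boolean layer per class, is an assumption for illustration, not the paper's benchmark definition.

```python
import numpy as np

def per_class_iou(pred, target):
    """pred, target: boolean arrays of shape (classes, H, W); overlaps allowed."""
    ious = []
    for p, t in zip(pred, target):
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else 0.0

pred = np.zeros((25, 64, 64), dtype=bool)
target = np.zeros((25, 64, 64), dtype=bool)
pred[0, :32], target[0, :40] = True, True        # class 0: partial overlap
pred[1, :, :10], target[1, :, :10] = True, True  # class 1: exact match
print(per_class_iou(pred, target))  # mean IoU over the classes present
```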
Citations: 8