Latest publications: 2019 International Conference on Document Analysis and Recognition (ICDAR)
ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00253
Xi Liu, Rui Zhang, Yongsheng Zhou, Qianyi Jiang, Qi Song, Nan Li, Kai Zhou, Lei Wang, Dong Wang, Minghui Liao, Mingkun Yang, X. Bai, Baoguang Shi, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar
Chinese scene text reading is one of the most challenging problems in computer vision and has attracted great interest. Unlike English text, Chinese has more than 6,000 commonly used characters, and Chinese characters can be arranged in various layouts with numerous fonts. Chinese signboards in street views are a good source of Chinese scene text images, since they exhibit diverse backgrounds, fonts, and layouts. We organized a competition called ICDAR2019-ReCTS, which mainly focuses on reading Chinese text on signboards. This report presents the final results of the competition. A large-scale dataset of 25,000 annotated signboard images, in which all text lines and characters are annotated with locations and transcriptions, was released. Four tasks were set up: character recognition, text line recognition, text line detection, and end-to-end recognition. In addition, considering the ambiguity of Chinese text, we proposed a multi-ground-truth (multi-GT) evaluation method to make the evaluation fairer. The competition started on March 1, 2019 and ended on April 30, 2019. A total of 262 submissions from 46 teams were received. Most of the participants came from universities, research institutes, and tech companies in China; there were also participants from the United States, Australia, Singapore, and Korea. 21 teams submitted results for Task 1, 23 for Task 2, 24 for Task 3, and 13 for Task 4. The official website for the competition is http://rrc.cvc.uab.es/?ch=12.
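The multi-GT idea is that a prediction counts as correct if it matches any of several acceptable transcriptions for an ambiguous region. A minimal sketch of such an evaluation, assuming each region carries a set of acceptable ground-truth strings (the data layout here is an illustrative assumption, not the competition's exact protocol):

```python
def multi_gt_accuracy(predictions, ground_truths):
    """predictions: list of strings, one per text region.
    ground_truths: list of sets of acceptable transcriptions, one per region.
    A prediction is correct if it matches ANY acceptable transcription."""
    correct = sum(pred in gts for pred, gts in zip(predictions, ground_truths))
    return correct / len(predictions)

# An ambiguous sign may legitimately be transcribed more than one way:
print(multi_gt_accuracy(["小吃店"], [{"小吃店", "小吃"}]))  # 1.0
```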
Citations: 82
CRNN Based Jersey-Bib Number/Text Recognition in Sports and Marathon Images
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00186
Sauradip Nag, Raghavendra Ramachandra, P. Shivakumara, U. Pal, Tong Lu, Mohan S. Kankanhalli
The primary challenge in tracing participants in sports and marathon videos or images is to detect and localize the jersey/bib number, which may appear in different regions of their outfit captured under cluttered environment conditions. In this work, we propose a new framework based on detecting human body parts such that both the jersey/bib number and text are localized reliably. To achieve this, the proposed method first detects and localizes the person in a given image using a Single Shot Multibox Detector (SSD). In the next step, the human body parts that generally contain a bib number or text region, namely the torso, left thigh, and right thigh, are automatically extracted. Each detected part is processed individually to detect the jersey/bib number or text using a deep CNN with a two-channel architecture trained with a novel adaptive weighting loss function. Finally, the detected text is cropped out and fed to a CNN-RNN based deep model, abbreviated as CRNN, for recognizing the jersey/bib number or text. Extensive experiments are carried out on four different datasets, including both benchmark datasets and a new dataset. The performance of the proposed method is compared with state-of-the-art methods on all four datasets, showing improved results on each of them.
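The final recognition stage is a CRNN: a convolutional feature extractor whose column features feed a recurrent network trained with a sequence loss such as CTC. A minimal PyTorch sketch of a generic CRNN; the layer sizes and class count are illustrative assumptions, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Generic CRNN: conv feature extractor -> BiLSTM -> per-timestep class scores."""
    def __init__(self, num_classes, img_height=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),    # H/2
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),  # H/4
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),                                     # H/8, keep width
        )
        feat_h = img_height // 8
        self.rnn = nn.LSTM(256 * feat_h, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)  # num_classes includes the CTC blank

    def forward(self, x):            # x: (B, 1, H, W) grayscale crop
        f = self.cnn(x)              # (B, C, H', W')
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one feature vector per column
        out, _ = self.rnn(f)
        return self.fc(out)          # (B, W', num_classes), fed to a CTC loss

logits = CRNN(num_classes=37)(torch.randn(2, 1, 32, 128))  # e.g. 26 letters + 10 digits + blank
```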
Citations: 15
HoughNet: Neural Network Architecture for Vanishing Points Detection
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00140
A. Sheshkus, A. Ingacheva, V. Arlazarov, D. Nikolaev
In this paper we introduce a novel neural network architecture based on a Fast Hough Transform layer. A layer of this type allows our neural network to accumulate features from linear areas across the entire image instead of local areas. We demonstrate its potential by solving the problem of vanishing point detection in images of documents. Such a problem occurs when dealing with camera shots of documents taken under uncontrolled conditions; in this case, the document image can suffer from several specific distortions, including projective transforms. To train our model, we use the MIDV-500 dataset and provide testing results. The strong generalization ability of the suggested method is demonstrated by applying it to the completely different ICDAR 2011 dewarping contest. In previously published papers on this dataset, the authors measured the quality of vanishing point detection by counting words correctly recognized with the open-source OCR engine Tesseract. For comparison, we reproduce this experiment and show that our method outperforms the state-of-the-art result.
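The key idea is that a Hough-style layer sums responses along candidate lines, so global linear structure (such as text edges converging toward a vanishing point) becomes local peaks in the transformed map. A naive line-sum illustration in Python; the actual Fast Hough Transform computes these sums recursively in roughly O(n^2 log n) rather than by brute force as here:

```python
import numpy as np

def line_sum_transform(img, num_angles=64):
    """Naive Hough-style accumulation: for each (angle, offset), sum image
    values along the corresponding line. Illustrates what a Hough layer
    accumulates; peaks in `acc` correspond to dominant lines in `img`."""
    h, w = img.shape
    diag = int(np.ceil(np.hypot(h, w)))
    acc = np.zeros((num_angles, 2 * diag))
    ys, xs = np.nonzero(img)          # nonzero pixels cast weighted votes
    for i, theta in enumerate(np.linspace(0, np.pi, num_angles, endpoint=False)):
        rho = (xs * np.cos(theta) + ys * np.sin(theta)).astype(int) + diag
        np.add.at(acc[i], rho, img[ys, xs])
    return acc

acc = line_sum_transform((np.random.rand(64, 64) > 0.95).astype(float))
```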
Citations: 25
Towards Automated Evaluation of Handwritten Assessments
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00075
Vijay Rowtula, S. Oota, C. V. Jawahar
Automated evaluation of handwritten answers has been a challenging problem for scaling the education system for many years; the speed of evaluation remains the major bottleneck limiting instructor throughput. This paper describes an effective method for automatically evaluating short descriptive handwritten answers from digitized images. Our goal is to evaluate a student's handwritten answer by assigning an evaluation score comparable to human-assigned scores. Existing works in this domain mainly focus on evaluating handwritten essays with handcrafted, non-semantic features. Our contribution is two-fold: 1) we model this problem as a self-supervised, feature-based classification problem that can fine-tune itself for each question without any explicit supervision; 2) we introduce semantic analysis for auto-evaluation in handwritten text space, combining Information Retrieval and Extraction (IRE) and Natural Language Processing (NLP) methods to derive a set of useful features. We tested our method on three datasets created from various domains, with the help of students of different age groups. Experiments show that our method performs comparably to human evaluators.
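The abstract does not list the exact features, so the following is only one plausible semantic feature of the kind an IRE/NLP pipeline could feed a per-question classifier: TF-IDF cosine similarity between a transcribed student answer and reference answers. The function name and data layout are illustrative assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def semantic_score(student_answer, reference_answers):
    """Maximum TF-IDF cosine similarity between the (OCR'd) student answer
    and a set of reference answers; usable as one feature among others."""
    texts = reference_answers + [student_answer]
    tfidf = TfidfVectorizer().fit_transform(texts)
    sims = cosine_similarity(tfidf[-1], tfidf[:-1])
    return float(sims.max())

print(semantic_score("gravity pulls objects toward earth",
                     ["objects fall because gravity pulls them toward the earth"]))
```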
Citations: 3
Recurrent Neural Network Approach for Table Field Extraction in Business Documents
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00211
Clément Sage, A. Aussem, H. Elghazel, V. Eglin, Jérémy Espinas
Efficiently extracting information from documents issued by their partners is crucial for companies that face huge daily document flows. In particular, tables contain the most valuable information in business documents. However, their contents are challenging to parse automatically, as tables from industrial contexts may have complex and ambiguous physical structures. Bypassing structure recognition, we propose a generic method for end-to-end table field extraction that starts from the sequence of document tokens segmented by an OCR engine and directly tags each token with one of the possible field types. Similar to state-of-the-art methods for non-tabular field extraction, our approach resorts to a token-level recurrent neural network combining spatial and textual features. We empirically assess the effectiveness of recurrent connections for our task by comparing our method with a baseline feedforward network that has local context knowledge added to its inputs. We train and evaluate both approaches on a dataset of 28,570 purchase orders, retrieving the ID numbers and quantities of the ordered products. Our method outperforms the baseline, with a micro F1 score of 0.821 on unknown document layouts compared to 0.764.
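A minimal sketch of such a token-level tagger, assuming each OCR token is represented by a word embedding concatenated with its normalized bounding box; the dimensions and field set are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class TokenTagger(nn.Module):
    """Token-level recurrent tagger combining textual and spatial features:
    a BiLSTM labels every token with a field type (e.g. ID, quantity, other)."""
    def __init__(self, vocab_size, num_fields, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim + 4, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, num_fields)

    def forward(self, token_ids, boxes):       # (B, T), (B, T, 4) normalized x/y/w/h
        x = torch.cat([self.emb(token_ids), boxes], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)                     # (B, T, num_fields) per-token scores

tags = TokenTagger(vocab_size=5000, num_fields=3)(
    torch.randint(0, 5000, (1, 20)), torch.rand(1, 20, 4))
```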
Citations: 18
Indiscapes: Instance Segmentation Networks for Layout Parsing of Historical Indic Manuscripts
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00164
A. Prusty, Sowmya Aitha, Abhishek Trivedi, Ravi Kiran Sarvadevabhatla
Historical palm-leaf manuscripts and early paper documents from the Indian subcontinent form an important part of the world's literary and cultural heritage. Despite their importance, large-scale annotated Indic manuscript image datasets do not exist. To address this deficiency, we introduce Indiscapes, the first dataset with multi-regional layout annotations for historical Indic manuscripts. To address the challenges of large script diversity and of dense, irregular layout elements (e.g. text lines, pictures, multiple documents per image), we adapt a fully convolutional deep neural network architecture for fully automatic, instance-level spatial layout parsing of manuscript images. We demonstrate the effectiveness of the proposed architecture on images from the Indiscapes dataset. For annotation flexibility, and keeping the non-technical background of domain experts in mind, we also contribute a custom, web-based GUI annotation tool and a dashboard-style analytics portal. Overall, our contributions set the stage for enabling downstream applications such as OCR and word spotting in historical Indic manuscripts at scale.
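Instance-level layout parsing means predicting one mask per region (text line, picture, and so on) rather than a single semantic map. As a hedged stand-in, and not the authors' network, the sketch below wires up torchvision's off-the-shelf Mask R-CNN just to show that setup; the class count is an illustrative assumption:

```python
import torch
import torchvision

# One mask, box, and label per layout region; 8 classes is a placeholder
# for region types such as text line, picture, binding hole, etc.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=8)
model.eval()
with torch.no_grad():
    preds = model([torch.rand(3, 512, 512)])  # list of dicts: boxes, labels, masks, scores
print(preds[0]["masks"].shape)                # (num_regions, 1, 512, 512)
```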
Citations: 18
Training-Free and Segmentation-Free Word Spotting using Feature Matching and Query Expansion
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00209
Ekta Vats, A. Hast, A. Fornés
Historical handwritten text recognition (HTR) is an interesting yet challenging problem. In recent times, deep learning based methods have achieved strong performance in handwritten text recognition. However, handwriting recognition using deep learning needs training data, and often the text must first be segmented into lines (or even words). These limitations constrain the application of HTR techniques to document collections, because training data or segmented words are not always available. Therefore, this paper proposes a training-free and segmentation-free word spotting approach that can be applied in unconstrained scenarios. The proposed word spotting framework is based on document query word expansion and a relaxed feature matching algorithm, which can easily be parallelised. Since handwritten words possess distinctive shapes and characteristics, this work uses a combination of different keypoint detectors and Fourier-based descriptors to obtain a sufficient degree of relaxed matching. The effectiveness of the proposed method is empirically evaluated on well-known benchmark datasets using standard evaluation measures. The use of informative features along with query expansion contributed significantly to the method's efficiency and performance.
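A minimal sketch of one classic Fourier-based shape descriptor of the general kind referenced above; the paper's exact descriptor and keypoint pipeline are not specified in the abstract, so this is only an illustration:

```python
import numpy as np

def fourier_descriptor(contour, num_coeffs=16):
    """Fourier descriptor for a closed contour given as an (N, 2) array of
    (x, y) points. Dropping the DC term gives translation invariance;
    dividing by the first harmonic's magnitude gives scale invariance."""
    z = contour[:, 0] + 1j * contour[:, 1]   # complex representation of the shape
    coeffs = np.fft.fft(z)
    coeffs = coeffs[1:num_coeffs + 1]        # drop the DC term
    return np.abs(coeffs) / (np.abs(coeffs[0]) + 1e-12)

t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
ellipse = np.stack([2 * np.cos(t), np.sin(t)], axis=1)
print(fourier_descriptor(ellipse))           # compare descriptors to match word shapes
```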
Citations: 12
CNN Based Binarization of MultiSpectral Document Images
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00091
Fabian Hollaus, Simon Brenner, Robert Sablatnig
This work is concerned with the binarization of ancient manuscripts that have been imaged with a MultiSpectral Imaging (MSI) system. We introduce a new dataset for this purpose, composed of 130 multispectral images taken from two medieval manuscripts. We propose applying an end-to-end Convolutional Neural Network (CNN) for the segmentation of the historical writings. The performance of the CNN based method is superior to that of two state-of-the-art methods specially designed for multispectral document images. The CNN based method is also evaluated on a previous, smaller database, where its performance is slightly worse than that of the two state-of-the-art techniques.
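The essential architectural point is that the network's first layer ingests all spectral bands at once and the last layer emits a per-pixel foreground logit. A minimal encoder-decoder sketch; the band count and layer sizes are illustrative assumptions, not the authors' architecture:

```python
import torch
import torch.nn as nn

def make_binarizer(in_channels=8):
    """Per-pixel binarization of a multispectral stack: the input has one
    channel per spectral band, the output one foreground logit per pixel."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # downsample
        nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),    # upsample back
        nn.Conv2d(32, 1, 1),                                   # foreground logit map
    )

logits = make_binarizer()(torch.randn(1, 8, 128, 128))  # (1, 1, 128, 128)
```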
Citations: 6
Exploration of CNN Features for Online Handwriting Recognition
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00138
S. Mandal, S. Prasanna, S. Sundaram
Recently, convolutional neural networks (CNNs) have demonstrated a powerful ability to learn features, particularly from image data. In this work, their capability for feature learning in online handwriting is explored by constructing various CNN architectures. The developed CNNs can process online handwriting directly, unlike existing works that convert the online handwriting to an image in order to use such architectures. The first convolution layer accepts the sequence of (x, y) coordinates along the trace of the character as input and outputs a convolved, filtered signal. Thereafter, via alternating convolution and Rectified Linear Unit layers, we obtain, in a hierarchical fashion, a set of deep features that can be employed for classification. We use the proposed CNN features to develop a Support Vector Machine (SVM) based character recognition system and an implicit-segmentation based large-vocabulary word recognition system employing a hidden Markov model (HMM) framework. To the best of our knowledge, this is the first work of its kind to apply a CNN directly to the (x, y) coordinates of online handwriting data. Experiments are carried out on two publicly available English online handwriting databases: the UNIPEN character and UNIPEN ICROW-03 word databases. The obtained results are promising compared with reported works employing point-based features.
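Convolving directly over the coordinate sequence amounts to 1-D convolution with two input channels (x and y), no image rendering involved. A minimal PyTorch sketch of such a feature extractor; channel sizes and depth are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TraceFeatures(nn.Module):
    """1-D convolutions applied directly to a pen-trace coordinate sequence;
    the pooled output is a fixed-size descriptor usable by e.g. an SVM."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # fixed-size descriptor per trace
        )

    def forward(self, xy):                    # xy: (B, T, 2) pen coordinates
        return self.net(xy.transpose(1, 2)).squeeze(-1)   # (B, 64)

feats = TraceFeatures()(torch.randn(4, 120, 2))  # feed these to an SVM classifier
```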
Citations: 4
Fast Distributional Smoothing for Regularization in CTC Applied to Text Recognition
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00056
Ryohei Tanaka, Soichiro Ono, Akio Furuhata
Many recent text recognition studies have achieved strong performance by applying a sequential-label prediction framework such as connectionist temporal classification (CTC). Meanwhile, regularization is known to be essential for avoiding overfitting when training deep neural networks, and regularization techniques that allow for semi-supervised learning have a greater impact than those that do not. Among widely researched single-label regularization techniques, virtual adversarial training (VAT) performs well by smoothing posterior distributions around training data points. However, VAT is applied almost exclusively to single-label prediction tasks, not to sequential-label prediction tasks. This is because the number of candidate label sequences grows exponentially with sequence length, making it impractical to compute posterior distributions and the divergence between them. Investigating this problem, we found an easily computable upper bound on this divergence. Here, we propose fast distributional smoothing (FDS), a method that drastically reduces computational costs by minimizing this upper bound. FDS allows regularization at practical computational cost in both supervised and semi-supervised learning. An experiment under simple settings confirmed that upper-bound minimization decreases divergence. Experiments also show that FDS improves scene text recognition performance and surpasses state-of-the-art regularization performance. Furthermore, experiments show that FDS enables efficient semi-supervised learning in sequential-label prediction tasks, outperforming a conventional semi-supervised method.
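The abstract does not state the exact form of the bound, so the following is only a sketch in the spirit of VAT-style smoothing for a CTC model: it assumes the frame-wise KL sum as a tractable surrogate for the sequence-level divergence, and uses a random rather than adversarial perturbation. Everything here is an illustrative assumption, not the paper's FDS:

```python
import torch
import torch.nn.functional as F

def frame_kl_smoothing(model, x, eps=1e-2):
    """Consistency-style smoothing for a model emitting per-frame label
    logits of shape (B, T, C): penalize the KL divergence between the
    per-frame posteriors for clean and slightly perturbed inputs."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=-1)                 # clean per-frame posteriors
    noise = eps * F.normalize(torch.randn_like(x), dim=-1)
    log_q = F.log_softmax(model(x + noise), dim=-1)     # perturbed posteriors
    return F.kl_div(log_q, p, reduction="batchmean")

# usage sketch: loss = ctc_loss + lambda_smooth * frame_kl_smoothing(model, inputs)
```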
Citations: 2