
Latest Publications: 2019 International Conference on Document Analysis and Recognition (ICDAR)

Identifying the Central Figure of a Scientific Paper
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00173
Sean T. Yang, Po-Shen Lee, L. Kazakova, Abhishek Joshi, B. M. Oh, Jevin D. West, B. Howe
Publishers are increasingly using graphical abstracts to facilitate scientific search, especially across disciplinary boundaries. They are presented on various media, easily shared, and information rich. However, only a small fraction of scientific publications are equipped with graphical abstracts. What can we do with the vast majority of papers that have no selected graphical abstract? In this paper, we first hypothesize that scientific papers actually include a "central figure" that serves as a graphical abstract. These figures convey the key results and provide a visual identity for the paper. Using survey data collected from 6,263 authors regarding 8,353 papers over 15 years, we find that over 87% of papers are considered to contain a central figure, and that these central figures are primarily used to summarize important results, explain the key methods, or provide additional discussion. We then train a model to automatically recognize the central figure, achieving a top-3 accuracy of 78% and an exact-match accuracy of 34%. We find that the primary boost in accuracy comes from figure captions that resemble the abstract. We make all our data and results publicly available at https://github.com/viziometrics/centraul_figure. Our goal is to automate central figure identification to improve search engine performance and to help scientists connect ideas across the literature.
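Since the abstract reports that the main accuracy gain comes from figure captions that resemble the paper's abstract, a minimal sketch of that signal is a caption-vs-abstract similarity ranking; the function below is illustrative only (TF-IDF cosine similarity), not the authors' trained model, and all names are hypothetical.
```python
# Minimal sketch: rank a paper's figures by how closely each caption resembles
# the abstract (TF-IDF cosine similarity). Illustrative baseline, not the
# authors' actual model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_central_figure_candidates(abstract, captions):
    """Return (figure_index, similarity) pairs sorted from best to worst."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit on the abstract plus all captions so they share one vocabulary.
    matrix = vectorizer.fit_transform([abstract] + captions)
    sims = cosine_similarity(matrix[0], matrix[1:]).ravel()
    return sorted(enumerate(sims), key=lambda p: p[1], reverse=True)

# The top-ranked index would be the predicted "central figure" candidate.
ranking = rank_central_figure_candidates(
    "We train a model to automatically recognize the central figure ...",
    ["Overview of the proposed pipeline.", "Top-3 accuracy of the model."],
)
```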
Citations: 4
Bigram Label Regularization to Reduce Over-Segmentation on Inline Math Expression Detection
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00069
Xing Wang, Zelun Wang, Jyh-Charn S. Liu
An inline mathematical expression is a math expression (ME) that is blended into the plaintext sentences of a scientific paper. Detecting inline MEs is a non-trivial problem due to the unrestricted use of font styles and the blurred boundaries with plaintext in scientific publications. For instance, many inline MEs detected by existing algorithms are incorrectly split into multiple parts because a few characters are misidentified. In this paper, we propose a bigram regularization model to resolve this split problem in inline ME detection. The model incorporates neighboring constraints when labeling tokens as ME vs. plaintext. Experimental results show that this technique significantly reduces the splitting of inline MEs, with only small changes in the false and miss rates. In comparison with a CRF model, our model achieves a higher F1 score and a lower miss rate.
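A minimal sketch of the underlying idea: per-token scores are decoded with a bigram transition penalty that discourages rapid ME/plaintext flips, so a single misread character no longer splits an expression. The penalty value, names, and two-label setup are illustrative assumptions, not the paper's exact formulation.
```python
# Sketch of bigram label regularization for two labels: 0 = plaintext, 1 = math (ME).
# A flip penalty discourages label changes between neighboring tokens, reducing
# over-segmentation. Illustrative only; the paper's model and weights differ.
import numpy as np

def decode_with_bigram_penalty(unary, flip_penalty=2.0):
    """unary[t, k] = score of label k at token t; returns the best label sequence."""
    T, K = unary.shape
    score = unary[0].copy()                      # best score ending in each label
    back = np.zeros((T, K), dtype=int)           # backpointers
    for t in range(1, T):
        new_score = np.empty(K)
        for k in range(K):
            cand = score - flip_penalty * (np.arange(K) != k)   # penalize label flips
            back[t, k] = int(np.argmax(cand))
            new_score[k] = cand[back[t, k]] + unary[t, k]
        score = new_score
    labels = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):                # trace the best path backwards
        labels.append(int(back[t, labels[-1]]))
    return labels[::-1]

# Token 2 is weakly misread as plaintext, but the bigram penalty keeps it inside the ME:
unary = np.array([[0.1, 0.9], [0.2, 0.8], [0.6, 0.4], [0.1, 0.9], [0.2, 0.8]])
print(decode_with_bigram_penalty(unary))         # -> [1, 1, 1, 1, 1], no split
```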
Citations: 6
Table Detection in Invoice Documents by Graph Neural Networks
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00028
Pau Riba, Anjan Dutta, Lutz Goldmann, A. Fornés, O. R. Terrades, J. Lladós
Tabular structures in documents offer a complementary dimension to the raw textual data, representing logical or quantitative relationships among pieces of information. In digital mailroom applications, where a large number of administrative documents must be processed with reasonable accuracy, the detection and interpretation of tables is crucial. Table recognition has gained interest in document image analysis, in particular for unconstrained formats (no ruling lines, unknown numbers of rows and columns). In this work, we propose a graph-based approach for detecting tables in document images. Instead of using the raw content (recognized text), we make use of the location, context, and content type; it is therefore a purely structural perception approach that does not depend on the language or on the quality of text recognition. Our framework uses Graph Neural Networks (GNNs) to describe the local repetitive structural information of tables in invoice documents. Our proposed model has been experimentally validated on two invoice datasets and achieved encouraging results. Additionally, due to the scarcity of benchmark datasets for this task, we have contributed to the community a novel dataset derived from the RVL-CDIP invoice data. It will be publicly released to facilitate future research.
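A minimal sketch of the structural idea described above: each text box becomes a graph node whose features are its position and a coarse content type (no raw text), edges connect spatially close boxes, and one round of message passing mixes neighbourhood information before node classification as table/non-table. The k-nearest-neighbour graph, single mean-aggregation layer, and all names are illustrative assumptions; the paper's GNN is more elaborate.
```python
# Sketch: build a spatial k-NN graph over text boxes and run one mean-aggregation
# message-passing step. Illustrative only; not the paper's architecture.
import numpy as np

def build_graph(boxes, k=4):
    """boxes: (N, 5) array of [x, y, w, h, content_type]. Returns (N, N) adjacency."""
    centers = boxes[:, :2] + boxes[:, 2:4] / 2.0
    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    adj = np.zeros_like(dists)
    for i in range(len(boxes)):
        neighbours = np.argsort(dists[i])[1:k + 1]     # k nearest boxes, excluding self
        adj[i, neighbours] = 1.0
    return np.maximum(adj, adj.T)                      # make the graph undirected

def message_passing(features, adj, weight):
    """One layer: each node mixes its own features with the mean of its neighbours'."""
    deg = adj.sum(axis=1, keepdims=True) + 1e-9
    aggregated = adj @ features / deg
    return np.maximum(0.0, np.concatenate([features, aggregated], axis=1) @ weight)  # ReLU
```
Working only from positions and content types, rather than recognized text, is what makes the approach language-independent.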
Citations: 61
CNN-BLSTM-CRF Network for Semantic Labeling of Students' Online Handwritten Assignments
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00169
Amirali Darvishzadeh, T. Stahovich, Amir H. Feghahati, Negin Entezari, Shaghayegh Gharghabi, Reed Kanemaru, C. Shelton
Automatic semantic labeling of strokes in online handwritten documents is a crucial task for many applications such as diagram interpretation, text recognition, and search. We formulate this task as a stroke classification problem in which each stroke is classified as a cross-out, free body diagram, or text. Separating free body diagrams and text in this work is different from the traditional text/non-text separation problem because these two classes both contain text and graphics. The text class includes textual notes, mathematical symbols/equations, and graphics such as arrows that connect other elements. The free body diagram class also contains graphics and various alphanumeric characters and symbols that mark or explain the graphical objects. In this work, we present a novel deep neural network model for classification of strokes in online handwritten documents. There are two input sequences to the network. The first sequence contains the trajectories of the pen strokes while the second contains features of the strokes. Each of these sequences is fed to its own CNN-BLSTM channel to extract features and encode relationships between nearby strokes. The outputs of the two channels are concatenated and used as the input to a CRF layer that predicts the best sequence of labels for the given input sequences. We evaluated our model on a dataset of 1,060 pages written by 132 students in an undergraduate statics course. Our model achieved an overall classification accuracy of 94.70% on this dataset.
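A minimal PyTorch sketch of the two-channel layout described above: one CNN-BLSTM channel for the trajectory sequence, one for the stroke-feature sequence, concatenated and projected to per-stroke label scores. The CRF decoding layer is omitted here, and layer sizes and input dimensions are illustrative assumptions rather than the paper's configuration.
```python
# Two-channel CNN-BLSTM sketch; a CRF would decode the emitted per-stroke scores.
import torch
import torch.nn as nn

class CNNBLSTMChannel(nn.Module):
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.conv = nn.Conv1d(in_dim, hidden, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)

    def forward(self, x):                                  # x: (batch, seq_len, in_dim)
        h = torch.relu(self.conv(x.transpose(1, 2)))       # -> (batch, hidden, seq_len)
        out, _ = self.lstm(h.transpose(1, 2))              # -> (batch, seq_len, 2*hidden)
        return out

class StrokeLabeler(nn.Module):
    def __init__(self, traj_dim=2, feat_dim=16, n_labels=3):
        super().__init__()
        self.traj_channel = CNNBLSTMChannel(traj_dim)      # pen trajectory sequence
        self.feat_channel = CNNBLSTMChannel(feat_dim)      # stroke feature sequence
        self.emit = nn.Linear(4 * 64, n_labels)            # cross-out / free body diagram / text

    def forward(self, traj, feat):
        h = torch.cat([self.traj_channel(traj), self.feat_channel(feat)], dim=-1)
        return self.emit(h)                                # scores for a CRF layer to decode

scores = StrokeLabeler()(torch.randn(1, 40, 2), torch.randn(1, 40, 16))   # (1, 40, 3)
```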
Citations: 2
Breaking the Code on Broken Tablets: The Learning Challenge for Annotated Cuneiform Script in Normalized 2D and 3D Datasets
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00032
H. Mara, B. Bogacz
The number of known cuneiform tablets is assumed to be in the hundreds of thousands. The Hilprecht Archive Online contains 1,977 high-resolution 3D scans of tablets. The online cuneiform database CDLI catalogs metadata for more than 100,000 tablets. While both are accessible publicly, large-scale machine learning and pattern recognition on cuneiform tablets remain elusive. The data is only accessible by searching web pages, the tablet identifiers between collections are inconsistent, and the 3D data is unprepared and challenging for automated processing. We pave the way for large-scale analyses of cuneiform tablets by assembling a cross-referenced benchmark dataset of processed cuneiform tablets: (i) frontally aligned 3D tablets with pre-computed high-dimensional surface features, (ii) six-view raster images for off-the-shelf image processing, and (iii) metadata, transcriptions, and transliterations for a subset of 707 tablets, for learning the alignment between 3D data, images, and linguistic expression. This is the first dataset of its kind and of its size in cuneiform research. This benchmark dataset is prepared for ease of use and immediate availability for computational research, lowering the barrier to experimenting with and applying standard methods of analysis, at https://doi.org/10.11588/data/IE8CCN.
Citations: 7
A Robust Data Hiding Scheme Using Generated Content for Securing Genuine Documents
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00131
Vinh Loc Cu, J. Burie, J. Ogier, Cheng-Lin Liu
Compared with pervasive black-and-white code patterns such as barcodes and quick-response codes, data hiding is an effective technique for securing document images against forgery or unauthorized intervention. In this work, we propose a robust digital watermarking scheme for securing genuine documents by leveraging generative adversarial networks (GANs). To begin with, the input document is adjusted to its correct form by geometric correction. Next, a generated document is obtained from the input document by using the aforementioned networks, and it is regarded as a reference for data hiding and detection. We then introduce an algorithm that hides secret information in the document and produces a watermarked document whose content is minimally distorted under normal observation. Furthermore, we also present a method that detects the hidden data in the watermarked document by measuring the distance between pixel values of the generated and watermarked documents. To improve the security of the scheme, we encode the secret information with pseudo-random numbers prior to hiding it. Lastly, we demonstrate that our approach gives high precision of data detection and competitive performance compared to state-of-the-art approaches.
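A minimal numpy sketch of two steps mentioned above: encoding the secret bits with a keyed pseudo-random stream before embedding, and recovering them from the pixel distance between the watermarked image and the generated reference. The tiny intensity-offset embedding is a toy stand-in, not the paper's GAN-based scheme; all names and the strength value are illustrative.
```python
# Sketch: (1) XOR secret bits with a keyed pseudo-random stream, (2) embed as a
# small +/- intensity offset, (3) recover bits from the sign of the pixel
# difference against the generated reference. Toy embedding for illustration only.
import numpy as np

def embed(reference, bits, key, strength=4):
    rng = np.random.default_rng(key)
    coded = bits ^ rng.integers(0, 2, size=bits.size)        # pseudo-random encoding
    flat = reference.astype(np.int16).ravel().copy()
    flat[:coded.size] += np.where(coded == 1, strength, -strength)
    return np.clip(flat, 0, 255).astype(np.uint8).reshape(reference.shape)

def extract(watermarked, reference, n_bits, key):
    diff = watermarked.astype(np.int16).ravel() - reference.astype(np.int16).ravel()
    coded = (diff[:n_bits] > 0).astype(np.int64)              # sign of the pixel distance
    rng = np.random.default_rng(key)
    return coded ^ rng.integers(0, 2, size=n_bits)            # undo the pseudo-random encoding

ref = np.full((32, 32), 128, dtype=np.uint8)                  # stand-in for the generated document
secret = np.array([1, 0, 1, 1, 0, 0, 1, 0])
assert np.array_equal(extract(embed(ref, secret, key=42), ref, secret.size, key=42), secret)
```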
Citations: 3
OBC306: A Large-Scale Oracle Bone Character Recognition Dataset
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00114
Shuangping Huang, Haobin Wang, Yong-ge Liu, Xiaosong Shi, Lianwen Jin
The oracle bone script from ancient China is among the world's most famous ancient writing systems. Identifying and deciphering oracle bone scripts is one of the most important topics in oracle bone study and requires a deep familiarity with the culture of ancient China. This task remains very challenging for two reasons. The first is that it is carried out mainly by humans and requires a high level of experience, aptitude, and commitment. The second is the scarcity of domain-specific data, which hinders the advancement of automatic recognition research. A collection of well-labeled oracle-bone data is necessary to bridge the oracle bone and information processing fields; however, such a dataset has not yet been presented. Hence, in this paper, we construct a new large-scale dataset of oracle bone characters called OBC306. We also present a standard deep convolutional neural network-based evaluation on this dataset to serve as a benchmark. Through statistical and visual analyses, we describe the inherent difficulties of oracle bone recognition and propose future challenges for and extensions of oracle bone study using information processing. This dataset contains more than 300,000 character-level samples cropped from oracle-bone rubbings or images. It covers 306 glyph classes and is, to the best of our knowledge, the largest existing raw oracle-bone character set. It is anticipated that the publication of this dataset will facilitate the development of oracle bone research and lead to optimal algorithmic solutions.
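A "standard deep convolutional neural network-based evaluation" could be set up along the following lines: a pretrained backbone with its final layer replaced by a 306-way classifier. This is an illustrative stand-in, not the specific networks benchmarked in the paper.
```python
# Illustrative 306-class CNN baseline: ImageNet-pretrained ResNet-18 with a
# replaced final layer. Not the paper's benchmarked architectures.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="DEFAULT")
model.fc = nn.Linear(model.fc.in_features, 306)   # one output per oracle-bone glyph class
```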
Citations: 20
Logo Design Analysis by Ranking
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00238
Takuro Karamatsu, D. Suehiro, S. Uchida
In this paper, we analyze logo designs by using machine learning, as a promising trial of graphic design analysis. Specifically, we focus on favicon images, the tiny logos used as company icons in web browsers, and analyze them to understand their trends within individual industry classes. For example, if we can catch the subtle trends in the favicons of financial companies, those trends suggest how professional designers express the atmosphere of financial companies graphically. For this purpose, we use top-rank learning, a recent machine learning method for ranking that is well suited to revealing subtle trends in graphic designs.
Citations: 1
Target-Directed MixUp for Labeling Tangut Characters
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00041
Guangwei Zhang, Yinliang Zhao
Deep learning has largely improved performance in computer vision and image understanding tasks, but it depends on large training datasets of labeled images. However, labeling data is usually expensive and time-consuming, even though unlabeled data are much easier to obtain. Because of limited budgets or newly emerging categories, it is practical to build the training dataset iteratively from a small set of manually labeled data. The labeled data can not only be used for training the model; knowledge can also be mined from them to find examples of classes not included in the training dataset. Mixup [1] improves a model's accuracy and generalization by augmenting the training dataset with "virtual examples" generated by mixing pairs of randomly selected examples from the training dataset. Motivated by Mixup, we propose the Target-Directed Mixup (TDM) method for building the training dataset of a deep learning-based Tangut character recognition system. The virtual examples are generated by mixing two or more similar examples from the training dataset together with target examples of the unseen classes that need to be labeled, which is a kind of generative few-shot learning. This method can help expand the training dataset by finding real examples of unseen Tangut characters, and it provides virtual examples that can represent rare characters that appear only a very limited number of times in historical documents. According to our experiments, TDM helps recognize unseen examples with an accuracy of 80% using only 4 to 5 real target examples, which greatly reduces the human labor needed for data annotation.
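A minimal sketch of the mixup-style combination at the heart of TDM: a virtual example blends a few labeled training examples that resemble the target with one real example of the unseen target class, with the mixing coefficient drawn from a Beta distribution as in standard Mixup. The averaging of similar examples and all parameter values are illustrative assumptions, not the paper's exact weighting scheme.
```python
# Sketch of a target-directed mixup combination. Standard mixup draws
# lambda ~ Beta(alpha, alpha); here one side of the mix is a real example of
# the unseen target class. Illustrative only.
import numpy as np

def target_directed_mixup(similar_examples, target_example, alpha=0.4, seed=0):
    """similar_examples: (n, H, W) images from known classes that resemble the target."""
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha)                       # mixing coefficient in [0, 1]
    base = similar_examples.mean(axis=0)               # blend of the similar known-class examples
    return lam * target_example + (1.0 - lam) * base   # virtual example for the unseen class

virtual = target_directed_mixup(np.random.rand(3, 64, 64), np.random.rand(64, 64))
```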
Citations: 0
Graphical Object Detection in Document Images
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00018
Ranajit Saha, Ajoy Mondal, C. V. Jawahar
Graphical elements, particularly tables and figures, contain a visual summary of the most valuable information in a document. Therefore, localization of such graphical objects in document images is the initial step toward understanding their content. In this paper, we present a novel end-to-end trainable deep learning based framework to localize graphical objects in document images, called Graphical Object Detection (GOD). Our framework is data-driven and does not require any heuristics or meta-data to locate graphical objects in document images. GOD exploits transfer learning and domain adaptation to handle the scarcity of labeled training images for the graphical object detection task in document images. Performance analysis carried out on various public benchmark data sets (ICDAR-2013, ICDAR-POD2017, and UNLV) shows that our model yields promising results compared to state-of-the-art techniques.
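A minimal sketch of the transfer-learning idea: start from a detector pretrained on natural images and replace its classification head with the document classes of interest (e.g. table and figure), then fine-tune on labeled document pages. torchvision's Faster R-CNN is used here purely for illustration; GOD's actual architecture and training recipe are those described in the paper, and the class set is an assumption.
```python
# Transfer-learning sketch: swap the head of a pretrained Faster R-CNN for
# document classes (background, table, figure) and fine-tune on document pages.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def graphical_object_detector(num_classes=3):    # background + table + figure
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model   # fine-tune this on labeled pages with table/figure boxes
```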
Citations: 47