
Latest publications from the 2019 International Conference on Document Analysis and Recognition (ICDAR)

Selective Super-Resolution for Scene Text Images
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00071
Ryo Nakao, Brian Kenji Iwana, S. Uchida
In this paper, we enhance super-resolution for images containing scene text. Specifically, we propose Super-Resolution Convolutional Neural Networks (SRCNNs) constructed to tackle issues associated with characters and text. We demonstrate that standard SRCNNs trained for general object super-resolution are not sufficient and that the proposed method is a viable way to build a robust model for text. To do so, we analyze the characteristics of SRCNNs through quantitative and qualitative evaluations on scene text data. In addition, we analyze the correlation between layers with Singular Vector Canonical Correlation Analysis (SVCCA) and compare the filters of each SRCNN using t-SNE. Furthermore, to create a unified super-resolution model specialized for both text and objects, we combine SRCNNs trained on the different data types using Content-wise Network Fusion (CNF). We integrate the SRCNN trained on character images with the SRCNN trained on general object images, and verify the accuracy improvement on scene images that include text. We also examine how each SRCNN affects the super-resolved images after fusion.
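As a rough illustration of the architecture described above, the sketch below implements a three-layer SRCNN and fuses a text-specialised branch with an object-specialised branch. PyTorch is assumed; the 9-5-5 kernel sizes follow a common SRCNN configuration, and the 1×1-convolution fusion head is an illustrative stand-in rather than the exact CNF formulation used by the authors.

```python
# Minimal SRCNN and a simple content-wise fusion of two SRCNN branches.
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    def __init__(self, channels=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),   # patch extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=5, padding=2),          # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),    # reconstruction
        )

    def forward(self, x):  # x: bicubic-upscaled low-resolution image
        return self.body(x)

class FusedSR(nn.Module):
    """Fuses a text-specialised and an object-specialised SRCNN (assumed fusion head)."""
    def __init__(self, channels=1):
        super().__init__()
        self.text_branch = SRCNN(channels)
        self.object_branch = SRCNN(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)  # learned blending

    def forward(self, x):
        return self.fuse(torch.cat([self.text_branch(x), self.object_branch(x)], dim=1))

if __name__ == "__main__":
    out = FusedSR()(torch.randn(1, 1, 64, 64))
    print(out.shape)  # torch.Size([1, 1, 64, 64])
```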
Citations: 3
Field Typing for Improved Recognition on Heterogeneous Handwritten Forms
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00084
Ciprian Tomoiaga, Paul Feng, M. Salzmann, PA Jayet
Offline handwriting recognition has undergone continuous progress over the past decades. However, existing methods are typically benchmarked on free-form text datasets that are biased towards good-quality images and handwriting styles, and homogeneous content. In this paper, we show that state-of-the-art algorithms, employing long short-term memory (LSTM) layers, do not readily generalize to real-world structured documents, such as forms, due to their highly heterogeneous and out-of-vocabulary content, and to the inherent ambiguities of this content. To address this, we propose to leverage the content type within an LSTM-based architecture. Furthermore, we introduce a procedure to generate synthetic data to train this architecture without requiring expensive manual annotations. We demonstrate the effectiveness of our approach at transcribing text on a challenging, real-world dataset of European Accident Statements.
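To make the field-typing idea concrete, here is a minimal sketch (PyTorch assumed) of an LSTM-based line recognizer that receives a learned content-type embedding alongside its visual features. The feature extractor, embedding size, and the way the type vector is tiled along the width axis are illustrative assumptions, not the authors' exact architecture.

```python
# A BLSTM+CTC line recognizer conditioned on a field-type embedding (sketch).
import torch
import torch.nn as nn

class TypedLineRecognizer(nn.Module):
    def __init__(self, num_classes, num_field_types, feat_dim=64, type_dim=16):
        super().__init__()
        self.cnn = nn.Sequential(                       # collapses height, keeps width as time axis
            nn.Conv2d(1, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),
        )
        self.type_embed = nn.Embedding(num_field_types, type_dim)
        self.rnn = nn.LSTM(feat_dim + type_dim, 128, bidirectional=True, batch_first=True)
        self.head = nn.Linear(256, num_classes)         # CTC logits (incl. blank)

    def forward(self, image, field_type):
        feats = self.cnn(image).squeeze(2).permute(0, 2, 1)              # (B, W, feat_dim)
        t = self.type_embed(field_type).unsqueeze(1).expand(-1, feats.size(1), -1)
        out, _ = self.rnn(torch.cat([feats, t], dim=-1))
        return self.head(out)                                             # feed to nn.CTCLoss

model = TypedLineRecognizer(num_classes=100, num_field_types=6)
logits = model(torch.randn(2, 1, 32, 256), torch.tensor([0, 3]))
print(logits.shape)  # torch.Size([2, 256, 100])
```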
Citations: 2
Sub-Word Embeddings for OCR Corrections in Highly Fusional Indic Languages
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00034
Rohit Saluja, Mayur Punjabi, Mark J. Carman, Ganesh Ramakrishnan, P. Chaudhuri
Texts in Indic languages contain a large proportion of out-of-vocabulary (OOV) words due to frequent fusion using conjoining rules (of which there are around 4000 in Sanskrit). OCR errors further accentuate this complexity for error correction systems. Variations of sub-word units such as n-grams, possibly encapsulating the context, can be extracted from the OCR text as well as the language text individually. Some of the sub-word units derived from texts in such languages correlate highly with the word conjoining rules. Signals such as frequency values (on a corpus) associated with such sub-word units have previously been used with log-linear classifiers for detecting errors in Indic OCR texts. We explore two different encodings to capture such signals and augment the input to Long Short-Term Memory (LSTM) based OCR correction models, which have proven useful in the past for jointly learning the language as well as OCR-specific confusions. The first type of encoding makes direct use of sub-word unit frequency values derived from the training data. The formulation results in faster convergence and better accuracy values of the error correction model on four different languages with varying complexities. The second type of encoding makes use of trainable sub-word embeddings. We introduce a new procedure for training fastText embeddings on the sub-word units and further observe a large gain in F-scores, as well as word-level accuracy values.
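A minimal sketch of the two signal types described above is given below: corpus-level character n-gram frequencies used as a direct encoding, and fastText embeddings trained on sub-word units (gensim 4.x assumed). The toy corpus and parameter values are illustrative only.

```python
from collections import Counter
from gensim.models import FastText

def char_ngrams(word, n=3):
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def ngram_frequency_features(word, corpus_counts, n=3):
    """First encoding: corpus frequency of each character n-gram of the word."""
    return [corpus_counts[g] for g in char_ngrams(word, n)]

corpus = ["ocr output line one", "another ocr output line"]
counts = Counter(g for line in corpus for w in line.split() for g in char_ngrams(w))
print(ngram_frequency_features("ocr", counts))

# Second encoding: trainable sub-word embeddings; fastText itself builds
# character n-gram vectors (small bucket keeps this toy example light).
ft = FastText(sentences=[line.split() for line in corpus],
              vector_size=32, window=3, min_count=1, min_n=2, max_n=4, bucket=2000)
print(ft.wv["ocr"].shape)  # (32,)
```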
Citations: 1
Blind Source Separation Based Framework for Multispectral Document Images Binarization
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00237
Abderrahmane Rahiche, A. Bakhta, M. Cheriet
In this paper, we propose a novel Blind Source Separation (BSS) based framework for multispectral (MS) document image binarization. This framework takes advantage of the multidimensional data representation of MS images and makes use of Graph regularized Non-negative Matrix Factorization (GNMF) to decompose MS document images into their different constituent components, i.e., foreground (text, ink), background (paper, parchment), degradation information, etc. The proposed framework is validated on two different real-world datasets of manuscript images, showing a high capability of dealing with variable numbers of bands regardless of the acquisition protocol, different types of degradation, and illumination non-uniformity, while outperforming the results reported in the state of the art. Although the focus is on binary separation (i.e., foreground/background), the proposed framework is also used to decompose document images into different components, i.e., background, text, and degradation, which allows full source separation, so that further analysis and characterization of each component becomes possible. A comparative study is performed using Independent Component Analysis (ICA) and Principal Component Analysis (PCA) methods. Our framework is also validated on a third dataset of MS images of natural objects to demonstrate its generalizability beyond document samples.
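The source-separation view can be sketched as follows: the multispectral cube is flattened to a pixels-by-bands matrix and factorised so that each component maps back to an image-sized source (foreground, background, degradation). Plain NMF from scikit-learn stands in here for the paper's graph-regularised variant (GNMF), which scikit-learn does not implement, so this is only an approximation of the method under that assumption.

```python
import numpy as np
from sklearn.decomposition import NMF

h, w, bands = 64, 64, 8                       # toy multispectral document image
cube = np.random.rand(h, w, bands)

X = cube.reshape(-1, bands)                   # one row per pixel
model = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)                    # per-pixel abundances of each source
sources = W.reshape(h, w, 3)                  # e.g. text / background / degradation maps

# A binarization could then threshold the component most correlated with ink.
text_map = sources[..., 0]
binary = (text_map > text_map.mean()).astype(np.uint8)
print(binary.shape, binary.dtype)
```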
Citations: 2
Deep Visual Template-Free Form Parsing
Pub Date : 2019-09-01 DOI: 10.1109/icdar.2019.00030
Brian L. Davis, B. Morse, Scott D. Cohen, Brian L. Price, Chris Tensmeyer
Automatic, template-free extraction of information from form images is challenging due to the variety of form layouts. This is even more challenging for historical forms due to noise and degradation. A crucial part of the extraction process is associating input text with pre-printed labels. We present a learned, template-free solution to detecting pre-printed text and input text/handwriting and predicting pair-wise relationships between them. While previous approaches to this problem have been focused on clean images and clear layouts, we show our approach is effective in the domain of noisy, degraded, and varied form images. We introduce a new dataset of historical form images (late 1800s, early 1900s) for training and validating our approach. Our method uses a convolutional network to detect pre-printed text and input text lines. We pool features from the detection network to classify possible relationships in a language-agnostic way. We show that our proposed pairing method outperforms heuristic rules and that visual features are critical to obtaining high accuracy.
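A minimal sketch (PyTorch assumed) of the pairing step is shown below: pooled detector features for a pre-printed label and a candidate input-text region are concatenated and scored by a small classifier. The feature dimension and the MLP head are illustrative assumptions, not the authors' exact relationship network.

```python
import torch
import torch.nn as nn

class PairScorer(nn.Module):
    """Scores whether a pre-printed label and an input-text region belong together."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),                      # logit for the pair-wise relationship
        )

    def forward(self, label_feat, input_feat):
        return self.mlp(torch.cat([label_feat, input_feat], dim=-1)).squeeze(-1)

scorer = PairScorer()
label_feat = torch.randn(4, 256)   # pooled features for 4 pre-printed labels
input_feat = torch.randn(4, 256)   # pooled features for 4 candidate handwriting regions
print(torch.sigmoid(scorer(label_feat, input_feat)))  # pairing probabilities
```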
Citations: 28
LPGA: Line-of-Sight Parsing with Graph-Based Attention for Math Formula Recognition
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00109
Mahshad Mahdavi, M. Condon, Kenny Davila
We present a model for recognizing typeset math formula images from connected components or symbols. In our approach, connected components are used to construct a line-of-sight (LOS) graph. The graph is used both to reduce the search space for formula structure interpretations, and to guide a classification attention model using separate channels for inputs and their local visual context. For classification, we used visual densities with Random Forests for initial development, and then converted this to a Convolutional Neural Network (CNN) with a second branch to capture context for each input image. Formula structure is extracted as a directed spanning tree from a weighted LOS graph using Edmonds' algorithm. We obtain strong results for formulas without grids or matrices in the InftyCDB-2 dataset (90.89% from components, 93.5% from symbols). Using tools from the CROHME handwritten formula recognition competitions, we were able to compile all symbol and structure recognition errors for analysis. Our data and source code are publicly available.
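The structure-extraction step can be sketched with networkx: pairwise relationship scores between symbols form a weighted directed graph, and Edmonds' algorithm extracts the maximum spanning arborescence used as the formula tree. The symbols and scores below are made-up toy values, not outputs of the authors' classifier.

```python
import networkx as nx

G = nx.DiGraph()
# Edges: (parent symbol, child symbol, classifier confidence for that relation).
G.add_weighted_edges_from([
    ("x", "2", 0.9),    # "2" as superscript of "x"
    ("x", "+", 0.8),
    ("+", "2", 0.2),
    ("+", "y", 0.85),
])

tree = nx.maximum_spanning_arborescence(G)   # Edmonds' algorithm
print(sorted(tree.edges()))                  # the extracted formula tree
```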
Citations: 15
Do You Need More Data? The DeepSignDB On-Line Handwritten Signature Biometric Database
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00185
Rubén Tolosana, R. Vera-Rodríguez, Julian Fierrez, A. Morales, J. Ortega-Garcia
Data have become one of the most valuable assets in this new era, where deep learning technology seems to overcome traditional approaches. However, in some tasks, such as the verification of handwritten signatures, the amount of publicly available data is scarce, which makes it difficult to test the real limits of deep learning. In addition to the lack of public data, it is not easy to evaluate the improvements of novel approaches compared with the state of the art, as different experimental protocols and conditions are usually considered for different signature databases. To tackle all of these problems, the main contribution of this study is twofold: i) we present and describe the new DeepSignDB on-line handwritten signature biometric public database, and ii) we propose a standard experimental protocol and benchmark to be used by the research community in order to perform a fair comparison of novel approaches with the state of the art. The DeepSignDB database is obtained through the combination of some of the most popular on-line signature databases and a novel dataset not presented before. It comprises more than 70K signatures acquired using both stylus and finger inputs from a total of 1526 users. Two acquisition scenarios are considered, office and mobile, with a total of 8 different devices. Additionally, different types of impostors and numbers of acquisition sessions are considered across the database. The DeepSignDB and benchmark results are available on GitHub.
Citations: 9
Rethinking Semantic Segmentation for Table Structure Recognition in Documents
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00225
Shoaib Ahmed Siddiqui, Pervaiz Iqbal Khan, A. Dengel, Sheraz Ahmed
Based on recent advancements in the domain of semantic segmentation, Fully-Convolutional Networks (FCNs) have been successfully applied to the task of table structure recognition. We analyze the efficacy of semantic segmentation networks for this purpose and simplify the problem by proposing prediction tiling based on the consistency assumption that holds for tabular structures. For an image of dimensions H × W, we predict a single column for the rows (ŷ_row ∈ ℝ^H) and a single row for the columns (ŷ_col ∈ ℝ^W). We use a dual-headed architecture where initial feature maps (from the encoder-decoder model) are shared while the last two layers generate class-specific (row/column) predictions. This allows us to generate predictions for both rows and columns simultaneously using a single model, whereas previous methods relied on two separate models for inference. With the proposed method, we were able to achieve state-of-the-art results on the ICDAR-13 image-based table structure recognition dataset with an average F-measure of 92.39% (91.90% and 92.88% for rows and columns, respectively). The obtained results suggest that constraining the problem space of an FCN by imposing valid constraints can lead to significant performance gains.
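A minimal sketch (PyTorch assumed) of the dual-headed idea follows: a shared encoder feeds one head that collapses the width axis to predict a single column of row labels (length H) and another that collapses the height axis to predict a single row of column labels (length W). Channel sizes and the pooling choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DualHeadedTableNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                    # shared feature maps
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.row_head = nn.Conv2d(32, 1, 1)              # per-pixel row-separator logits
        self.col_head = nn.Conv2d(32, 1, 1)              # per-pixel column-separator logits

    def forward(self, x):                                 # x: (B, 1, H, W)
        f = self.encoder(x)
        row_logits = self.row_head(f).mean(dim=3).squeeze(1)  # (B, H): one value per row
        col_logits = self.col_head(f).mean(dim=2).squeeze(1)  # (B, W): one value per column
        return row_logits, col_logits

rows, cols = DualHeadedTableNet()(torch.randn(2, 1, 128, 256))
print(rows.shape, cols.shape)  # torch.Size([2, 128]) torch.Size([2, 256])
```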
Citations: 31
Deformation Classification of Drawings for Assessment of Visual-Motor Perceptual Maturity
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00155
Momina Moetesum, I. Siddiqi, N. Vincent
Sketches and drawings are widely employed in clinical psychology to assess visual-motor and perceptual development in children and adolescents. Drawn responses by subjects are mostly characterized by a high degree of deformation that indicates the presence of various visual, perceptual and motor disorders. Classification of deformations is a challenging task due to the complex and extensive rule representation. In this study, we propose a novel technique to model clinical manifestations using Deep Convolutional Neural Networks (DCNNs). Drawn responses to nine templates used for assessing the perceptual orientation of individuals are employed as training samples. A number of defined deviations scored in each template are then modeled by fine-tuning a pre-trained DCNN architecture. The performance of the proposed technique is evaluated on samples from 106 children. Experimental results show that pre-trained DCNNs can model and classify a number of deformations across multiple shapes with considerable success. Nevertheless, some deformations are represented more reliably than others. Overall, promising classification results are observed that substantiate the effectiveness of our proposed technique.
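The fine-tuning setup can be sketched as follows (PyTorch/torchvision assumed): a pre-trained CNN is adapted so that its final layer predicts deformation classes for a given template. The ResNet-18 backbone and the number of deformation classes are illustrative assumptions, not the paper's exact DCNN.

```python
import torch.nn as nn
from torchvision import models

num_deformation_classes = 5                     # assumed number of scored deviations per template

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # downloads ImageNet weights
for p in backbone.parameters():                  # freeze the pre-trained feature extractor
    p.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, num_deformation_classes)  # new trainable head

# Fine-tuning then optimizes only backbone.fc with a standard cross-entropy loss
# over drawn-response images labelled with deformation scores.
```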
Citations: 2
Semi-Synthetic Data Augmentation of Scanned Historical Documents
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00051
Romain Karpinski, A. Belaïd
This paper proposes a new, fully automatic method for generating semi-synthetic images of historical documents to increase the number of training samples in small datasets. The method extracts and mixes background-only images (BOIs) with text-only images (TOIs) taken from two different sources to create semi-synthetic images. The TOIs are extracted with the help of a binary mask obtained by binarizing the image. The BOIs are reconstructed from the original image by replacing TOI pixels using an inpainting method. Finally, a TOI can be efficiently integrated into a BOI using the gradient domain, thus creating a new semi-synthetic image. The idea behind this technique is to automatically obtain documents close to real ones, with different backgrounds, to highlight the content. Experiments are conducted on the public HisDB dataset, which contains few labeled images. We show that the proposed method improves the performance of semantic segmentation and baseline extraction tasks.
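A minimal sketch (OpenCV assumed) of this pipeline follows: binarize to obtain a text mask, inpaint the text away to recover a background-only image, and re-insert the text into another background with gradient-domain (Poisson) blending. The toy arrays stand in for real page scans from two different sources, and the specific OpenCV operators are stand-ins for the paper's exact binarization and inpainting choices.

```python
import cv2
import numpy as np

page_a = np.full((200, 300, 3), 220, np.uint8)                  # source of the text (TOI)
cv2.putText(page_a, "Lorem ipsum", (40, 100), cv2.FONT_HERSHEY_SIMPLEX, 1, (30, 30, 30), 2)
page_b = np.random.randint(180, 240, (200, 300, 3), np.uint8)   # source of the background (BOI)

# 1) Binary mask of the text (Otsu on the grey image, inverted so text = 255).
gray = cv2.cvtColor(page_a, cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# 2) Background-only image: inpaint the text pixels of page_a, i.e. how a BOI
#    would be recovered from a page that already contains text.
background_only = cv2.inpaint(page_a, mask, 3, cv2.INPAINT_TELEA)

# 3) Semi-synthetic image: blend the text of page_a into page_b in the gradient domain.
center = (page_b.shape[1] // 2, page_b.shape[0] // 2)
mask_dilated = cv2.dilate(mask, np.ones((5, 5), np.uint8))
semi_synthetic = cv2.seamlessClone(page_a, page_b, mask_dilated, center, cv2.MIXED_CLONE)
print(background_only.shape, semi_synthetic.shape)
```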
Citations: 3