Residual BiRNN Based Seq2Seq Model with Transition Probability Matrix for Online Handwritten Mathematical Expression Recognition
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00107
Zelin Hong, Ning You, J. Tan, Ning Bi
In this paper, we present a Seq2Seq model for online handwritten mathematical expression recognition (OHMER), which consists of two major parts: an encoder based on a residual bidirectional RNN (BiRNN) that takes handwritten traces as input, and a decoder augmented with a transition probability matrix that generates LaTeX notation. We employ residual connections in the BiRNN layers to improve feature extraction. A Markovian transition probability matrix is introduced into the decoder so that long-term information can be exploited at each decoding step through the joint probability. Furthermore, we analyze the impact of the novel encoder and the transition probability matrix through several specific instances. Experimental results on the CROHME 2014 and CROHME 2016 competition tasks show that our model outperforms the previous state-of-the-art single model while using only the official training dataset.
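The abstract does not give the exact joint-probability formulation; as a minimal sketch, assuming the transition matrix is a bigram model estimated from training LaTeX token sequences and interpolated with the decoder's softmax, one decoding step could look like this (all names and the mixing weight are illustrative):

```python
import numpy as np

def estimate_transitions(token_seqs, vocab_size, smoothing=1.0):
    """Bigram counts over training LaTeX token sequences, add-one smoothed,
    normalized so each row is a probability distribution."""
    T = np.full((vocab_size, vocab_size), smoothing)
    for seq in token_seqs:
        for a, b in zip(seq[:-1], seq[1:]):
            T[a, b] += 1
    return T / T.sum(axis=1, keepdims=True)

def decode_step(decoder_logits, prev_token, T, alpha=0.5):
    """One decoding step: interpolate the decoder's softmax posterior with
    the Markovian transition prior T[prev_token] (alpha is a hypothetical
    mixing weight; the paper combines the terms via a joint probability)."""
    p_dec = np.exp(decoder_logits - decoder_logits.max())
    p_dec /= p_dec.sum()                                  # softmax over vocabulary
    p_joint = (p_dec ** alpha) * (T[prev_token] ** (1.0 - alpha))
    p_joint /= p_joint.sum()
    return int(np.argmax(p_joint)), p_joint
```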
{"title":"Residual BiRNN Based Seq2Seq Model with Transition Probability Matrix for Online Handwritten Mathematical Expression Recognition","authors":"Zelin Hong, Ning You, J. Tan, Ning Bi","doi":"10.1109/ICDAR.2019.00107","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00107","url":null,"abstract":"In this paper, we present a Seq2Seq model for online handwritten mathematical expression recognition (OHMER), which consists of two major parts: a residual bidirectional RNN (BiRNN) based encoder that takes handwritten traces as the input and a transition probability matrix introduced decoder that generates LaTeX notations. We employ residual connection in the BiRNN layers to improve feature extraction. Markovian transition probability matrix is introduced in decoder and long-term information can be used in each decoding step through joint probability. Furthermore, we analyze the impact of the novel encoder and transition probability matrix through several specific instances. Experimental results on the CROHME 2014 and CROHME 2016 competition tasks show that our model outperforms the previous state-of-the-art single model by only using the official training dataset.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133622445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HITHCD-2018: Handwritten Chinese Character Database of 21K-Category
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00222
Tonghua Su, Wei Pan, Lijuan Yu
The current state of handwritten Chinese character recognition (HCCR) research is confined to well-defined character sets, far from meeting industrial requirements. This paper describes the creation of a large-scale handwritten Chinese character database. Constructing the database is an effort to scale the Chinese handwritten character classification task up to the full GBK character set specification. It consists of 21 thousand Chinese character categories and 20 million character images, larger than previous databases in both scale and diversity. We present solutions to the challenges of collecting and annotating such large-scale handwritten character samples: we elaborately design the sampling strategy, extract salient signals in a systematic way, and annotate the tremendous number of characters through three distinct stages. Experiments on generalization to other handwritten character databases demonstrate the value of our database. Its scale opens unprecedented opportunities both in evaluating character recognition algorithms and in developing new techniques.
{"title":"HITHCD-2018: Handwritten Chinese Character Database of 21K-Category","authors":"Tonghua Su, Wei Pan, Lijuan Yu","doi":"10.1109/ICDAR.2019.00222","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00222","url":null,"abstract":"Current state of handwritten Chinese character recognition (HCCR) conducted on well-confined character set, far from meeting industrial requirements. The paper describes the creation of a large-scale handwritten Chinese character database. Constructing the database is an effort to scale up Chinese handwritten character classification task to cover the full list of GBK character set specification. It consists of 21-thousand Chinese character categories and 20-million character images, larger than previous databases both in scale and diversity. We present solutions to the challenges of collecting and annotating such large-scale handwritten character samples. We elaborately design the sampling strategy, extract salient signals in a systematic way, annotate the tremendous characters through three distinct stages. Experiments are conducted the generalization to other handwritten character databases and our database demonstrates great values. Surely, its scale opens unprecedented opportunities both in evaluation of character recognition algorithms and in developing new techniques.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130339133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Training Full-Page Handwritten Text Recognition Models without Annotated Line Breaks
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00011
Chris Tensmeyer, Curtis Wigington
Training Handwritten Text Recognition (HTR) models typically requires large amounts of labeled data, often line or page images with corresponding line-level ground truth (GT) transcriptions. Many digital collections have page-level transcriptions for each image, but the transcription is unformatted, i.e., line breaks are not annotated. Can we train line-based HTR models using such data? In this work, we present a novel alignment technique for segmenting page-level GT text into text lines during HTR model training. This text segmentation problem is formulated as an optimization problem that minimizes the cost of aligning predicted lines with the GT text. Using both simulated and HTR model predictions, we show that the alignment method identifies line breaks accurately, even when the predicted lines have high character error rates (CER). We removed the GT line breaks from the ICDAR-2017 READ dataset and trained an HTR model using the proposed alignment method to predict line breaks on the fly. This model achieves a CER comparable to the same model trained with the GT line breaks. Additionally, we downloaded an online digital collection of 50K English journal pages (not curated for HTR research) whose transcriptions do not contain line breaks, and achieved 11% CER.
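The abstract frames line-break recovery as cost minimization. A brute-force dynamic-programming sketch of that idea follows (hypothetical function names; the cubic split-point search is kept for clarity, whereas a practical implementation would restrict candidate break positions):

```python
def edit_distance(a, b):
    """Standard Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def align_lines(pred_lines, gt_text):
    """Segment gt_text into len(pred_lines) pieces minimizing total edit cost."""
    n, m = len(pred_lines), len(gt_text)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(m + 1):
            for k in range(j + 1):          # try every split point for line i
                if cost[i - 1][k] == INF:
                    continue
                c = cost[i - 1][k] + edit_distance(pred_lines[i - 1], gt_text[k:j])
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, k
    breaks, j = [], m                        # recover line-end positions
    for i in range(n, 0, -1):
        breaks.append(j)
        j = back[i][j]
    breaks.reverse()
    lines, start = [], 0
    for b in breaks:
        lines.append(gt_text[start:b])
        start = b
    return lines
```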
{"title":"Training Full-Page Handwritten Text Recognition Models without Annotated Line Breaks","authors":"Chris Tensmeyer, Curtis Wigington","doi":"10.1109/ICDAR.2019.00011","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00011","url":null,"abstract":"Training Handwritten Text Recognition (HTR) models typically requires large amounts of labeled data which often are line or page images with corresponding line-level ground truth (GT) transcriptions. Many digital collections have page-level transcriptions for each image, but the transcription is unformatted, i.e., line breaks are not annotated. Can we train lined-based HTR models using such data? In this work, we present a novel alignment technique for segmenting page-level GT text into text lines during HTR model training. This text segmentation problem is formulated as an optimization problem to minimize the cost of aligning predicted lines with the GT text. Using both simulated and HTR model predictions, we show that the alignment method identifies line breaks accurately, even when the predicted lines have high character error rates (CER). We removed the GT line breaks from the ICDAR-2017 READ dataset and trained a HTR model using the proposed alignment method to predict line breaks on-the-fly. This model achieves comparable CER w.r.t. to the same model trained with the GT line breaks. Additionally, we downloaded an online digital collection of 50K English journal pages (not curated for HTR research) whose transcriptions do not contain line breaks, and achieve 11% CER.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131560494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Table-of-Contents Generation on Contemporary Documents
Pub Date: 2019-09-01 | DOI: 10.1109/icdar.2019.00025
Najah-Imane Bentabet, Rémi Juge, Sira Ferradans
The generation of a precise and detailed Table-Of-Contents (TOC) from a document is a problem of major importance for document understanding and information extraction. Despite its importance, it is still a challenging task, especially for non-standardized documents with rich layout information such as commercial documents. In this paper, we present a new neural pipeline for TOC generation applicable to any searchable document. Unlike previous methods, we neither use semantic labeling nor assume the presence of parsable TOC pages in the document. Moreover, we analyze the influence of using external knowledge encoded as a template, and show empirically that this approach is only useful in a very low-resource environment. Finally, we propose a new domain-specific dataset that sheds some light on the difficulties of TOC generation in real-world documents. The proposed method outperforms the state of the art on a public dataset and on the newly released dataset.
{"title":"Table-of-Contents Generation on Contemporary Documents","authors":"Najah-Imane Bentabet, Rémi Juge, Sira Ferradans","doi":"10.1109/icdar.2019.00025","DOIUrl":"https://doi.org/10.1109/icdar.2019.00025","url":null,"abstract":"The generation of precise and detailed Table-Of-Contents (TOC) from a document is a problem of major importance for document understanding and information extraction. Despite its importance, it is still a challenging task, especially for non-standardized documents with rich layout information such as commercial documents. In this paper, we present a new neural-based pipeline for TOC generation applicable to any searchable document. Unlike previous methods, we do not use semantic labeling nor assume the presence of parsable TOC pages in the document. Moreover, we analyze the influence of using external knowledge encoded as a template. We empirically show that this approach is only useful in a very low resource environment. Finally, we propose a new domain-specific data set that sheds some light on the difficulties of TOC generation in real-world documents. The proposed method shows better performance than the state-of-the-art on a public data set and on the newly released data set.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"215 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132803367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Instance Aware Document Image Segmentation using Label Pyramid Networks and Deep Watershed Transformation
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00088
Xiaohui Li, Fei Yin, Tao Xue, Long Liu, J. Ogier, Cheng-Lin Liu
Segmentation of complex document images remains a challenge due to the large variability of layouts and image degradation. In this paper, we propose a method to segment complex document images based on a Label Pyramid Network (LPN) and the Deep Watershed Transform (DWT). The method can segment document images into instance-aware regions including text lines, text regions, figures, tables, etc. The backbone of the LPN can be any type of Fully Convolutional Network (FCN); during training, label map pyramids of the training images are provided to efficiently exploit the hierarchical boundary information of regions through multi-task learning. The label map pyramid is derived from the region class label map by distance transformation and multi-level thresholding. In segmentation, the outputs of the LPN's multiple tasks are summed into a single probability map, on which the watershed transform is carried out to segment the document image into instance-aware regions. In experiments on four public databases, our method is demonstrated to be effective and superior, yielding state-of-the-art performance for text line segmentation, baseline detection, and region segmentation.
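An illustrative sketch of the fuse-then-watershed step, using scipy and scikit-image as stand-ins (the thresholds and the use of high-confidence cores as markers are assumptions, not the paper's exact marker scheme):

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

def instance_regions(task_prob_maps, fg_thresh=0.5, seed_thresh=0.8):
    """Fuse the LPN's multi-task outputs and split the result into
    instance-aware regions with a watershed transform.

    task_prob_maps: list of (H, W) probability maps, one per task head.
    """
    fused = np.sum(np.stack(task_prob_maps), axis=0) / len(task_prob_maps)
    foreground = fused > fg_thresh                     # region support
    markers, _ = ndimage.label(fused > seed_thresh)    # confident region cores
    return watershed(-fused, markers=markers, mask=foreground)
```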
{"title":"Instance Aware Document Image Segmentation using Label Pyramid Networks and Deep Watershed Transformation","authors":"Xiaohui Li, Fei Yin, Tao Xue, Long Liu, J. Ogier, Cheng-Lin Liu","doi":"10.1109/ICDAR.2019.00088","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00088","url":null,"abstract":"Segmentation of complex document images remains a challenge due to the large variability of layout and image degradation. In this paper, we propose a method to segment complex document images based on Label Pyramid Network (LPN) and Deep Watershed Transform (DWT). The method can segment document images into instance aware regions including text lines, text regions, figures, tables, etc. The backbone of LPN can be any type of Fully Convolutional Networks (FCN), and in training, label map pyramids on training images are provided to exploit the hierarchical boundary information of regions efficiently through multi-task learning. The label map pyramid is transformed from region class label map by distance transformation and multi-level thresholding. In segmentation, the outputs of multiple tasks of LPN are summed into one single probability map, on which watershed transformation is carried out to segment the document image into instance aware regions. In experiments on four public databases, our method is demonstrated effective and superior, yielding state of the art performance for text line segmentation, baseline detection and region segmentation.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134315883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Meaningful Information Extraction System for Interactive Analysis of Documents
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00024
Julien Maître, M. Ménard, Guillaume Chiron, A. Bouju, Nicolas Sidère
This paper is related to a project aiming at discovering weak signals in different streams of information, possibly sent by whistleblowers. The study presented in this paper tackles the particular problem of clustering topics at multiple levels across multiple documents, and then extracting meaningful descriptors, such as weighted lists of words, to represent documents in a multi-dimensional space. In this context, we present a novel idea that combines Latent Dirichlet Allocation and Word2vec (which provides a consistency metric for the partitioned topics) as a potential method for limiting the a priori number of clusters K usually needed in classical partitioning approaches. We propose two implementations of this idea, respectively able to: (1) find the best K for LDA in terms of topic consistency; (2) gather the optimal clusters from different levels of clustering. We also propose a non-traditional visualization approach based on a multi-agent system that combines dimension reduction and interactivity.
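A minimal sketch of implementation (1), assuming gensim 4.x: fit an LDA model for each candidate K, score topic consistency as the mean pairwise Word2vec similarity of each topic's top words, and keep the K with the highest score (function names and hyperparameters are illustrative):

```python
from itertools import combinations

from gensim.corpora import Dictionary
from gensim.models import LdaModel, Word2Vec

def topic_consistency(lda, w2v, topn=10):
    """Mean pairwise Word2vec similarity of each topic's top-`topn` words."""
    scores = []
    for t in range(lda.num_topics):
        words = [w for w, _ in lda.show_topic(t, topn=topn) if w in w2v.wv]
        pairs = list(combinations(words, 2))
        if pairs:
            scores.append(sum(w2v.wv.similarity(a, b) for a, b in pairs) / len(pairs))
    return sum(scores) / max(len(scores), 1)

def best_k(texts, k_range=range(2, 15)):
    """texts: list of token lists; returns the K whose topics are most consistent."""
    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    w2v = Word2Vec(sentences=texts, vector_size=100, min_count=1, seed=0)
    return max(k_range, key=lambda k: topic_consistency(
        LdaModel(corpus, num_topics=k, id2word=dictionary, random_state=0), w2v))
```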
{"title":"A Meaningful Information Extraction System for Interactive Analysis of Documents","authors":"Julien Maître, M. Ménard, Guillaume Chiron, A. Bouju, Nicolas Sidère","doi":"10.1109/ICDAR.2019.00024","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00024","url":null,"abstract":"This paper is related to a project aiming at discovering weak signals from different streams of information, possibly sent by whistleblowers. The study presented in this paper tackles the particular problem of clustering topics at multi-levels from multiple documents, and then extracting meaningful descriptors, such as weighted lists of words for document representations in a multi-dimensions space. In this context, we present a novel idea which combines Latent Dirichlet Allocation and Word2vec (providing a consistency metric regarding the partitioned topics) as potential method for limiting the \"a priori\" number of cluster K usually needed in classical partitioning approaches. We proposed 2 implementations of this idea, respectively able to: (1) finding the best K for LDA in terms of topic consistency; (2) gathering the optimal clusters from different levels of clustering. We also proposed a non-traditional visualization approach based on a multi-agents system which combines both dimension reduction and interactivity.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133812088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Do You Need More Data? The DeepSignDB On-Line Handwritten Signature Biometric Database
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00185
Rubén Tolosana, R. Vera-Rodríguez, Julian Fierrez, A. Morales, J. Ortega-Garcia
Data have become one of the most valuable assets in this new era, where deep learning technology seems to overcome traditional approaches. However, in some tasks, such as the verification of handwritten signatures, the amount of publicly available data is scarce, which makes it difficult to test the real limits of deep learning. In addition to the lack of public data, it is not easy to evaluate the improvements of novel approaches over the state of the art, as different experimental protocols and conditions are usually considered for different signature databases. To tackle these problems, the contribution of this study is twofold: i) we present and describe the new DeepSignDB on-line handwritten signature biometric public database, and ii) we propose a standard experimental protocol and benchmark to be used by the research community to perform fair comparisons of novel approaches with the state of the art. The DeepSignDB database is obtained by combining some of the most popular on-line signature databases with a novel dataset not presented before. It comprises more than 70K signatures acquired using both stylus and finger inputs from a total of 1526 users. Two acquisition scenarios are considered, office and mobile, with a total of 8 different devices. Additionally, different types of impostors and numbers of acquisition sessions are considered throughout the database. The DeepSignDB and benchmark results are available on GitHub.
{"title":"Do You Need More Data? The DeepSignDB On-Line Handwritten Signature Biometric Database","authors":"Rubén Tolosana, R. Vera-Rodríguez, Julian Fierrez, A. Morales, J. Ortega-Garcia","doi":"10.1109/ICDAR.2019.00185","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00185","url":null,"abstract":"Data have become one of the most valuable things in this new era where deep learning technology seems to overcome traditional approaches. However, in some tasks, such as the verification of handwritten signatures, the amount of publicly available data is scarce, what makes difficult to test the real limits of deep learning. In addition to the lack of public data, it is not easy to evaluate the improvements of novel approaches compared with the state of the art as different experimental protocols and conditions are usually considered for different signature databases. To tackle all these mentioned problems, the main contribution of this study is twofold: i) we present and describe the new DeepSignDB on-line handwritten signature biometric public database, and ii) we propose a standard experimental protocol and benchmark to be used for the research community in order to perform a fair comparison of novel approaches with the state of the art. The DeepSignDB database is obtained through the combination of some of the most popular on-line signature databases, and a novel dataset not presented yet. It comprises more than 70K signatures acquired using both stylus and finger inputs from a total of 1526 users. Two acquisition scenarios are considered, office and mobile, with a total of 8 different devices. Additionally, different types of impostors and number of acquisition sessions are considered along the database. The DeepSignDB and benchmark results are available in GitHub.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123879340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rethinking Semantic Segmentation for Table Structure Recognition in Documents
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00225
Shoaib Ahmed Siddiqui, Pervaiz Iqbal Khan, A. Dengel, Sheraz Ahmed
Based on recent advancements in the domain of semantic segmentation, Fully Convolutional Networks (FCNs) have been successfully applied to the task of table structure recognition in the past. We analyze the efficacy of semantic segmentation networks for this purpose and simplify the problem by proposing prediction tiling, based on the consistency assumption that holds for tabular structures. For an image of dimensions H × W, we predict a single column for the rows (ŷ_row ∈ ℝ^H) and a single row for the columns (ŷ_col ∈ ℝ^W). We use a dual-headed architecture where the initial feature maps (from the encoder-decoder model) are shared while the last two layers generate class-specific (row/column) predictions. This allows us to generate predictions for both rows and columns simultaneously using a single model, where previous methods relied on two separate models for inference. With the proposed method, we achieve state-of-the-art results on the ICDAR-13 image-based table structure recognition dataset with an average F-measure of 92.39% (91.90% and 92.88% F-measure for rows and columns, respectively). The obtained results advocate that constraining the problem space for FCNs by imposing valid constraints can lead to significant performance gains.
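A toy PyTorch sketch of the dual-headed prediction-tiling idea (the trunk stands in for the paper's FCN encoder-decoder; collapsing the per-pixel maps by averaging over one spatial axis is one plausible reading of tiling under the consistency assumption):

```python
import torch
import torch.nn as nn

class DualHeadedTableNet(nn.Module):
    """Shared trunk with separate row/column heads: the row head emits a
    single H-vector, the column head a single W-vector."""
    def __init__(self, channels=32):
        super().__init__()
        self.trunk = nn.Sequential(                     # toy stand-in for an FCN
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        self.row_head = nn.Conv2d(channels, 1, 1)       # per-pixel row logits
        self.col_head = nn.Conv2d(channels, 1, 1)       # per-pixel column logits

    def forward(self, x):                               # x: (B, 1, H, W)
        f = self.trunk(x)
        row = self.row_head(f).mean(dim=3)              # collapse W -> (B, 1, H)
        col = self.col_head(f).mean(dim=2)              # collapse H -> (B, 1, W)
        return torch.sigmoid(row), torch.sigmoid(col)

# Usage: one forward pass yields both row and column separator probabilities.
rows, cols = DualHeadedTableNet()(torch.rand(1, 1, 256, 256))
```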
{"title":"Rethinking Semantic Segmentation for Table Structure Recognition in Documents","authors":"Shoaib Ahmed Siddiqui, Pervaiz Iqbal Khan, A. Dengel, Sheraz Ahmed","doi":"10.1109/ICDAR.2019.00225","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00225","url":null,"abstract":"Based on the recent advancements in the domain of semantic segmentation, Fully-Convolutional Networks (FCN) have been successfully applied for the task of table structure recognition in the past. We analyze the efficacy of semantic segmentation networks for this purpose and simplify the problem by proposing prediction tiling based on the consistency assumption which holds for tabular structures. For an image of dimensions H × W, we predict a single column for the rows (ŷ_row ∊ H) and a predict a single row for the columns (ŷ_row ∊ W). We use a dual-headed architecture where initial feature maps (from the encoder-decoder model) are shared while the last two layers generate class specific (row/column) predictions. This allows us to generate predictions using a single model for both rows and columns simultaneously, where previous methods relied on two separate models for inference. With the proposed method, we were able to achieve state-of-the-art results on ICDAR-13 image-based table structure recognition dataset with an average F-Measure of 92.39% (91.90% and 92.88% F-Measure for rows and columns respectively). With the proposed method, we were able to achieve state-of-the-art results on ICDAR-13. The obtained results advocate that constraining the problem space in the case of FCN by imposing valid constraints can lead to significant performance gains.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124011225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deformation Classification of Drawings for Assessment of Visual-Motor Perceptual Maturity
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00155
Momina Moetesum, I. Siddiqi, N. Vincent
Sketches and drawings are popularly employed in clinical psychology to assess visual-motor and perceptual development in children and adolescents. Drawn responses by subjects are mostly characterized by a high degree of deformation that indicates the presence of various visual, perceptual, and motor disorders. Classification of deformations is a challenging task due to complex and extensive rule representations. In this study, we propose a novel technique to model clinical manifestations using Deep Convolutional Neural Networks (DCNNs). Drawn responses to nine templates used for the assessment of individuals' perceptual orientation are employed as training samples. A number of defined deviations scored in each template are then modeled by fine-tuning a pre-trained DCNN architecture. The performance of the proposed technique is evaluated on samples from 106 children. Experimental results show that pre-trained DCNNs can model and classify a number of deformations across multiple shapes with considerable success, although some deformations are represented more reliably than others. Overall, the promising classification results substantiate the effectiveness of the proposed technique.
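A minimal sketch of such a fine-tuning setup, assuming torchvision ≥ 0.13 and a ResNet-18 backbone as a stand-in for the paper's pre-trained DCNN:

```python
import torch.nn as nn
from torchvision import models

def deformation_classifier(num_deviations, freeze_trunk=True):
    """Adapt an ImageNet-pretrained CNN to score defined deviations.

    The backbone choice and single linear head are assumptions; the paper
    only states that a pre-trained DCNN architecture is fine-tuned.
    """
    net = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    if freeze_trunk:
        for p in net.parameters():          # keep pretrained features fixed
            p.requires_grad = False
    net.fc = nn.Linear(net.fc.in_features, num_deviations)  # trainable head
    return net
```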
{"title":"Deformation Classification of Drawings for Assessment of Visual-Motor Perceptual Maturity","authors":"Momina Moetesum, I. Siddiqi, N. Vincent","doi":"10.1109/ICDAR.2019.00155","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00155","url":null,"abstract":"Sketches and drawings are popularly employed in clinical psychology to assess the visual-motor and perceptual development in children and adolescents. Drawn responses by subjects are mostly characterized by high degree of deformations that indicates presence of various visual, perceptual and motor disorders. Classification of deformations is a challenging task due to complex and extensive rule representation. In this study, we propose a novel technique to model clinical manifestations using Deep Convolutional Neural Networks (DCNNs). Drawn responses of nine templates used for assessment of perceptual orientation of individuals are employed as training samples. A number of defined deviations scored in each template are then modeled by applying fine tuning on a pre-trained DCNN architecture. Performance of the proposed technique is evaluated on samples of 106 children. Results of experiments show that pre-trained DCNNs can model and classify a number of deformations across multiple shapes with considerable success. Nevertheless some deformations are represented more reliably than the others. Overall promising classification results are observed that substantiate the effectiveness of our proposed technique.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"2005 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128824004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semi-Synthetic Data Augmentation of Scanned Historical Documents
Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00051
Romain Karpinski, A. Belaïd
This paper proposes a new, fully automatic method for generating semi-synthetic images of historical documents to increase the number of training samples in small datasets. The method extracts and mixes background-only images (BOIs) with text-only images (TOIs) taken from two different sources to create semi-synthetic images. The TOIs are extracted with the help of a binary mask obtained by binarizing the image. The BOIs are reconstructed from the original image by replacing TOI pixels using an inpainting method. Finally, a TOI can be efficiently integrated into a BOI in the gradient domain, thus creating a new semi-synthetic image. The idea behind this technique is to automatically obtain documents close to real ones, with different backgrounds that highlight the content. Experiments are conducted on the public HisDB dataset, which contains few labeled images. We show that the proposed method improves performance on semantic segmentation and baseline extraction tasks.
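An illustrative OpenCV sketch of this pipeline, where Otsu binarization, Telea inpainting, and Poisson seamless cloning are plausible stand-ins for the paper's binarization, inpainting, and gradient-domain blending steps:

```python
import cv2
import numpy as np

def text_mask(img):
    """Binary mask of text pixels via Otsu thresholding (illustrative choice)."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return cv2.dilate(mask, np.ones((3, 3), np.uint8))  # cover stroke borders

def semi_synthetic(text_src, bg_src):
    """Blend the text of one page (TOI) onto the background of another (BOI)."""
    # BOI: remove bg_src's own text by inpainting over its text mask.
    boi = cv2.inpaint(bg_src, text_mask(bg_src), 3, cv2.INPAINT_TELEA)
    h, w = text_src.shape[:2]
    boi = cv2.resize(boi, (w, h))
    # Gradient-domain integration of the TOI (Poisson seamless cloning);
    # in practice the mask must be kept away from the image border.
    return cv2.seamlessClone(text_src, boi, text_mask(text_src),
                             (w // 2, h // 2), cv2.NORMAL_CLONE)
```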
{"title":"Semi-Synthetic Data Augmentation of Scanned Historical Documents","authors":"Romain Karpinski, A. Belaïd","doi":"10.1109/ICDAR.2019.00051","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00051","url":null,"abstract":"This paper proposes a fully automatic new method for generating semi-synthetic images of historical documents to increase the number of training samples in small datasets. This method extracts and mixes background only images (BOI) with text only images (TOI) issued from two different sources to create semi-synthetic images. The TOIs are extracted with the help of a binary mask obtained by binarizing the image. The BOIs are reconstructed from the original image by replacing TOI pixels using an inpainting method. Finally, a TOI can be efficiently integrated in a BOI using the gradient domain, thus creating a new semi-synthetic image. The idea behind this technique is to automatically obtain documents close to real ones with different backgrounds to highlight the content. Experiments are conducted on the public HisDB dataset which contains few labeled images. We show that the proposed method improves the performance results of a semantic segmentation and baseline extraction task.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"21 10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116374623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}