
Latest Publications: 2019 International Conference on Document Analysis and Recognition (ICDAR)

Care Label Recognition
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00158
Jiri Kralicek, Jiri Matas, M. Busta
The paper introduces the problem of care label recognition and presents a method addressing it. A care label, also called a care tag, is a small piece of cloth or paper attached to a garment providing instructions for its maintenance and information about e.g. the material and size. The information and instructions are written as symbols or plain text. Care label recognition is a challenging text and pictogram recognition problem: the often sewn text is small, looking as if printed using a non-standard font, and the contrast of the text gradually fades, making OCR progressively more difficult. On the other hand, the information provided is typically redundant, which facilitates semi-supervised learning. The presented care label recognition method is based on the recently published End-to-End Method for Multi-Language Scene Text (E2E-MLT, Busta et al. 2018), exploiting specific constraints, e.g. a care label vocabulary with multi-language equivalences. Experiments conducted on a newly-created dataset of 63 care label images show that even when exploiting problem-specific constraints, a state-of-the-art scene text detection and recognition method achieves precision and recall only slightly above 0.6, confirming the challenging nature of the problem.
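A hedged sketch of one problem-specific constraint of the kind the paper exploits: snapping a noisy OCR transcript onto a closed, multi-language care-label vocabulary. The vocabulary, threshold, and function names below are illustrative assumptions, not part of E2E-MLT itself.

```python
# Hypothetical vocabulary-constrained correction for care-label OCR output.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a single-row dynamic program."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (ca != cb))    # substitution (free on match)
            prev = cur
    return dp[-1]

# Toy multi-language vocabulary mapping surface forms to one canonical label.
VOCAB = {"cotton": "cotton", "algodon": "cotton", "baumwolle": "cotton"}

def constrain(ocr_word: str, vocab=VOCAB, max_dist: int = 2) -> str:
    """Replace the OCR output with its nearest vocabulary entry, if close enough."""
    best = min(vocab, key=lambda w: edit_distance(ocr_word.lower(), w))
    return vocab[best] if edit_distance(ocr_word.lower(), best) <= max_dist else ocr_word

corrected = constrain("cott0n")   # "0" misread for "o" snaps back to "cotton"
```

Because the label vocabulary is small and closed, even this naive nearest-word lookup recovers many low-contrast misreads that a general OCR lexicon would miss.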
Citations: 1
Welcome Message from the Honorary Chair
Pub Date : 2019-09-01 DOI: 10.1109/icdar.2019.00005
H. Makino
Citations: 0
Hybrid DBLSTM-SVM Based Beta-Elliptic-CNN Models for Online Arabic Characters Recognition
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00093
Y. Hamdi, H. Boubaker, Thameur Dhieb, A. Elbaati, A. Alimi
Deep learning-based approaches have proven highly successful in handwriting recognition, a challenging task with increasingly broad applications on mobile devices. Recently, several research initiatives have been introduced in the area of pattern recognition. The challenge is greater for Arabic scripts due to the inherent cursiveness of the characters, the existence of several groups of similarly shaped characters, the large alphabet size, etc. In this paper, we propose an online Arabic character recognition system based on hybrid Beta-Elliptic model (BEM) and convolutional neural network (CNN) feature extractor models, combining deep bidirectional long short-term memory (DBLSTM) and support vector machine (SVM) classifiers. First, we use the extracted online and offline features to perform classification and compare the performance of the single classifiers. Second, we combine the two types of feature-based systems using different combination methods to enhance the discriminating power of the global system. We have evaluated our system on the LMCA and Online-KHATT databases. The individual systems achieve maximum recognition rates of 95.48% and 91.55% on the two databases, respectively. Combining the on-line and off-line systems improves the accuracy to 99.11% and 93.98% on the same databases, exceeding the best results of other state-of-the-art systems.
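The combination step above can be illustrated with a minimal score-level fusion sketch. The paper combines its online (Beta-Elliptic/DBLSTM) and offline (CNN/SVM) systems with several combination methods; the weighted sum, weight value, and toy scores below are illustrative assumptions, not the authors' exact scheme.

```python
import numpy as np

def combine_scores(p_online, p_offline, w=0.6):
    """Weighted sum of per-class scores from two recognizers; returns the argmax class."""
    p = w * np.asarray(p_online, dtype=float) + (1 - w) * np.asarray(p_offline, dtype=float)
    return int(np.argmax(p))

# The online system weakly prefers class 1, the offline system strongly prefers
# class 2; the fused decision follows the more confident evidence.
pred = combine_scores([0.2, 0.5, 0.3], [0.1, 0.1, 0.8])
```

Fusing at the score level lets each modality veto the other's low-confidence mistakes, which is consistent with the accuracy gain the abstract reports for the combined system.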
Citations: 15
Data Augmentation via Adversarial Networks for Optical Character Recognition/Conference Submissions
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00038
Victor Storchan
With the ongoing digitalization of resources across the industry, robust OCR (Optical Character Recognition) solutions are highly valuable. In this work, we aim at designing models to read typical damaged faxes and PDF files and training them with unlabeled data. State-of-the-art deep learning architectures require scalable tagged datasets that are often difficult and costly to collect. To ensure compliance standards, or to provide reproducible, cheap, and fast solutions for training OCR systems, producing datasets that mimic the quality of the data that will be passed to the model is paramount. In this paper, we discuss using unsupervised image-to-image translation methods to learn transformations that map clean images of words to damaged images of words. The quality of the transformation is evaluated through the OCR brick, and these results are compared to the Inception Score (IS) of the GANs we used. That way we are able to generate an arbitrarily large realistic dataset without labeling a single observation. As a result, we propose an end-to-end OCR training solution that provides competitive models.
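For intuition, a hand-crafted degradation makes a useful baseline against the learned clean-to-damaged mapping: it shows what "mimicking fax damage" means in the simplest case. This sketch is an illustrative assumption, not the paper's GAN; the function name and noise level are invented.

```python
import numpy as np

def degrade(img, flip_prob=0.05, seed=0):
    """Randomly flip pixels of a binary word image (salt-and-pepper noise),
    a crude stand-in for fax/scan damage."""
    rng = np.random.default_rng(seed)
    mask = rng.random(img.shape) < flip_prob   # which pixels to corrupt
    return np.where(mask, 1 - img, img)

clean = np.zeros((32, 128), dtype=int)   # a blank clean word canvas
damaged = degrade(clean)                 # ~5% of pixels flipped to ink
```

The appeal of the learned translation over such fixed noise models is that the GAN matches the actual damage distribution of the target documents rather than a parametric guess.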
Citations: 3
Content Extraction from Lecture Video via Speaker Action Classification Based on Pose Information
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00171
Fei Xu, Kenny Davila, S. Setlur, V. Govindaraju
Online lecture videos are increasingly important e-learning materials for students. Automated content extraction from lecture videos facilitates information retrieval applications that improve access to the lecture material. A significant number of lecture videos include the speaker in the image. Speakers perform various semantically meaningful actions during the process of teaching. Among all the movements of the speaker, key actions such as writing or erasing potentially indicate important features directly related to the lecture content. In this paper, we present a methodology for lecture video content extraction using the speaker actions. Each lecture video is divided into small temporal units called action segments. Using a pose estimator, body and hands skeleton data are extracted and used to compute motion-based features describing each action segment. Then, the dominant speaker action of each of these segments is classified using Random forests and the motion-based features. With the temporal and spatial range of these actions, we implement an alternative way to draw key-frames of handwritten content from the video. In addition, for our fixed camera videos, we also use the skeleton data to compute a mask of the speaker writing locations for the subtraction of the background noise from the binarized key-frames. Our method has been tested on a publicly available lecture video dataset, and it shows reasonable recall and precision results, with a very good compression ratio which is better than previous methods based on content analysis.
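The motion-feature step can be sketched as follows: from per-frame keypoint positions of one action segment, derive simple statistics of the kind fed to the random forest classifier. The exact feature set and the helper name are illustrative assumptions, not the paper's feature definition.

```python
import numpy as np

def motion_features(xy):
    """xy: (frames, 2) array of one keypoint's positions over an action segment.
    Returns mean speed, speed variability, and horizontal/vertical extent."""
    v = np.diff(xy, axis=0)                # frame-to-frame displacement vectors
    speed = np.linalg.norm(v, axis=1)      # per-frame speed magnitude
    return np.array([speed.mean(), speed.std(),
                     np.ptp(xy[:, 0]),     # horizontal range of the keypoint
                     np.ptp(xy[:, 1])])    # vertical range of the keypoint

# A toy "writing" segment: the wrist moves steadily left-to-right.
seg = np.stack([np.linspace(0, 30, 31), np.zeros(31)], axis=1)
feats = motion_features(seg)
```

Features like these separate sustained, spatially extended motions (writing, erasing) from near-stationary segments (talking, pointing), which is what the action classifier needs.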
Citations: 6
A Quality and Time Assessment of Binarization Algorithms
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00232
R. Lins, R. Bernardino, D. Jesus
Binarization algorithms are an important step in most document analysis and recognition applications. Many aspects of a document affect the performance of binarization algorithms, such as paper texture and color, noise such as back-to-front interference and stains, and even the type and color of the ink. This work focuses on determining how each document characteristic impacts the processing time and the quality of the binarized image. This paper assesses thirty of the most widely used document binarization algorithms.
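A minimal sketch of the measurement protocol such an assessment implies: time one binarization algorithm and inspect its output. Otsu thresholding stands in for the thirty assessed algorithms; the toy image and the measurement code are illustrative assumptions, not the paper's benchmark.

```python
import time
import numpy as np

def otsu_threshold(gray):
    """Pick the threshold maximizing between-class variance (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()        # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        m0 = (np.arange(t) * p[:t]).sum() / w0   # class means
        m1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var = w0 * w1 * (m0 - m1) ** 2           # between-class variance
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# A toy page: dark ink (10) on light paper (200).
gray = np.array([[10] * 8 + [200] * 8] * 4, dtype=np.uint8)
start = time.perf_counter()
binary = gray < otsu_threshold(gray)   # True = ink pixel
elapsed = time.perf_counter() - start  # per-algorithm timing, as in the assessment
```

Running every algorithm through the same timed harness on documents with controlled characteristics (texture, bleed-through, ink color) is what lets the paper attribute time and quality differences to each characteristic.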
Citations: 14
DICE: Deep Intelligent Contextual Embedding for Twitter Sentiment Analysis
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00157
Usman Naseem, Katarzyna Musial
Sentiment analysis of social media-based short text (e.g., Twitter messages) is valuable for many applications and is increasingly explored in communities such as text analysis, social media analysis, and recommendation. However, it is challenging, as tweet-like social media text is often short, informal, and noisy, and involves language ambiguity such as polysemy. Existing sentiment analysis approaches mainly target documents and clean textual data. Accordingly, we propose a Deep Intelligent Contextual Embedding (DICE), which enhances tweet quality by handling noise within contexts, and then integrates four embeddings to capture polysemy in context, semantics, syntax, and sentiment knowledge of the words in a tweet. DICE is then fed to a Bi-directional Long Short Term Memory (BiLSTM) network with attention to determine the sentiment of a tweet. The experimental results show that our model outperforms several baselines, including both classic classifiers and combinations of various word embedding models, in the sentiment analysis of airline-related tweets.
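The attention step that follows the BiLSTM can be sketched in a few lines: per-token relevance scores are softmax-normalized into weights, and the tweet representation is the weighted sum of the token states. The toy states and scores below are assumptions for illustration, not DICE's learned values.

```python
import numpy as np

def attention_pool(states, scores):
    """states: (tokens, dim) BiLSTM-style outputs; scores: (tokens,) unnormalized
    relevance. Returns a single (dim,) vector for the whole tweet."""
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return w @ states                   # attention-weighted sum of token states

states = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
vec = attention_pool(states, np.array([0.0, 0.0]))  # equal scores -> plain mean
```

With learned scores, sentiment-bearing tokens receive most of the weight, so the pooled vector emphasizes them instead of averaging them away.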
Citations: 23
Synthesis of Handwriting Dynamics using Sinusoidal Model
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00144
Himakshi Choudhury, S. Prasanna
Handwriting production is a complex fine-motor-control mechanism, associated mainly with two degrees of freedom, in the horizontal and vertical directions. The relation between the horizontal and vertical velocities depends on the trajectory shape and its length. In this work, we explore the generation of handwriting velocities using two sinusoidal oscillations. The proposed method follows the motor equivalence theory and considers that the patterns are stored as a sequence of corner shapes and their relative locations in the letter. These points are referred to as the modulation points, where the parameters of the sinusoidal oscillations are modulated to generate the required velocity profiles. Depending on the location and shape of the corners, the amplitude, phase, and frequency relations between the two underlying oscillations change. Accordingly, this paper presents an efficient method to synthesize the velocity profiles and hence the handwriting. Further, shape variability can be introduced into the synthesized data by modifying the positions of the modulation points and their corner shapes. The quality of the synthesized handwriting is evaluated using both subjective and quantitative evaluation methods.
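The core generation idea can be sketched numerically: model the horizontal and vertical velocities as two sinusoids whose amplitude, phase, and frequency relation shapes the stroke, then integrate the velocities to obtain the pen trajectory. The parameter values below are illustrative, not the paper's fitted values, and the sketch omits the per-corner modulation.

```python
import numpy as np

def trajectory(amp=(1.0, 1.0), freq=(1.0, 2.0), phase=(0.0, np.pi / 2),
               n=200, dt=0.01):
    """Two sinusoidal velocity oscillations, integrated into an (x, y) path."""
    t = np.arange(n) * dt
    vx = amp[0] * np.sin(2 * np.pi * freq[0] * t + phase[0])   # horizontal velocity
    vy = amp[1] * np.sin(2 * np.pi * freq[1] * t + phase[1])   # vertical velocity
    x = np.cumsum(vx) * dt   # crude numerical integration of velocity
    y = np.cumsum(vy) * dt
    return x, y

x, y = trajectory()   # a 1:2 frequency ratio traces a Lissajous-like stroke
```

In the paper's scheme, these parameters would be re-modulated at each corner point; changing the amplitude or phase relation between the two oscillations is what bends the stroke into a different letter shape.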
Citations: 3
A Robust Hybrid Approach for Textual Document Classification
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00224
M. Asim, Muhammad Usman Ghani Khan, M. I. Malik, A. Dengel, Sheraz Ahmed
Text document classification is an important task for diverse natural language processing based applications. Traditional machine learning approaches mainly focused on reducing the dimensionality of textual data to perform classification. Although this improved overall classification accuracy, classifiers still faced a sparsity problem due to the lack of better data representation techniques. Deep learning based text document classification, on the other hand, benefited greatly from the invention of word embeddings, which solved the sparsity problem, and researchers' focus mainly remained on the development of deep architectures. Deeper architectures, however, learn some redundant features that limit the performance of deep learning based solutions. In this paper, we propose a two-stage text document classification methodology which combines traditional feature engineering with automatic feature engineering (using deep learning). The proposed methodology comprises a filter-based feature selection (FSE) algorithm followed by a deep convolutional neural network. The methodology is evaluated on the two most commonly used public datasets, i.e., 20 Newsgroups and BBC news. Evaluation results reveal that the proposed methodology outperforms state-of-the-art (traditional) machine learning and deep learning based text document classification methodologies by a significant margin of 7.7% on 20 Newsgroups and 6.6% on BBC news.
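The first stage, filter-based feature selection, can be sketched simply: rank vocabulary terms by a per-class score computed from the data alone (no classifier in the loop) and keep the top k. The class-frequency-difference score below is a hypothetical stand-in for the paper's FSE algorithm, and all names and toy documents are invented for illustration.

```python
from collections import Counter

def select_features(docs, labels, k=2):
    """docs: list of token lists; labels: parallel 0/1 class labels.
    Keeps the k terms whose document frequency differs most between classes."""
    pos = Counter(t for d, y in zip(docs, labels) if y == 1 for t in set(d))
    neg = Counter(t for d, y in zip(docs, labels) if y == 0 for t in set(d))
    vocab = set(pos) | set(neg)
    score = {t: abs(pos[t] - neg[t]) for t in vocab}   # discriminative terms score high
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]

# Toy corpus: class 1 = sports, class 0 = finance.
docs = [["win", "cup", "goal"], ["win", "match"],
        ["stock", "market"], ["market", "rates"]]
labels = [1, 1, 0, 0]
top = select_features(docs, labels)
```

Filter methods like this are cheap and classifier-agnostic, which is why they pair well with an expensive downstream CNN: the network only ever sees the reduced, discriminative vocabulary.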
Citations: 14
Capturing Micro Deformations from Pooling Layers for Offline Signature Verification
Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00180
Yuchen Zheng, W. Ohyama, Brian Kenji Iwana, S. Uchida
In this paper, we propose a novel Convolutional Neural Network (CNN) based method that extracts the location information (displacement features) of the maximums in the max-pooling operation and fuses it with the pooling features to capture the micro deformations between the genuine signatures and skilled forgeries as a feature extraction procedure. After the feature extraction procedure, we apply support vector machines (SVMs) as writer-dependent classifiers for each user to build the signature verification system. The extensive experimental results on GPDS-150, GPDS-300, GPDS-1000, GPDS-2000, and GPDS-5000 datasets demonstrate that the proposed method can discriminate the genuine signatures and their corresponding skilled forgeries well and achieve state-of-the-art results on these datasets.
{"title":"Capturing Micro Deformations from Pooling Layers for Offline Signature Verification","authors":"Yuchen Zheng, W. Ohyama, Brian Kenji Iwana, S. Uchida","doi":"10.1109/ICDAR.2019.00180","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00180","url":null,"abstract":"In this paper, we propose a novel Convolutional Neural Network (CNN) based method that extracts the location information (displacement features) of the maximums in the max-pooling operation and fuses it with the pooling features to capture the micro deformations between the genuine signatures and skilled forgeries as a feature extraction procedure. After the feature extraction procedure, we apply support vector machines (SVMs) as writer-dependent classifiers for each user to build the signature verification system. The extensive experimental results on GPDS-150, GPDS-300, GPDS-1000, GPDS-2000, and GPDS-5000 datasets demonstrate that the proposed method can discriminate the genuine signatures and their corresponding skilled forgeries well and achieve state-of-the-art results on these datasets.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123452733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
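The displacement-feature idea above can be sketched outside a CNN: record where the maximum lands inside each max-pooling window and fuse those offsets with the pooled values. A minimal NumPy illustration; the 2×2 window, the toy feature map, and concatenation as the fusion step are assumptions (the paper performs this inside a CNN before writer-dependent SVMs):

```python
import numpy as np

def pool_with_displacement(x, win=2):
    """Max-pool a 2-D feature map and also record, for each window,
    the (row, col) offset of the maximum: the 'displacement' feature."""
    h, w = x.shape
    oh, ow = h // win, w // win
    pooled = np.zeros((oh, ow))
    disp = np.zeros((oh, ow, 2), dtype=int)
    for i in range(oh):
        for j in range(ow):
            patch = x[i * win:(i + 1) * win, j * win:(j + 1) * win]
            idx = np.unravel_index(patch.argmax(), patch.shape)
            pooled[i, j] = patch[idx]   # standard max-pooling output
            disp[i, j] = idx            # where the max sat in the window
    # fuse pooled values with their displacement features
    return np.concatenate([pooled.ravel(), disp.ravel()])

x = np.array([[1., 5., 2., 0.],
              [3., 4., 8., 1.],
              [0., 2., 6., 7.],
              [9., 1., 3., 4.]])
feat = pool_with_displacement(x)  # 4 pooled maxima + 8 offset components
```

Two signatures with identical stroke intensities but slightly shifted strokes yield the same pooled maxima yet different offsets, which is the micro-deformation signal the method exploits.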
Journal
2019 International Conference on Document Analysis and Recognition (ICDAR)