
Latest publications: 2020 25th International Conference on Pattern Recognition (ICPR)

GAN-Based Image Deblurring Using DCT Discriminator
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412584
Hiroki Tomosada, Takahiro Kudo, Takanori Fujisawa, M. Ikehara
In this paper, we propose high-quality image deblurring using the discrete cosine transform (DCT) with low computational complexity. Recently, Convolutional Neural Network (CNN) and Generative Adversarial Network (GAN) based algorithms have been proposed for image deblurring. Multi-scale CNN architectures restore blurred images clearly and suppress ringing artifacts and block noise, but they take much time to process. To solve these problems, we propose a method, named "DeblurDCTGAN," that preserves texture and suppresses ringing artifacts in the restored image without a multi-scale architecture, using a DCT-based loss. It compares the frequency domains of the deblurred image and the ground-truth image using the DCT. Hereby, DeblurDCTGAN can reduce block noise and ringing artifacts while maintaining deblurring performance. Our experimental results show that DeblurDCTGAN achieves the highest PSNR and SSIM compared with other conventional methods on the GoPro, DVD, NFS, and HIDE test datasets. Its running time per image pair is also faster than that of the other methods.
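The DCT-domain comparison behind the loss can be sketched in a few lines. This is a minimal numpy illustration, not the authors' code: we assume the loss is an L1 distance between the 2D DCT coefficients of the deblurred image and those of the ground truth.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II transform matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

def dct2(img):
    """Separable 2D DCT: 1D transforms along rows then columns."""
    h, w = img.shape
    return dct_matrix(h) @ img @ dct_matrix(w).T

def dct_loss(deblurred, ground_truth):
    """Assumed form of the DCT-based loss: mean L1 distance between
    the DCT coefficients of the two images."""
    return float(np.mean(np.abs(dct2(deblurred) - dct2(ground_truth))))
```

Because the DCT is orthonormal, identical images give zero loss, and high-frequency artifacts (ringing, block noise) show up directly as coefficient differences.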
Citations: 7
Estimating Static and Dynamic Brain Networks by Kulback-Leibler Divergence from fMRI Data
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9413047
Gonul Gunal Degirmendereli, F. Yarman-Vural
Representing brain activities as networks is crucial to understanding various cognitive states. This study proposes a novel method to estimate static and dynamic brain networks using the Kullback-Leibler divergence. The suggested brain networks are based on the probability distributions of voxel intensity values measured by functional Magnetic Resonance Imaging (fMRI), recorded while the subjects perform a predefined cognitive task called complex problem solving. We investigate the validity of the estimated brain networks by modeling and analyzing the different phases of the human brain's complex problem solving process, namely the planning and execution phases. The suggested computational network model is tested with a classification schema using Support Vector Machines. We observe that the network models can successfully discriminate the planning and execution phases of the complex problem solving process with more than 90% accuracy when the estimated dynamic networks, extracted from the fMRI data, are classified by Support Vector Machines.
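The core quantity — a KL divergence between voxel intensity distributions — can be sketched as follows. This is an illustrative numpy version; the histogram binning, intensity range, and smoothing constant are our assumptions, not the paper's settings.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions, with small-value
    smoothing to avoid division by zero."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def region_divergence(voxels_a, voxels_b, bins=32, intensity_range=(0.0, 1.0)):
    """Histogram the voxel intensities of two regions and return the
    KL divergence between the resulting empirical distributions."""
    pa, _ = np.histogram(voxels_a, bins=bins, range=intensity_range)
    pb, _ = np.histogram(voxels_b, bins=bins, range=intensity_range)
    return kl_divergence(pa, pb)
```

Note that KL is asymmetric, so `region_divergence(a, b)` and `region_divergence(b, a)` generally differ; a network built from it is directed unless the two directions are symmetrized.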
Citations: 0
Energy Minimum Regularization in Continual Learning
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412744
Xiaobin Li, Lianlei Shan, Minglong Li, Weiqiang Wang
How to give agents the ability to learn continuously, as humans and animals do, remains a challenge. The regularized continual learning method OWM ignores the constraint of the model on the energy compression of each learned task, which results in poor performance on datasets with a large number of learning tasks. In this paper, we propose an energy minimization regularization (EMR) method that constrains the energy of learned tasks, providing enough learning space for the tasks not yet learned and increasing the number of tasks the model can accommodate. Extensive experiments show that our method effectively increases the capacity of the model and reduces its sensitivity to the number of tasks and the size of the network.
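The abstract does not give the exact form of the energy term, so the following is only a generic energy-minimization stand-in: a task loss plus a weighted energy penalty on the parameters (here, a squared Frobenius norm), to show the shape of such a regularized objective.

```python
import numpy as np

def emr_style_loss(task_loss, weights, lam=1e-3):
    """Illustrative regularized objective: task loss plus a weighted
    energy penalty on the parameter tensors. The actual EMR energy
    term is defined in the paper; this squared-norm penalty is only
    a generic stand-in to show the structure."""
    energy = sum(float(np.sum(w ** 2)) for w in weights)
    return task_loss + lam * energy
```

In a continual setting, such a penalty would be applied to the parameters claimed by already-learned tasks, keeping their "energy" small so later tasks retain room to learn.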
Citations: 0
Quasibinary Classifier for Images with Zero and Multiple Labels
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412933
Shuai Liao, E. Gavves, Changyong Oh, Cees G. M. Snoek
The softmax and binary classifiers are commonly preferred for image classification applications. However, as softmax is specifically designed for categorical classification, it assumes each image has exactly one class label. This limits its applicability to problems where the number of labels does not equal one, most notably zero- and multi-label problems. In these challenging settings, binary classifiers are, in theory, better suited. However, because they ignore the correlation between classes, they are not as accurate or scalable in practice. In this paper, we start from the observation that the only difference between binary and softmax classifiers is their normalization function. Specifically, while the binary classifier self-normalizes its score, the softmax classifier combines the scores from all classes before normalization. On the basis of this observation we introduce a normalization function that is learnable, constant, and shared between classes and data points. By doing so, we arrive at a new type of binary classifier that we coin the quasibinary classifier. We show in a variety of image classification settings, and on several datasets, that quasibinary classifiers are considerably better in settings where regular binary and softmax classifiers suffer, including zero-label and multi-label classification. What is more, we show that quasibinary classifiers yield well-calibrated probabilities, allowing for direct and reliable comparisons not only between classes but also between data points.
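One plausible reading of the shared learnable normalizer (our assumption, not necessarily the authors' exact formulation) is to replace softmax's per-example denominator with a single learned constant N shared across classes and data points, giving p_k = exp(z_k) / (exp(z_k) + N). The per-class probabilities then no longer sum to one, so zero, one, or several labels can clear a threshold.

```python
import numpy as np

def quasibinary_probs(logits, log_norm):
    """p_k = exp(z_k) / (exp(z_k) + N), with log N = log_norm learned
    and shared across all classes and examples. Written in the
    numerically stable sigmoid form 1 / (1 + exp(log_norm - z))."""
    return 1.0 / (1.0 + np.exp(log_norm - np.asarray(logits, float)))

def predict_labels(logits, log_norm, threshold=0.5):
    """Independent thresholding: any number of labels may fire."""
    return quasibinary_probs(logits, log_norm) >= threshold
```

With `log_norm = 0` this reduces to an ordinary per-class sigmoid; learning `log_norm` shifts a single decision boundary consistently for all classes and data points.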
Citations: 0
An Efficient Empirical Solver for Localized Multiple Kernel Learning via DNNs
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9411974
Ziming Zhang
In this paper we propose solving localized multiple kernel learning (LMKL) using LMKL-Net, a feedforward deep neural network (DNN). In contrast to previous works, as a learning principle we propose parameterizing the gating function for learning kernel combination weights with an attentional network (AN) and the multiclass classifier with a multilayer perceptron (MLP). Such interpretability helps us better understand how the network solves the problem. Thanks to stochastic gradient descent (SGD), our approach has linear computational complexity in training. Empirically, we demonstrate on benchmark datasets that, with comparable or better accuracy than the state of the art, our LMKL-Net can be trained about two orders of magnitude faster with an approximately two-orders-of-magnitude smaller memory footprint for large-scale learning.
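The basic multiple-kernel combination the gating function produces can be sketched as below. This is a simplified illustration: in LMKL-Net the gate logits are data-dependent outputs of the attentional network, whereas here they are a fixed vector.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1D vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def combine_kernels(kernels, gate_logits):
    """Weighted combination of M base kernel matrices (each n x n)
    using gating weights obtained by a softmax over gate_logits
    (length M). A convex combination of PSD kernels is itself PSD."""
    w = softmax(np.asarray(gate_logits, float))
    return sum(wi * K for wi, K in zip(w, kernels))
```

The softmax gate guarantees the combination weights are nonnegative and sum to one, which preserves positive semidefiniteness of the combined kernel — a common design choice in MKL formulations.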
Citations: 1
Information Graphic Summarization using a Collection of Multimodal Deep Neural Networks
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412146
Edward J. Kim, Connor Onweller, Kathleen F. McCoy
We present a multimodal deep learning framework that can generate summarization text supporting the main idea of an information graphic for presentation to a person who is blind or visually impaired. The framework utilizes the visual, textual, positional, and size characteristics extracted from the image to create the summary. Different and complementary neural architectures are optimized for each task using crowdsourced training data. From our quantitative experiments and results, we explain the reasoning behind our framework and show the effectiveness of our models. Our qualitative results showcase text generated by our framework and show that Mechanical Turk participants favor it over other automatic and human-generated summarizations. We describe the design and results of an experiment evaluating the utility of our system for people with visual impairments in the context of understanding Twitter Tweets containing line graphs.
Citations: 6
2D Discrete Mirror Transform for Image Non-Linear Approximation
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412019
Alessandro Gnutti, Fabrizio Guerrini, R. Leonardi
In this paper, a new 2D transform named the Discrete Mirror Transform (DMT) is presented. The DMT is computed by decomposing a signal into its even and odd parts around an optimal location in a given direction, so that the signal energy is maximally split between the two components. After minimizing the information required to regenerate the original signal by removing redundant structures, the process is iterated, leading the signal energy to distribute over a progressively smaller set of coefficients. The DMT can be displayed as a binary tree, where each node represents the single (even or odd) signal derived from the decomposition at the previous level. An optimized version of the DMT (ODMT) is also introduced, which exploits the possibility of choosing different directions along which to perform the decomposition. Experimental simulations have been carried out to test the sparsity properties of the DMT and ODMT when applied to images: for both transforms, the results show superior performance with respect to the popular Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT) in terms of non-linear approximation.
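The even/odd mirror split can be illustrated in 1D. This is a simplified sketch under our own assumptions (the paper works on 2D images and searches over directions as well): reflect the signal about a pivot index, form even and odd parts, and pick the pivot that most unbalances the energy between the two parts.

```python
import numpy as np

def mirror_decompose(x, c):
    """Take the largest window of x symmetric about index c and split
    it into even and odd parts around c; the window equals even + odd."""
    half = min(c, len(x) - 1 - c)
    seg = x[c - half : c + half + 1]
    rev = seg[::-1]
    return (seg + rev) / 2.0, (seg - rev) / 2.0

def best_pivot(x):
    """Pivot that maximizes the energy imbalance between the even and
    odd parts -- a 1D stand-in for the optimal-location search."""
    def imbalance(c):
        even, odd = mirror_decompose(x, c)
        return abs(np.sum(even ** 2) - np.sum(odd ** 2))
    return max(range(1, len(x) - 1), key=imbalance)
```

For a signal that is symmetric about its center, the odd part at the center pivot vanishes entirely, so all the energy lands in the even branch — the ideal case for the subsequent iterated decomposition.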
Citations: 0
Multi-modal Contextual Graph Neural Network for Text Visual Question Answering
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412891
Yaoyuan Liang, Xin Wang, Xuguang Duan, Wenwu Zhu
Text visual question answering (TextVQA) aims to answer questions related to text appearing in given images, posing more challenges than VQA by requiring deeper recognition and understanding of the various shapes of human-readable scene text as well as their meanings in different contexts. Existing works on TextVQA suffer from two weaknesses: i) scene text and non-textual objects are processed separately and independently, without considering their mutual interactions during the question understanding and answering process; ii) scene text is encoded only through word embeddings, without taking into account the corresponding visual appearance features or its potential relationships with other non-textual objects in the images. To overcome these weaknesses, we propose a novel multi-modal contextual graph neural network (MCG) model for TextVQA. The proposed MCG model can capture the relationships between the visual features of scene text and non-textual objects in the given images, and it utilizes richer sources of multi-modal features to improve model performance. In particular, we encode the scene text into richer features containing textual, visual, and positional information, then model the visual relations between scene text and non-textual objects with a contextual graph neural network. Our extensive experiments on a real-world dataset demonstrate the advantages of the proposed MCG model over baseline approaches.
Citations: 4
DenseRecognition of Spoken Languages
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9412413
Jaybrata Chakraborty, Bappaditya Chakraborty, U. Bhattacharya
In the present study, we have considered a large number (27) of Indian languages for recognition from their speech signals from different sources. A dense convolutional network architecture (DenseNet) has been used for this classification task. Dynamic elimination of low-energy frames from the input speech signal serves as a preprocessing operation. The Mel-spectrogram of the pre-processed speech signal is fed as input to the DenseNet architecture. The language recognition performance of this architecture has been compared with that of several state-of-the-art deep architectures, including a convolutional neural network (CNN), ResNet, CNN-BLSTM, and DenseNet-BLSTM hybrid architectures. Additionally, we obtained the recognition performance of a stacked BLSTM architecture fed with different sets of handcrafted features for comparison purposes. Simulations for both speaker-independent and speaker-dependent scenarios have been performed on two standard datasets: (i) the IITKGP-MLILSC dataset of news clips in 27 different Indian languages and (ii) the Linguistic Data Consortium (LDC) dataset of telephonic conversations in 5 different Indian languages. In each case, the recognition performance of the DenseNet architecture with Mel-spectrogram features has been found to be significantly better than that of all other frameworks implemented in this study.
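The low-energy frame elimination step can be sketched as a simple energy gate over fixed-length frames. This is an illustrative numpy version; the frame length, hop size, and relative threshold below are our own choices, not the paper's values.

```python
import numpy as np

def drop_low_energy_frames(signal, frame_len=400, hop=160, ratio=0.05):
    """Frame the waveform and keep only frames whose energy is at
    least `ratio` times the maximum frame energy. The surviving
    frames would then be passed on for Mel-spectrogram extraction."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] for s in starts])
    energies = (frames ** 2).sum(axis=1)
    keep = energies >= ratio * energies.max()
    return frames[keep]
```

Gating on energy relative to the loudest frame (rather than an absolute level) makes the preprocessing robust to recording-gain differences across sources such as news clips and telephone speech.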
Citations: 2
3D Point Cloud Registration Based on Cascaded Mutual Information Attention Network
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9413083
Xiang Pan, Xiaoyi Ji, Sisi Cheng
For 3D point cloud registration, how to improve the local feature correlation of two point clouds is a challenging problem. In this paper, we propose a cascaded mutual information attention registration network. The network improves the accuracy of point cloud registration by stacking residual structure and using lateral connection. Firstly, the local reference coordinate system is defined by spherical representation for the local point set, which improves the stability and reliability of local features under noise. Secondly, the attention structure is used to improve the network depth and ensure the convergence of the network. Furthermore, a lateral connection is introduced into the network to avoid the loss of features in the process of concatenation. In the experimental part, the results of different algorithms are compared. It can be found that the proposed cascaded network can enhance the correlation of local features between different point clouds. As a result, it improves the registration accuracy significantly over the DCP and other typical algorithms.
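The translation-invariant local representation mentioned above (a spherical coordinate system defined around each local point set) can be illustrated with a minimal NumPy sketch. The neighbourhood size `k` and the coordinate conventions below are assumptions for illustration, not the paper's exact construction:

```python
import numpy as np

def local_spherical_features(points, center_idx, k=8):
    """Express the k nearest neighbours of points[center_idx] in spherical
    coordinates (r, theta, phi) relative to that point.

    Subtracting the centre point makes the representation invariant to
    translation; k and the axis conventions are illustrative assumptions."""
    center = points[center_idx]
    dists = np.linalg.norm(points - center, axis=1)
    nn = np.argsort(dists)[1:k + 1]            # skip the point itself
    rel = points[nn] - center                  # translation-invariant offsets
    r = np.linalg.norm(rel, axis=1)
    theta = np.arccos(np.clip(rel[:, 2] / np.maximum(r, 1e-12), -1.0, 1.0))
    phi = np.arctan2(rel[:, 1], rel[:, 0])
    return np.stack([r, theta, phi], axis=1)   # shape (k, 3)
```

In the network described above, features of this kind for each local point set would be passed through the stacked attention blocks with lateral connections before the final rigid transform is estimated.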
{"title":"3D Point Cloud Registration Based on Cascaded Mutual Information Attention Network","authors":"Xiang Pan, Xiaoyi Ji, Sisi Cheng","doi":"10.1109/ICPR48806.2021.9413083","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9413083","url":null,"abstract":"For 3D point cloud registration, how to improve the local feature correlation of two point clouds is a challenging problem. In this paper, we propose a cascaded mutual information attention registration network. The network improves the accuracy of point cloud registration by stacking residual structure and using lateral connection. Firstly, the local reference coordinate system is defined by spherical representation for the local point set, which improves the stability and reliability of local features under noise. Secondly, the attention structure is used to improve the network depth and ensure the convergence of the network. Furthermore, a lateral connection is introduced into the network to avoid the loss of features in the process of concatenation. In the experimental part, the results of different algorithms are compared. It can be found that the proposed cascaded network can enhance the correlation of local features between different point clouds. As a result, it improves the registration accuracy significantly over the DCP and other typical algorithms.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"85 1","pages":"10644-10649"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85266240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Journal
2020 25th International Conference on Pattern Recognition (ICPR)