
Latest publications: 2020 25th International Conference on Pattern Recognition (ICPR)

Which are the factors affecting the performance of audio surveillance systems?
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412573
Antonio Greco, Antonio Roberto, Alessia Saggese, M. Vento
Sound event recognition systems are rapidly becoming part of our life, since they can be profitably used in several vertical markets, ranging from audio security applications to scene classification and multi-modal analysis in social robotics. In recent years, a non-negligible part of the scientific community started to apply Convolutional Neural Networks (CNNs) to image-based representations of the audio stream, due to their successful adoption in almost all computer vision tasks. In this paper, we carry out a detailed benchmark of various widely used CNN architectures and visual representations on a popular dataset, namely the MIVIA Audio Events database. Our analysis is aimed at understanding how these factors affect the sound event recognition performance, with a particular focus on the false positive rate, which is very relevant in audio surveillance solutions. In fact, although most of the proposed solutions achieve a high recognition rate, the capability of distinguishing the events of interest from the background is often not yet sufficient for real systems, and this prevents their usage in real applications. Our comprehensive experimental analysis investigates this aspect and allows us to identify useful design guidelines for increasing the specificity of sound event recognition systems.
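As a toy illustration of the two quantities this benchmark focuses on, recognition rate on event segments and false positive rate on background segments can be computed from predictions as below; the event labels and function name are ours, not from the paper.

```python
def audio_surveillance_metrics(y_true, y_pred, background="background"):
    """Recognition rate over event segments and false positive rate
    over background-only segments (hypothetical label scheme)."""
    events = [(t, p) for t, p in zip(y_true, y_pred) if t != background]
    bg = [(t, p) for t, p in zip(y_true, y_pred) if t == background]
    recognition_rate = sum(t == p for t, p in events) / len(events)
    # a false positive: background mistaken for an event of interest
    false_positive_rate = sum(p != background for _, p in bg) / len(bg)
    return recognition_rate, false_positive_rate

y_true = ["scream", "glass", "background", "background", "gunshot", "background"]
y_pred = ["scream", "glass", "background", "gunshot", "gunshot", "background"]
rr, fpr = audio_surveillance_metrics(y_true, y_pred)
```

Note how all events can be recognized (rr = 1.0) while the false positive rate is still non-zero, which is exactly the gap the abstract points at.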
Pages: 7876-7883
Citations: 1
Exploiting Knowledge Embedded Soft Labels for Image Recognition
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412395
Lixian Yuan, Riquan Chen, Hefeng Wu, Tianshui Chen, Wentao Wang, Pei Chen
Objects from correlated classes usually share highly similar appearance, while objects from uncorrelated classes are very different. Most current image recognition works treat each class independently, which ignores these class correlations and inevitably leads to sub-optimal performance in many cases. Fortunately, object classes inherently form a hierarchy with different levels of abstraction, and this hierarchy encodes rich correlations among different classes. In this work, we utilize a soft label vector that encodes the prior knowledge of class correlations as extra regularization to train the image classifiers. Specifically, for each class, instead of simply using a one-hot vector, we assign a high value to its correlated classes and assign small values to those uncorrelated ones, thus generating knowledge-embedded soft labels. We conduct experiments on both general and fine-grained image recognition benchmarks and demonstrate the superiority of our method compared with existing ones.
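A minimal sketch of how such a knowledge-embedded soft label could be built, assuming hypothetical smoothing values (`high`, `low`) and a hand-picked correlated-class set; the paper's actual values may differ.

```python
def soft_label(class_idx, n_classes, correlated, high=0.8, low=0.15):
    """Soft target: `high` on the true class, `low` spread over its
    correlated classes, the remaining mass over uncorrelated ones."""
    label = [0.0] * n_classes
    label[class_idx] = high
    for c in correlated:
        label[c] = low / len(correlated)
    others = [c for c in range(n_classes)
              if c != class_idx and c not in correlated]
    for c in others:
        label[c] = (1.0 - high - low) / len(others)
    return label

# class 0 correlated with class 1 (e.g. two similar dog breeds), 4 classes
lbl = soft_label(0, 4, correlated=[1])
```

The resulting vector still sums to one, so it can replace the one-hot target in a standard cross-entropy loss.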
Pages: 4989-4995
Citations: 0
Learning Defects in Old Movies from Manually Assisted Restoration
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9413196
A. Renaudeau, Travis Seng, A. Carlier, F. Pierre, F. Lauze, Jean-François Aujol, Jean-Denis Durou
We propose to detect defects in old movies, as the first step of a larger framework for old movie restoration by inpainting techniques. The specificity of our work is to learn a film restorer's expertise from a pair of sequences, composed of a movie with defects and the same movie semi-automatically restored with the help of specialized software. In order to detect those defects with minimal human interaction and further reduce the time spent on a restoration, we feed a U-Net with consecutive defective frames as input to detect the unexpected variations of pixel intensity over space and time. Since the output of the network is a mask of defect locations, we first have to create the dataset of mask frames on the basis of the frames restored with the software used by the film restorer, instead of classical synthetic ground truth, which is not available. These masks are estimated by computing the absolute difference between restored frames and defective frames, combined with thresholding and morphological closing. Our network succeeds in automatically detecting real defects with more precision than the manual, all-encompassing selection, including some that the expert restorer could have missed for lack of time.
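The mask-estimation step described above (absolute difference, thresholding, morphological closing) can be sketched on toy grayscale frames; the threshold, frame contents, and 3x3 structuring element are illustrative stand-ins, not the paper's settings.

```python
def abs_diff_mask(defective, restored, thresh=30):
    """Binary mask where the two frames differ by more than `thresh`."""
    return [[1 if abs(a - b) > thresh else 0 for a, b in zip(rd, rr_)]
            for rd, rr_ in zip(defective, restored)]

def _morph(mask, op):
    """3x3 morphology: op=max gives dilation, op=min gives erosion."""
    h, w = len(mask), len(mask[0])
    return [[op(mask[i + di][j + dj]
                for di in (-1, 0, 1) for dj in (-1, 0, 1)
                if 0 <= i + di < h and 0 <= j + dj < w)
             for j in range(w)] for i in range(h)]

def closing(mask):
    """Morphological closing = dilation followed by erosion."""
    return _morph(_morph(mask, max), min)

restored = [[0] * 5 for _ in range(5)]
defective = [row[:] for row in restored]
defective[2][2] = 220                      # one bright defect pixel
mask = closing(abs_diff_mask(defective, restored))
```

In a real pipeline the same three operations would run per frame over full-resolution images (e.g. with OpenCV), producing the training masks for the U-Net.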
Pages: 5254-5261
Citations: 0
PointSpherical: Deep Shape Context for Point Cloud Learning in Spherical Coordinates
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412978
Hua Lin, Bin Fan, Yongcheng Liu, Yirong Yang, Zheng Pan, Jianbo Shi, Chunhong Pan, Huiwen Xie
We propose Spherical Hierarchical modeling of 3D point clouds. Inspired by Shape Context, we design a receptive field on each 3D point by placing a spherical coordinate system on it. We sample points using the furthest-point method and create overlapping balls of points. We divide the space into radial, polar angular, and azimuthal angular bins, on which we form a Spherical Hierarchy for each ball. We apply 1x1 CNN convolution on points to start the initial feature extraction. Repeated 3D CNN and max-pooling over the spherical bins propagate contextual information until all the information is condensed in the center bin. Extensive experiments on five datasets strongly evidence that our method outperforms current models on various point cloud learning tasks, including 2D/3D shape classification, 3D part segmentation, and 3D semantic segmentation.
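A sketch of the spherical binning step, assuming illustrative bin counts (`n_r`, `n_theta`, `n_phi`) rather than the paper's configuration: each neighbor of a ball's center is expressed in spherical coordinates and assigned a (radial, polar, azimuthal) bin index.

```python
import math

def spherical_bin(p, center, r_max, n_r=2, n_theta=4, n_phi=8):
    """Assign a neighbor `p` of `center` to a (radial, polar, azimuthal)
    bin inside a ball of radius `r_max`."""
    x, y, z = (p[i] - center[i] for i in range(3))
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r > 0 else 0.0   # polar angle in [0, pi]
    phi = math.atan2(y, x) % (2 * math.pi)       # azimuth in [0, 2*pi)
    i_r = min(int(r / r_max * n_r), n_r - 1)
    i_theta = min(int(theta / math.pi * n_theta), n_theta - 1)
    i_phi = min(int(phi / (2 * math.pi) * n_phi), n_phi - 1)
    return i_r, i_theta, i_phi

bin_idx = spherical_bin((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), r_max=2.0)
```

Features of all points falling in the same bin would then be pooled, giving the hierarchy something akin to a learned 3D Shape Context descriptor.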
Pages: 10266-10273
Citations: 0
OCT Image Segmentation Using Neural Architecture Search and SRGAN
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412818
O. Dehzangi, Saba Heidari Gheshlaghi, Annahita Amireskandari, N. Nasrabadi, A. Rezai
Medical image segmentation is a critical field in the domain of computer vision, and with the growing acclaim of deep learning based models, research in this field is constantly expanding. Optical coherence tomography (OCT) is a non-invasive method that scans the human retina in depth. It has been hypothesized that the thickness of the retinal layers extracted from OCTs could be an efficient and effective biomarker for early diagnosis of Alzheimer's disease (AD). In this work, we aim to design a self-training model architecture for the task of segmenting the retinal layers in OCT scans. Neural architecture search (NAS) is a subfield of the AutoML domain, which has a significant impact on improving the accuracy of machine vision tasks. We integrate the NAS algorithm with a Unet auto-encoder architecture as its backbone. Then, we employ our proposed model to segment the retinal nerve fiber layer in our preprocessed OCT images with the aim of AD diagnosis. In this work, we trained a super-resolution generative adversarial network on the raw OCT scans to improve the quality of the images before the modeling stage. In our architecture search strategy, different primitive operations are suggested to find down- and up-sampling Unet cell blocks, and the binary gate method is applied to make the search strategy more practical. Our architecture search method is empirically evaluated by training the Unet and NAS-Unet from scratch. Specifically, the proposed NAS-Unet training significantly outperforms the baseline human-designed architecture, achieving 95.1% in the mean Intersection over Union metric and 79.1% in the Dice similarity coefficient.
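The two reported metrics, mean Intersection over Union and the Dice similarity coefficient, can be sketched on flat label lists; the toy labels below are ours, not OCT data.

```python
def iou(y_true, y_pred, cls):
    """Intersection over Union for one class on flat label lists."""
    inter = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    union = sum(t == cls or p == cls for t, p in zip(y_true, y_pred))
    return inter / union if union else 1.0

def dice(y_true, y_pred, cls):
    """Dice similarity coefficient: 2*|A∩B| / (|A| + |B|)."""
    inter = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    total = sum(t == cls for t in y_true) + sum(p == cls for p in y_pred)
    return 2 * inter / total if total else 1.0

y_true = [0, 0, 1, 1, 1, 0]   # 1 = retinal layer, 0 = background
y_pred = [0, 1, 1, 1, 0, 0]
mean_iou = (iou(y_true, y_pred, 0) + iou(y_true, y_pred, 1)) / 2
```

Mean IoU simply averages the per-class IoU; Dice weighs the overlap against the combined region sizes, so the two numbers generally differ on the same prediction.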
Pages: 6425-6430
Citations: 7
The effect of image enhancement algorithms on convolutional neural networks
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412110
J. A. Rodríguez-Rodríguez, Miguel A. Molina-Cabello, Rafaela Benítez-Rochel, Ezequiel López-Rubio
Convolutional Neural Networks (CNNs) are widely used due to their high performance in many tasks related to computer vision. In particular, image classification is one of the fields where CNNs are employed with success. However, images can be heavily affected by several inconveniences such as noise or illumination. Therefore, image enhancement algorithms have been developed to improve the quality of the images. In this work, the impact that brightness and image contrast enhancement techniques have on the performance achieved by CNNs in classification tasks is analyzed. More specifically, several well-known CNN architectures such as Alexnet or Googlenet, and image contrast enhancement techniques such as Gamma Correction or Logarithm Transformation, are studied. Different experiments have been carried out, and the obtained qualitative and quantitative results are reported.
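The two enhancement techniques named above can be sketched per pixel on 8-bit intensities; the constant in the log transform is chosen so that 255 maps to 255, a common convention rather than necessarily the paper's choice.

```python
import math

def gamma_correct(pixel, gamma):
    """Gamma correction on an 8-bit intensity: 255 * (I / 255) ** gamma."""
    return round(255 * (pixel / 255) ** gamma)

def log_transform(pixel):
    """Log transform c * log(1 + I), with c fixed so 255 maps to 255."""
    c = 255 / math.log(1 + 255)
    return round(c * math.log(1 + pixel))

brightened = gamma_correct(64, 0.5)   # gamma < 1 brightens dark pixels
```

Applied to every pixel of an input image, either map changes the intensity distribution the CNN sees, which is exactly the perturbation the paper measures.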
Pages: 3084-3089
Citations: 4
Dual-Memory Model for Incremental Learning: The Handwriting Recognition Use Case
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9411977
Melanie Piot, Berangere Bourdoulous, Jordan Gonzalez, Aurelia Deshayes, L. Prevost
In this paper, we propose a dual memory model inspired by psychological theory. Short-term memory processes the data stream before integrating it into long-term memory, which generalizes. The use case is learning the ability to recognize handwriting. This begins with the learning of prototypical letters, continues throughout life, and gives the individual the ability to recognize increasingly varied handwriting. This second task is achieved by incrementally training our dual-memory model. We use a convolutional network for encoding and random forests as the memory model. Indeed, the latter have the advantage of being easily enhanced to integrate new data and new classes. Performance on the MNIST database is very encouraging, exceeding 95%, while the complexity of the model remains reasonable.
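The "easily enhanced" property of the forest can be caricatured as an ensemble that grows by appending members trained only on the new data, leaving earlier members untouched. Here each member is a nearest-centroid stub standing in for a real decision tree, so this is an illustration of the incremental idea, not the authors' model.

```python
from collections import Counter

def train_member(samples):
    """Fit one ensemble member: a per-class centroid table.
    `samples` is a list of (feature_vector, label) pairs."""
    by_class = {}
    for x, y in samples:
        by_class.setdefault(y, []).append(x)
    return {y: tuple(sum(col) / len(col) for col in zip(*xs))
            for y, xs in by_class.items()}

def member_predict(member, x):
    """Label of the closest centroid (squared Euclidean distance)."""
    return min(member, key=lambda y: sum((a - b) ** 2
                                         for a, b in zip(x, member[y])))

def forest_predict(forest, x):
    """Majority vote over all ensemble members."""
    votes = Counter(member_predict(m, x) for m in forest)
    return votes.most_common(1)[0][0]

forest = [train_member([((0.0, 0.0), "a"), ((1.0, 1.0), "b")])]
# new data (and a new class "c") arrives: append a member, no retraining
forest.append(train_member([((0.0, 0.0), "a"), ((5.0, 5.0), "c")]))
pred = forest_predict(forest, (0.1, -0.1))
```

The key design point survives the caricature: integrating new writing styles or classes never touches what the existing members have already learned.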
Pages: 5527-5534
Citations: 0
Unsupervised Co-Segmentation for Athlete Movements and Live Commentaries Using Crossmodal Temporal Proximity
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9412233
Yasunori Ohishi, Yuki Tanaka, K. Kashino
Audio-visual co-segmentation is the task of extracting segments and regions corresponding to specific events from unlabeled audio and video signals. It is particularly important to accomplish it in an unsupervised way, since it is generally very difficult to manually label all the objects and events appearing in audio-visual signals for supervised learning. Here, we propose to take advantage of the temporal proximity of corresponding audio and video entities included in the signals. For this purpose, we newly employ a guided attention scheme for this task to efficiently detect and utilize temporal co-occurrences of audio and video information. Experiments using real TV broadcasts of sumo wrestling, a sporting event, with live commentaries show that our model can automatically extract specific athlete movements and their spoken descriptions in an unsupervised manner.
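The temporal-proximity cue can be caricatured as pairing audio and video detections whose timestamps fall within a small window; the times, window size, and function name below are toy stand-ins for the learned guided attention.

```python
def cooccurrences(audio_times, video_times, window=1.0):
    """Pair audio and video detections within `window` seconds of
    each other (toy stand-in for crossmodal temporal proximity)."""
    return [(a, v) for a in audio_times for v in video_times
            if abs(a - v) <= window]

# e.g. a commentary utterance at 3.2 s and a throw detected at 3.5 s
pairs = cooccurrences([3.2, 10.0], [3.5, 20.0])
```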
Pages: 9137-9142
Citations: 0
Multi-Direction Convolution for Semantic Segmentation
Pub Date: 2021-01-10 | DOI: 10.1109/ICPR48806.2021.9413174
Dehui Li, Z. Cao, Ke Xian, Xinyuan Qi, Chao Zhang, Hao Lu
Context is known to be one of the crucial factors affecting the performance improvement of semantic segmentation. However, state-of-the-art segmentation models built upon fully convolutional networks are inherently weak in encoding contextual information because of stacked local operations such as convolution and pooling. Failing to capture context leads to inferior segmentation performance. Although many context modules have been proposed to relieve this problem, they still operate in a local manner or use the same contextual information in different positions (due to upsampling). In this paper, we introduce the idea of Multi-Direction Convolution (MDC), a novel operator capable of encoding rich contextual information. This operator is inspired by the observation that the standard convolution only slides along the spatial dimensions (the $x$ and $y$ directions) while the channel dimension (the $z$ direction) is fixed, which renders slow growth of the receptive field (RF). If the channel-fixed convolution is considered one-direction, MDC is multi-direction in the sense that it slides along both spatial and channel dimensions, i.e., it slides along $x, y$ when $z$ is fixed, along $x, z$ when $y$ is fixed, and along $y, z$ when $x$ is fixed. In this way, MDC is able to encode rich contextual information with a fast increase of the RF. Compared to existing context modules, the encoded context is position-sensitive because no upsampling is required. MDC is also efficient and easy to implement: it can be implemented with a few standard convolution layers combined with permutation. We show through extensive experiments that MDC effectively and selectively enlarges the RF and outperforms existing contextual modules on two standard benchmarks, including Cityscapes and PASCAL VOC2012.
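The slide-along-a-chosen-axis idea can be sketched with a 1-D kernel applied along either a spatial axis or the channel axis via axis permutation; the feature volume, kernel, and helper function are illustrative, not the MDC operator itself.

```python
import numpy as np

def conv1d_along(vol, axis, kernel):
    """Valid-mode 1-D convolution of `vol` along the chosen axis,
    implemented by permuting that axis to the end and back."""
    v = np.moveaxis(vol, axis, -1)
    k = len(kernel)
    out = sum(kernel[i] * v[..., i:v.shape[-1] - k + 1 + i] for i in range(k))
    return np.moveaxis(out, -1, axis)

vol = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)  # (z, y, x)
kernel = [0.5, 0.5]
out_x = conv1d_along(vol, axis=2, kernel=kernel)  # channel-fixed: along x
out_z = conv1d_along(vol, axis=0, kernel=kernel)  # multi-direction: along z
```

Sliding along $z$ mixes information across channels at each spatial position, which is the extra direction of context propagation the abstract describes.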
Pages: 519-525
Cited: 0
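The "slide over two dimensions while the third plays the channel role" idea from the abstract can be sketched with plain NumPy and transpositions. This is a hedged illustration, not the authors' implementation: `conv2d_same`, the per-direction kernel shapes, and the simple summation that fuses the three directions are all assumptions (the kernel size is assumed odd).

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2-D convolution of a (C, H, W) map with a
    (C_out, C, kh, kw) kernel (kh, kw assumed odd), looping over positions."""
    C, H, W = x.shape
    c_out, c_in, kh, kw = k.shape
    assert c_in == C
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((c_out, H, W))
    for i in range(H):
        for j in range(W):
            # contract the (C, kh, kw) patch with the last 3 kernel axes
            out[:, i, j] = np.tensordot(k, xp[:, i:i + kh, j:j + kw], axes=3)
    return out

def multi_direction_conv(x, k_xy, k_xz, k_yz):
    """Sketch of Multi-Direction Convolution: run the same 2-D convolution
    three times, transposing the (z=C, y=H, x=W) map so that each axis in
    turn plays the fixed 'channel' role. Assumed kernel shapes:
    k_xy (C, C, kh, kw), k_xz (H, H, kh, kw), k_yz (W, W, kh, kw)."""
    out_xy = conv2d_same(x, k_xy)                                        # slide x, y; z fixed
    out_xz = conv2d_same(x.transpose(1, 0, 2), k_xz).transpose(1, 0, 2)  # slide x, z; y fixed
    out_yz = conv2d_same(x.transpose(2, 1, 0), k_yz).transpose(2, 1, 0)  # slide y, z; x fixed
    return out_xy + out_xz + out_yz  # fuse the three directions
```

Both transpositions are their own inverse, so each branch returns a map of the original (C, H, W) shape; a real model would learn the three kernels jointly.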
PowerHC: non linear normalization of distances for advanced nearest neighbor classification
Pub Date : 2021-01-10 DOI: 10.1109/ICPR48806.2021.9413210
M. Bicego, M. Orozco-Alzate
In this paper we investigate the exploitation of non-linear scaling of distances for advanced nearest neighbor classification. Starting from the recently found relation between the Hypersphere Classifier (HC) [1] and the Adaptive Nearest Neighbor rule (ANN) [2], we propose PowerHC, an improved version of HC in which distances are normalized using a non-linear mapping; non-linear scaling of data, whose usefulness for feature spaces has already been assessed, has hardly been investigated for distances. A thorough experimental evaluation, involving 24 datasets and a challenging real-world scenario of seismic signal classification, confirms the suitability of the proposed approach.
{"title":"PowerHC: non linear normalization of distances for advanced nearest neighbor classification","authors":"M. Bicego, M. Orozco-Alzate","doi":"10.1109/ICPR48806.2021.9413210","DOIUrl":"https://doi.org/10.1109/ICPR48806.2021.9413210","abstract":"In this paper we investigate the exploitation of non-linear scaling of distances for advanced nearest neighbor classification. Starting from the recently found relation between the Hypersphere Classifier (HC) [1] and the Adaptive Nearest Neighbor rule (ANN) [2], we propose PowerHC, an improved version of HC in which distances are normalized using a non-linear mapping; non-linear scaling of data, whose usefulness for feature spaces has already been assessed, has hardly been investigated for distances. A thorough experimental evaluation, involving 24 datasets and a challenging real-world scenario of seismic signal classification, confirms the suitability of the proposed approach.","PeriodicalId":6783,"journal":{"name":"2020 25th International Conference on Pattern Recognition (ICPR)","volume":"12 1","pages":"1205-1211"},"PeriodicalIF":0.0,"publicationDate":"2021-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89415035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited: 1
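As a rough illustration of why non-linear distance scaling can matter, the sketch below rescales distances as d**p before a distance-weighted k-NN vote. It is not the paper's PowerHC rule: `weighted_knn_power`, the exponent `p`, and the 1/(d**p + eps) weighting are all hypothetical choices. The point it demonstrates is that plain 1-NN is invariant to any monotone rescaling of distances, whereas rules that combine several distances, as HC- and ANN-style classifiers do, are not.

```python
import numpy as np

def weighted_knn_power(X_train, y_train, X_test, k=3, p=2.0, eps=1e-12):
    """Non-linearly rescale distances as d**p, then let each of the k
    nearest neighbours vote with weight 1/(d**p + eps). Because weights
    from several neighbours are summed, the non-linear map changes the
    decision, unlike in plain 1-NN."""
    # pairwise Euclidean distances, shape (n_test, n_train)
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]          # k nearest neighbours per query
    classes = np.unique(y_train)
    preds = []
    for row, nbrs in enumerate(idx):
        w = 1.0 / (d[row, nbrs] ** p + eps)     # power-scaled distance weights
        votes = [w[y_train[nbrs] == c].sum() for c in classes]
        preds.append(classes[int(np.argmax(votes))])
    return np.array(preds)
```

Larger `p` makes the vote concentrate on the very closest neighbours, which is one intuition for how a power mapping of distances can sharpen a nearest-neighbor decision rule.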
Journal
2020 25th International Conference on Pattern Recognition (ICPR)