
2020 IEEE International Conference on Image Processing (ICIP): Latest Publications

Hrnet: Hamiltonian Rescaling Network for Image Downscaling
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190729
Y. Chen, Xi Xiao, Tao Dai, Shutao Xia
Image downscaling has become a classical problem in image processing and has recently been connected to image super-resolution (SR), which restores high-quality images from low-resolution ones generated by predetermined downscaling kernels (e.g., bicubic). However, most existing image downscaling methods are deterministic and lose information during the downscaling process, and downscaling methods are rarely designed specifically for image SR. In this paper, we propose a novel learning-based image downscaling method, the Hamiltonian Rescaling Network (HRNet). The design of HRNet is based on a discretization of a Hamiltonian system, a pair of iterative update equations, which formulates a mechanism for iteratively correcting the error caused by the information lost during image or feature downscaling. Extensive experiments demonstrate the effectiveness of the proposed method in terms of both quantitative and qualitative results.
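The abstract does not give the exact update equations, but the paired iterative-correction mechanism it describes can be illustrated with a standard symplectic-Euler discretization of a Hamiltonian system. A minimal Python sketch, assuming generic coupled updates with hypothetical gradient functions grad_q and grad_p (not the authors' network):

import numpy as np

# Minimal sketch: a pair of coupled update equations obtained by discretizing
# a Hamiltonian system  dq/dt = dH/dp,  dp/dt = -dH/dq  with symplectic Euler.
# Here q plays the role of the downscaled signal and p of an error/correction state.
def hamiltonian_updates(q0, p0, grad_q, grad_p, step=0.1, n_iter=10):
    """Iterate q <- q + step * grad_p(p), then p <- p - step * grad_q(q)."""
    q, p = q0.copy(), p0.copy()
    for _ in range(n_iter):
        q = q + step * grad_p(p)   # update the signal using the correction state
        p = p - step * grad_q(q)   # update the correction state from the new signal
    return q, p

# Toy usage with the quadratic Hamiltonian H = 0.5 * (q**2 + p**2),
# whose discretized trajectory stays close to a circle of constant energy.
q, p = hamiltonian_updates(np.array([1.0]), np.array([0.0]),
                           grad_q=lambda q: q, grad_p=lambda p: p)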
Citations: 6
DCM: A Dense-Attention Context Module For Semantic Segmentation
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190675
Shenghua Li, Quan Zhou, Jia Liu, Jie Wang, Yawen Fan, Xiaofu Wu, Longin Jan Latecki
For image semantic segmentation, a fully convolutional network is usually employed as the encoder to abstract visual features of the input image, and a meticulously designed decoder is used to decode the final feature map of the backbone. The output resolution of backbones designed for image classification is too low for segmentation tasks, and most existing methods for obtaining the final high-resolution feature map cannot fully utilize the information in the different layers of the backbone. To adequately extract the information of a single layer, the multi-scale context information across layers, and the global information of the backbone, we present a new attention-augmented module named the Dense-attention Context Module (DCM), which connects common backbones to the decoding heads. Experiments show the promising results of our method on the Cityscapes dataset.
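The abstract does not spell out the internals of DCM, so the following is only a hypothetical PyTorch-style sketch of one common way to combine information from different backbone layers: upsample multi-scale features to a common resolution and weight each scale with a learned attention score. All class, function, and parameter names here are invented for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttentionFusion(nn.Module):
    """Fuse same-channel features from several backbone stages with learned per-scale attention."""
    def __init__(self, in_channels, num_scales):
        super().__init__()
        self.score = nn.ModuleList(
            [nn.Conv2d(in_channels, 1, kernel_size=1) for _ in range(num_scales)]
        )

    def forward(self, feats):
        # feats: list of tensors [B, C, H_i, W_i] from different backbone stages
        target = feats[0].shape[-2:]
        ups = [F.interpolate(f, size=target, mode="bilinear", align_corners=False)
               for f in feats]
        scores = torch.stack([s(u) for s, u in zip(self.score, ups)], dim=0)  # [S, B, 1, H, W]
        weights = torch.softmax(scores, dim=0)                                # normalize over scales
        return (weights * torch.stack(ups, dim=0)).sum(dim=0)                 # [B, C, H, W]

fusion = MultiScaleAttentionFusion(in_channels=64, num_scales=3)
out = fusion([torch.randn(1, 64, 32, 32), torch.randn(1, 64, 16, 16), torch.randn(1, 64, 8, 8)])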
Citations: 2
Dependent Scalar Quantization For Neural Network Compression
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190955
Paul Haase, H. Schwarz, H. Kirchhoffer, Simon Wiedemann, Talmaj Marinc, Arturo Marbán, K. Müller, W. Samek, D. Marpe, T. Wiegand
Recent approaches to the compression of deep neural networks, like the emerging standard on compression of neural networks for multimedia content description and analysis (MPEG-7 part 17), apply scalar quantization and entropy coding of the quantization indexes. In this paper, we present an advanced method for the quantization of neural network parameters, which applies dependent scalar quantization (DQ), or trellis-coded quantization (TCQ), and an improved context modeling for the entropy coding of the quantization indexes. We show that the proposed method achieves a 5.778% bitrate reduction with virtually no loss (0.37%) of network performance on average, compared to the baseline methods of the second test model (NCTM) of MPEG-7 part 17 at the relevant working points.
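As background for readers unfamiliar with dependent scalar quantization, here is a deliberately simplified Python sketch of trellis-coded quantization, not the exact NCTM/MPEG-7 part 17 design: two scalar quantizers with interleaved reconstruction levels, a two-state machine driven by the parity of the chosen index, and a Viterbi search that minimizes squared reconstruction error over the whole sequence.

import numpy as np

def tcq_encode(x, delta=0.5):
    """Simplified dependent/trellis-coded quantization of a 1-D sequence x."""
    def recon(q, k):                 # Q0: even multiples of delta, Q1: odd multiples
        return (2 * k + q) * delta

    def candidates(q, v):            # a few indexes near the unconstrained optimum
        k = int(round((float(v) / delta - q) / 2))
        return (k - 1, k, k + 1)

    INF = float("inf")
    cost = {0: 0.0, 1: INF}          # start in state 0 (quantizer Q0)
    path = {0: [], 1: []}
    for v in x:
        v = float(v)
        new_cost = {0: INF, 1: INF}
        new_path = {0: [], 1: []}
        for q in (0, 1):             # state = which quantizer is used for this sample
            if cost[q] == INF:
                continue
            for k in candidates(q, v):
                c = cost[q] + (v - recon(q, k)) ** 2
                nxt = abs(k) % 2     # next state is the parity of the chosen index
                if c < new_cost[nxt]:
                    new_cost[nxt] = c
                    new_path[nxt] = path[q] + [(q, k)]
        cost, path = new_cost, new_path
    best = min((0, 1), key=lambda s: cost[s])
    return path[best], [recon(q, k) for q, k in path[best]]

indexes, reconstruction = tcq_encode(np.array([0.3, -1.2, 0.7, 2.4]))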
Citations: 6
Proposal-Based Instance Segmentation With Point Supervision
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190782
I. Laradji, Negar Rostamzadeh, Pedro H. O. Pinheiro, David Vázquez, Mark W. Schmidt
Instance segmentation methods often require costly per-pixel labels. We propose a method called WISE-Net that only requires point-level annotations. During training, the model only has access to a single pixel label per object, yet the task is to output full segmentation masks. To address this challenge, we construct a network with two branches: (1) a localization network (L-Net) that predicts the location of each object; and (2) an embedding network (E-Net) that learns an embedding space where pixels of the same object are close. The segmentation masks for the located objects are obtained by grouping pixels with similar embeddings. We evaluate our approach on the PASCAL VOC, COCO, KITTI and CityScapes datasets. The experiments show that our method (1) obtains competitive results compared to fully-supervised methods in certain scenarios; (2) outperforms fully- and weakly-supervised methods with a fixed annotation budget; and (3) establishes a first strong baseline for instance segmentation with point-level supervision.
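The exact grouping rule is not detailed in the abstract; the sketch below assumes a per-pixel embedding map and one seed pixel per located object, and simply assigns each pixel to the nearest seed embedding, with a hypothetical distance threshold for background.

import numpy as np

def group_pixels_by_embedding(embeddings, seeds, max_dist=1.0):
    """embeddings: [H, W, D] array; seeds: list of (row, col), one per located object.
    Returns an [H, W] map of object ids (0 = background)."""
    H, W, D = embeddings.shape
    flat = embeddings.reshape(-1, D)                                # [H*W, D]
    seed_emb = np.stack([embeddings[r, c] for r, c in seeds])       # [K, D]
    dists = np.linalg.norm(flat[:, None, :] - seed_emb[None, :, :], axis=-1)  # [H*W, K]
    nearest = dists.argmin(axis=1)
    masks = np.where(dists.min(axis=1) <= max_dist, nearest + 1, 0)  # threshold to background
    return masks.reshape(H, W)

# Toy usage with random embeddings and two seed points.
emb = np.random.rand(8, 8, 4)
print(group_pixels_by_embedding(emb, seeds=[(1, 1), (6, 6)]).shape)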
Citations: 21
Variational Autoencoder Based Unsupervised Domain Adaptation For Semantic Segmentation
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190973
Zongyao Li, Ren Togo, Takahiro Ogawa, M. Haseyama
Unsupervised domain adaptation, which transfers supervised knowledge from a labeled domain to an unlabeled domain, remains a tough problem in the field of computer vision, especially for semantic segmentation. Some methods inspired by adversarial learning and semi-supervised learning have been developed for unsupervised domain adaptation in semantic segmentation and have achieved outstanding performance. In this paper, we propose a novel method for this task. Like adversarial learning-based methods that use a discriminator to align the feature distributions of different domains, we employ a variational autoencoder to reach the same goal, but in a non-adversarial manner. Since the two approaches are compatible, we also integrate an adversarial loss into our method. By further introducing pseudo labels, our method achieves state-of-the-art performance on two benchmark adaptation scenarios, GTA5-to-CITYSCAPES and SYNTHIA-to-CITYSCAPES.
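One plausible reading of the non-adversarial alignment, sketched below under assumptions not stated in the abstract, is a shared VAE over features from both domains whose KL term pulls the source and target posteriors toward the same standard-normal prior; all names here are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureVAE(nn.Module):
    """Shared VAE over feature vectors; the common prior implicitly aligns the two domains."""
    def __init__(self, feat_dim, latent_dim):
        super().__init__()
        self.mu = nn.Linear(feat_dim, latent_dim)
        self.logvar = nn.Linear(feat_dim, latent_dim)
        self.decode = nn.Linear(latent_dim, feat_dim)

    def forward(self, f):
        mu, logvar = self.mu(f), self.logvar(f)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        recon = self.decode(z)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL to N(0, I)
        return recon, kl

vae = FeatureVAE(feat_dim=256, latent_dim=32)
f_src, f_tgt = torch.randn(8, 256), torch.randn(8, 256)   # source / target feature batches
loss = sum(F.mse_loss(r, f) + kl for f in (f_src, f_tgt) for r, kl in [vae(f)])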
Citations: 4
Dronecaps: Recognition Of Human Actions In Drone Videos Using Capsule Networks With Binary Volume Comparisons
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190864
Abdullah M. Algamdi, Victor Sanchez, Chang-Tsun Li
Understanding human actions from videos captured by drones is a challenging task in computer vision because of the unfamiliar viewpoints of the depicted individuals and the changes in their apparent size caused by the camera's location and motion. This work proposes DroneCaps, a capsule network architecture for multi-label human action recognition (HAR) in videos captured by drones. DroneCaps uses features computed by 3D convolutional neural networks plus a new set of features computed by a novel Binary Volume Comparison layer. All these features, in conjunction with the learning power of CapsNets, allow the different viewpoints and poses of the depicted individuals to be understood and abstracted very efficiently, thus improving multi-label HAR. The evaluation of the DroneCaps architecture's performance for multi-label classification shows that it outperforms state-of-the-art methods on the Okutama-Action dataset.
Citations: 8
Variational Auto-Encoders Without Graph Coarsening For Fine Mesh Learning
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191189
Nicolas Vercheval, H. Bie, A. Pižurica
In this paper, we propose a Variational Auto-Encoder able to correctly reconstruct a fine mesh from a very low-dimensional latent space. The architecture avoids the usual coarsening of the graph and relies on pooling layers for the decoding phase and on the mean values of the training set for the up-sampling phase. We select new operators compared to previous work; in particular, we define a new Dirac operator which can be extended to different types of graph-structured data. We show the improvements over the previous operators and compare the results with the current benchmark on the Coma Dataset.
Citations: 4
Interpretable Synthetic Reduced Nearest Neighbor: An Expectation Maximization Approach
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190986
Pooya Tavallali, P. Tavallali, M. Khosravi, M. Singhal
Synthetic Reduced Nearest Neighbor (SRNN) is a nearest-neighbor model constrained to have K synthetic samples (prototypes/centroids). There has been little work on the direct optimization and interpretability of SRNN with proper guarantees such as convergence. To tackle these issues, this paper, inspired by the K-means algorithm, provides a novel optimization of Synthetic Reduced Nearest Neighbor based on Expectation Maximization (EM-SRNN) that always converges while monotonically decreasing the objective function. The optimization alternates between updating the centroids of the model and assigning training samples to centroids. EM-SRNN is interpretable since the centroids represent sub-clusters of the classes; this type of interpretability is suitable for various studies such as image processing and epidemiological studies. The analytical aspects of the problem are explored, and the optimization is shown to have linear complexity over the training set. Finally, EM-SRNN is shown to have superior or similar performance compared with several other interpretable state-of-the-art models such as trees and kernel SVMs.
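A minimal numpy sketch of the K-means-style alternation the abstract describes, assuming each prototype carries a class label, that the E-step assigns samples to the nearest prototype, and that the M-step recomputes prototype positions and majority-class labels; the paper's actual objective and update rules may differ.

import numpy as np

def em_srnn_sketch(X, y, k=4, n_iter=20, seed=0):
    """Alternate assignment (E-step) and prototype/label updates (M-step).
    Neither step increases the within-prototype squared-distance objective."""
    rng = np.random.default_rng(seed)
    protos = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(k, dtype=y.dtype)
    for _ in range(n_iter):
        # E-step: assign each sample to its nearest prototype.
        assign = np.linalg.norm(X[:, None] - protos[None], axis=-1).argmin(axis=1)
        # M-step: move each prototype to the mean of its samples, relabel by majority class.
        for j in range(k):
            members = assign == j
            if members.any():
                protos[j] = X[members].mean(axis=0)
                vals, counts = np.unique(y[members], return_counts=True)
                labels[j] = vals[counts.argmax()]
    return protos, labels

# Toy usage on random 2-D data with binary labels.
X = np.random.rand(200, 2)
y = (X[:, 0] > 0.5).astype(int)
protos, labels = em_srnn_sketch(X, y)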
Citations: 10
Spatio-Temporal Slowfast Self-Attention Network For Action Recognition
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191290
Myeongjun Kim, Taehun Kim, Daijin Kim
We propose a Spatio-Temporal SlowFast Self-Attention network for action recognition. Conventional convolutional neural networks have the advantage of capturing local regions of the data. However, to understand a human action, both the human and the overall context of the given scene should be considered. Therefore, we repurpose the self-attention mechanism of the Self-Attention GAN (SAGAN) in our model to retrieve global semantic context for action recognition. Using the self-attention mechanism, we propose a module that can extract four kinds of features from video: spatial information, temporal information, slow-action information, and fast-action information. We train and test our network on the Atomic Visual Actions (AVA) dataset and show significant frame-AP improvements on 28 categories.
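For reference, a SAGAN-style self-attention block over the spatial positions of a 2-D feature map looks roughly as follows; the paper applies attention to spatio-temporal slow/fast features, so this sketch only illustrates the general mechanism, not the authors' module.

import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention: 1x1-conv query/key/value, softmax over positions, gated residual."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))      # learned weight of the attention branch

    def forward(self, x):
        B, C, H, W = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # [B, HW, C//8]
        k = self.key(x).flatten(2)                     # [B, C//8, HW]
        v = self.value(x).flatten(2)                   # [B, C, HW]
        attn = torch.softmax(q @ k, dim=-1)            # [B, HW, HW], attention over all positions
        out = (v @ attn.transpose(1, 2)).view(B, C, H, W)
        return self.gamma * out + x                    # residual connection

y = SelfAttention2d(64)(torch.randn(2, 64, 16, 16))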
Citations: 16
Deep Learning And Interactivity For Video Rotoscoping
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191057
Shivam Saboo, F. Lefèbvre, Vincent Demoulin
In this work, we extend the idea of object co-segmentation [10] to interactive video segmentation. Our framework predicts the coordinates of the vertices along the boundary of an object for two frames of a video simultaneously. The predicted vertices are interactive in nature, and a user interaction on one frame assists the network in correcting the predictions for both frames. We employ an attention mechanism at the encoder stage and a simple combination network at the decoder stage, which allows the network to perform this simultaneous correction efficiently. The framework is also robust to the distance between the two input frames, handling a separation of up to 50 frames between the two inputs. We train our model on a professional dataset consisting of pixel-accurate annotations produced by professional roto artists. We test our model on DAVIS [15] and achieve state-of-the-art results in both automatic and interactive modes, surpassing Curve-GCN [11] and PolyRNN++ [1].
Citations: 2