
Latest publications from the 2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Gradient Distribution Alignment Certificates Better Adversarial Domain Adaptation
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00881
Z. Gao, Shufei Zhang, Kaizhu Huang, Qiufeng Wang, Chaoliang Zhong
The latest heuristic for handling the domain shift in unsupervised domain adaptation tasks is to reduce the data distribution discrepancy using adversarial learning. Recent studies improve conventional adversarial domain adaptation methods with discriminative information by integrating the classifier’s outputs into the distribution divergence measurement. However, they still suffer from the equilibrium problem of adversarial learning, in which, even if the discriminator is fully confused, sufficient similarity between the two distributions cannot be guaranteed. To overcome this problem, we propose a novel approach named feature gradient distribution alignment (FGDA). We demonstrate the rationale of our method both theoretically and empirically. In particular, we show that the distribution discrepancy can be reduced by constraining the feature gradients of the two domains to have similar distributions. Meanwhile, our method enjoys a theoretical guarantee that a tighter upper bound on the target-sample error can be obtained than with conventional adversarial domain adaptation methods. By integrating the proposed method with existing adversarial domain adaptation models, we achieve state-of-the-art performance on two real-world benchmark datasets.
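The core idea, constraining the feature gradients of the two domains to have similar distributions, can be illustrated with a toy discrepancy measure. The sketch below is not the authors' FGDA objective: it uses a kernel MMD between per-sample feature gradients of a simple squared loss, purely as an assumed stand-in for the alignment term.

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Squared maximum mean discrepancy with an RBF kernel between
    two sample sets x (n, d) and y (m, d)."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def feature_gradients(feats, targets):
    """For the toy loss L = 0.5 * ||f - t||^2, the gradient w.r.t.
    the features is simply f - t."""
    return feats - targets

rng = np.random.default_rng(0)
t = rng.normal(size=(64, 8))
src_grads = feature_gradients(rng.normal(size=(64, 8)), t)
tgt_grads = feature_gradients(rng.normal(loc=2.0, size=(64, 8)), t)
# an alignment term would drive this discrepancy toward zero
gap = rbf_mmd2(src_grads, tgt_grads)
```

In an actual training loop this discrepancy (or an adversarial analogue of it) would be minimized jointly with the task loss.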
Citations: 19
You Don’t Only Look Once: Constructing Spatial-Temporal Memory for Integrated 3D Object Detection and Tracking
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00317
Jiaming Sun, Yiming Xie, Siyu Zhang, Linghao Chen, Guofeng Zhang, H. Bao, Xiaowei Zhou
Humans are able to continuously detect and track surrounding objects by constructing a spatial-temporal memory of the objects when looking around. In contrast, 3D object detectors in existing tracking-by-detection systems often search for objects in every new video frame from scratch, without fully leveraging memory from previous detection results. In this work, we propose a novel system for integrated 3D object detection and tracking, which uses a dynamic object occupancy map and previous object states as spatial-temporal memory to assist object detection in future frames. This memory, together with the ego-motion from back-end odometry, guides the detector to achieve more efficient object proposal generation and more accurate object state estimation. The experiments demonstrate the effectiveness of the proposed system and its performance on the ScanNet and KITTI datasets. Moreover, the proposed system produces stable bounding boxes and pose trajectories over time, while being able to handle occluded and truncated objects. Code is available at the project page: https://zju3dv.github.io/UDOLO.
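As a hedged illustration of the spatial-temporal memory idea (not the paper's actual implementation), the following sketch keeps previous object states and compensates them by ego-motion to seed proposals for the next frame; the class and its translation-only motion model are hypothetical simplifications.

```python
class SpatialTemporalMemory:
    """Toy memory of last known 3D object states, keyed by track id."""

    def __init__(self):
        self.states = {}  # track_id -> (x, y, z) in world coordinates

    def update(self, detections):
        """Merge this frame's detections into the memory."""
        self.states.update(detections)

    def proposals(self, ego_motion=(0.0, 0.0, 0.0)):
        """Seed the next frame's detector: remembered states expressed
        relative to the new ego pose (translation-only model)."""
        dx, dy, dz = ego_motion
        return {tid: (x - dx, y - dy, z - dz)
                for tid, (x, y, z) in self.states.items()}

mem = SpatialTemporalMemory()
mem.update({1: (2.0, 0.0, 5.0)})
props = mem.proposals(ego_motion=(1.0, 0.0, 0.0))
```

A real system would additionally carry velocities, covariances, and the occupancy map the abstract describes.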
Citations: 8
Partial Off-policy Learning: Balance Accuracy and Diversity for Human-Oriented Image Captioning
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00219
Jiahe Shi, Yali Li, Shengjin Wang
Human-oriented image captioning with both high diversity and accuracy is a challenging task in vision+language modeling. Reinforcement learning (RL) based frameworks improve the accuracy of image captioning, yet seriously hurt its diversity. In contrast, methods based on variational auto-encoders (VAE) or generative adversarial networks (GAN) can produce diverse yet less accurate captions. In this work, we devote our attention to promoting the diversity of RL-based image captioning. Specifically, we devise a partial off-policy learning scheme to balance accuracy and diversity. First, we keep the model exposed to varied candidate captions by sampling from the initial state before RL is launched. Second, a novel criterion named max-CIDEr is proposed to serve as the reward for promoting diversity. We combine the above off-policy strategy with the on-policy one to moderate the exploration effect, further balancing diversity and accuracy for human-like image captioning. Experiments show that our method lies closest to human performance in the diversity-accuracy space and achieves the highest Pearson correlation with human performance, at 0.337.
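A minimal sketch of the two ingredients, with a token-overlap F1 standing in for the CIDEr metric (the real max-CIDEr reward uses CIDEr scores; everything here, including the mixing ratio, is a simplified assumption):

```python
import random

def token_f1(candidate, reference):
    """Token-overlap F1, a crude stand-in for a CIDEr score."""
    cs, rs = set(candidate.split()), set(reference.split())
    inter = len(cs & rs)
    if inter == 0:
        return 0.0
    p, r = inter / len(cs), inter / len(rs)
    return 2 * p * r / (p + r)

def max_reward(caption, references):
    """max-CIDEr-style reward: take the best match over all references."""
    return max(token_f1(caption, ref) for ref in references)

def mixed_batch(on_policy, off_policy, ratio=0.5, rng=None):
    """Partial off-policy batch: a fraction from the current policy,
    the rest sampled from captions generated before RL started."""
    rng = rng or random.Random(0)
    k = int(len(on_policy) * ratio)
    return on_policy[:k] + rng.sample(off_policy, len(on_policy) - k)
```

Taking the maximum over references rewards matching any plausible human caption rather than an averaged one, which is what lets diverse outputs score well.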
Citations: 4
Geometric Deep Neural Network using Rigid and Non-Rigid Transformations for Human Action Recognition
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01238
Rasha Friji, Hassen Drira, F. Chaieb, Hamza Kchok, S. Kurtek
Deep learning architectures, albeit successful in most computer vision tasks, were designed for data with an underlying Euclidean structure, an assumption that is not usually fulfilled since pre-processed data may lie on a non-linear space. In this paper, we propose a geometry-aware deep learning approach using rigid and non-rigid transformation optimization for skeleton-based action recognition. Skeleton sequences are first modeled as trajectories on Kendall’s shape space and then mapped to the linear tangent space. The resulting structured data are then fed to a deep learning architecture, which includes a layer that optimizes over rigid and non-rigid transformations of the 3D skeletons, followed by a CNN-LSTM network. Assessment on two large-scale skeleton datasets, namely NTU-RGB+D and NTU-RGB+D 120, has shown that the proposed approach outperforms existing geometric deep learning methods and exceeds recently published approaches in the majority of configurations.
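The preshape and tangent-space steps can be written down concretely. The sketch below is the standard spherical log map applied to flattened, centered, unit-norm skeletons (a sketch of the geometry, not the authors' code):

```python
import numpy as np

def preshape(x):
    """Center a (joints, 3) skeleton and scale it to unit Frobenius norm,
    i.e. project it onto Kendall's preshape sphere (flattened)."""
    x = x - x.mean(axis=0)
    return (x / np.linalg.norm(x)).ravel()

def log_map(p, q, eps=1e-10):
    """Tangent vector at p pointing toward q on the unit sphere; its norm
    equals the geodesic distance arccos(<p, q>)."""
    c = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(c)
    if theta < eps:
        return np.zeros_like(p)
    return (theta / np.sin(theta)) * (q - c * p)
```

Mapping a sequence of preshapes through `log_map` at a common reference pose yields the linear tangent-space trajectory that the abstract feeds to the CNN-LSTM.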
Citations: 7
Hierarchical Disentangled Representation Learning for Outdoor Illumination Estimation and Editing
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01503
Piaopiao Yu, Jie Guo, Fan Huang, Cheng Zhou, H. Che, Xiao Ling, Yanwen Guo
Data-driven sky models have recently gained much attention in outdoor illumination prediction, showing performance superior to that of analytical models. However, naively compressing an outdoor panorama into a low-dimensional latent vector, as existing models have done, causes two major problems. One is the mutual interference between the HDR intensity of the sun and the complex textures of the surrounding sky; the other is the lack of fine-grained control over independent lighting factors due to the entangled representation. To address these issues, we propose a hierarchical disentangled sky model (HDSky) for outdoor illumination prediction. With this model, any outdoor panorama can be hierarchically disentangled into several factors based on three well-designed autoencoders. The first autoencoder compresses each sunny panorama into a sky vector and a sun vector under certain constraints. The second and third autoencoders further disentangle the sun intensity and the sky intensity from the sun vector and the sky vector, respectively, with several customized loss functions. Moreover, a unified framework is designed to predict all-weather sky information from a single outdoor image. Through extensive experiments, we demonstrate that the proposed model significantly improves the accuracy of outdoor illumination prediction. It also allows users to intuitively edit the predicted panorama (e.g., changing the position of the sun while preserving everything else), without sacrificing physical plausibility.
Citations: 10
CryoDRGN2: Ab initio neural reconstruction of 3D protein structures from real cryo-EM images
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00403
Ellen D. Zhong, Adam K. Lerer, Joey Davis, B. Berger
Protein structure determination from cryo-EM data requires reconstructing a 3D volume (or a distribution of volumes) from many noisy and randomly oriented 2D projection images. While the standard homogeneous reconstruction task aims to recover a single static structure, recently proposed neural and non-neural methods can reconstruct distributions of structures, thereby enabling the study of protein complexes that possess intrinsic structural or conformational heterogeneity. These heterogeneous reconstruction methods, however, require fixed image poses, which are typically estimated from an upstream homogeneous reconstruction and are not guaranteed to be accurate under highly heterogeneous conditions. In this work we describe cryoDRGN2, an ab initio reconstruction algorithm that can jointly estimate image poses and learn a neural model of a distribution of 3D structures on real heterogeneous cryo-EM data. To achieve this, we adapt search algorithms from the traditional cryo-EM literature and describe the optimizations and design choices required to make such a search procedure computationally tractable in the neural-model setting. We show that cryoDRGN2 is robust to the high noise levels of real cryo-EM images, trains faster than earlier neural methods, and achieves state-of-the-art performance on real cryo-EM datasets.
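Pose search itself can be caricatured in one dimension: score every candidate "pose" of an observed signal against a reference and keep the best. Here a circular shift stands in for rotation; this is only a toy analogue of the exhaustive search the paper makes tractable, with all values illustrative.

```python
import numpy as np

def best_shift(ref, obs):
    """Exhaustive search over circular shifts of obs, scored by
    correlation with ref; return the best-scoring shift."""
    scores = [float(np.dot(ref, np.roll(obs, s))) for s in range(len(obs))]
    return int(np.argmax(scores))

ref = np.array([1.0, 5.0, 2.0, 8.0, 3.0, 7.0, 4.0, 6.0])
obs = np.roll(ref, 3)             # observation with an unknown "pose"
recovered = best_shift(ref, obs)  # shift that re-aligns obs with ref
```

The real algorithm searches a hierarchy of 3D orientations and in-plane translations, and must do so against a neural volume rather than a fixed reference.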
Citations: 37
RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise Mixed Schemes and Multiple Precisions
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00520
Sung-En Chang, Yanyu Li, Mengshu Sun, Weiwen Jiang, Sijia Liu, Yanzhi Wang, Xue Lin
This work proposes a novel Deep Neural Network (DNN) quantization framework, namely RMSMP, with a row-wise mixed-scheme and multi-precision approach. Specifically, this is the first effort to assign mixed quantization schemes and multiple precisions within layers, among the rows of the DNN weight matrix, to simplify operations in hardware inference while preserving accuracy. Furthermore, this paper makes an observation that differs from prior work: the quantization error does not necessarily exhibit layer-wise sensitivity, and it can actually be mitigated as long as a certain portion of the weights in every layer are kept at higher precision. This observation enables layer-wise uniformity in the hardware implementation, towards guaranteed inference acceleration, while still enjoying the row-wise flexibility of mixed schemes and multiple precisions to boost accuracy. The candidate schemes and precisions are derived practically and effectively with a highly hardware-informative strategy that reduces the problem search space. With the ratio of different quantization schemes and precisions for all layers determined offline, the RMSMP quantization algorithm uses a Hessian- and variance-based method to effectively assign schemes and precisions for each row. The proposed RMSMP is tested on image classification and natural language processing (BERT) applications and achieves the best accuracy among state-of-the-art methods under the same equivalent precisions. RMSMP is implemented on FPGA devices, achieving a 3.65× speedup in end-to-end inference time for ResNet-18 on ImageNet, compared with a 4-bit fixed-point baseline.
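A hedged numpy sketch of row-wise multi-precision assignment: rows are quantized with a symmetric uniform quantizer, and a variance statistic (standing in for the paper's Hessian- and variance-based criterion) decides which rows keep more bits. The bit-widths and the high-precision fraction are illustrative assumptions, not the paper's values.

```python
import numpy as np

def quantize_row(row, bits):
    """Symmetric uniform quantization of one weight row."""
    scale = (np.abs(row).max() + 1e-12) / (2 ** (bits - 1) - 1)
    return np.round(row / scale) * scale

def rmsmp_like(W, bits_hi=8, bits_lo=4, hi_frac=0.25):
    """Row-wise multi-precision: the highest-variance rows (a proxy for
    a Hessian/variance sensitivity criterion) keep more bits."""
    k = max(1, int(len(W) * hi_frac))
    hi_rows = set(np.argsort(W.var(axis=1))[-k:].tolist())
    return np.stack([quantize_row(row, bits_hi if i in hi_rows else bits_lo)
                     for i, row in enumerate(W)])
```

Because every row is quantized (only the precision varies), the layer stays uniform from the hardware scheduler's point of view, which is the property the abstract emphasizes.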
Citations: 6
Else-Net: Elastic Semantic Network for Continual Action Recognition from Skeleton Data
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01318
Tianjiao Li, Qiuhong Ke, Hossein Rahmani, Rui En Ho, Henghui Ding, Jun Liu
Most state-of-the-art action recognition methods focus on offline learning, where the samples of all types of actions need to be provided at once. Here, we address continual learning for action recognition, where various types of new actions are continuously learned over time. This task is quite challenging, owing to the catastrophic forgetting problem stemming from the discrepancies between previously learned actions and the current new actions to be learned. Therefore, we propose Else-Net, a novel Elastic Semantic Network with multiple learning blocks that learns diversified human actions over time. Specifically, our Else-Net is able to automatically search for and update the learning blocks most relevant to the current new action, or explore new blocks to store new knowledge, preserving the unmatched ones to retain the knowledge of previously learned actions and alleviating forgetting when learning new actions. Moreover, even though different human actions may vary to a large extent as a whole, their local body parts can still share many homogeneous features. Inspired by this, our proposed Else-Net mines the shared knowledge of the decomposed human body parts from different actions, which benefits continual learning of actions. Experiments show that the proposed approach enables effective continual action recognition and achieves promising performance on two large-scale action recognition datasets.
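The block-routing idea can be sketched as nearest-key matching with a novelty threshold. This is a guess at the mechanism's spirit, not the paper's architecture: the block keys, cosine similarity, and the threshold `tau` are all assumptions.

```python
import numpy as np

def route(feature, blocks, tau=0.5):
    """Pick the learning block whose key best matches the feature; open a
    new block when nothing is similar enough (a novel action)."""
    f = feature / np.linalg.norm(feature)
    if blocks:
        sims = [float(np.dot(f, key)) for key in blocks]
        best = int(np.argmax(sims))
        if sims[best] >= tau:
            return best, blocks          # reuse and update this block
    return len(blocks), blocks + [f]     # allocate a new block
```

Keeping the unmatched blocks untouched is what preserves previously learned actions; only the routed block would receive gradient updates.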
Citations: 28
Channel Augmented Joint Learning for Visible-Infrared Recognition
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.01331
Mang Ye, Weijian Ruan, Bo Du, Mike Zheng Shou
This paper introduces a powerful channel augmented joint learning strategy for the visible-infrared recognition problem. For data augmentation, most existing methods directly adopt the standard operations designed for single-modality visible images, and thus do not fully consider the imagery properties in visible to infrared matching. Our basic idea is to homogeneously generate color-irrelevant images by randomly exchanging the color channels. It can be seamlessly integrated into existing augmentation operations without modifying the network, consistently improving the robustness against color variations. Incorporated with a random erasing strategy, it further greatly enriches the diversity by simulating random occlusions. For cross-modality metric learning, we design an enhanced channel-mixed learning strategy to simultaneously handle the intra- and cross-modality variations with squared difference for stronger discriminability. Besides, a channel-augmented joint learning strategy is further developed to explicitly optimize the outputs of augmented images. Extensive experiments with insightful analysis on two visible-infrared recognition tasks show that the proposed strategies consistently improve the accuracy. Without auxiliary information, it improves the state-of-the-art Rank-1/mAP by 14.59%/13.00% on the large-scale SYSU-MM01 dataset.
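The two augmentations above can be sketched in a few lines: randomly permuting the RGB channels yields a color-irrelevant view of the image, and random erasing blanks out a random rectangle to simulate occlusion. Function names, the permutation sampling, and the erasing-area range are illustrative assumptions; the paper's exact sampling scheme may differ.

```python
import numpy as np

def random_channel_exchange(img, rng=None):
    """Generate a color-irrelevant image by randomly exchanging the
    color channels of an (H, W, 3) array. A sketch of the basic idea."""
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(3)          # random reordering of R, G, B
    return img[..., perm]

def random_erasing(img, rng=None, scale=(0.02, 0.2)):
    """Random erasing: overwrite a random rectangle with noise to
    simulate occlusion. `scale` bounds the erased fraction of the image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    area = rng.uniform(*scale) * h * w
    eh = int(np.sqrt(area))
    ew = int(area // max(eh, 1))
    y = int(rng.integers(0, max(h - eh, 1)))
    x = int(rng.integers(0, max(w - ew, 1)))
    out = img.copy()
    out[y:y + eh, x:x + ew] = rng.integers(0, 256, size=(eh, ew, img.shape[2]))
    return out
```

Because channel exchange only reorders existing channels, it preserves the set of values at every pixel, which is why it can be dropped into an existing augmentation pipeline without any network changes.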
{"title":"Channel Augmented Joint Learning for Visible-Infrared Recognition","authors":"Mang Ye, Weijian Ruan, Bo Du, Mike Zheng Shou","doi":"10.1109/ICCV48922.2021.01331","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.01331","url":null,"abstract":"This paper introduces a powerful channel augmented joint learning strategy for the visible-infrared recognition problem. For data augmentation, most existing methods directly adopt the standard operations designed for single-modality visible images, and thus do not fully consider the imagery properties in visible to infrared matching. Our basic idea is to homogenously generate color-irrelevant images by randomly exchanging the color channels. It can be seamlessly integrated into existing augmentation operations without modifying the network, consistently improving the robustness against color variations. Incorporated with a random erasing strategy, it further greatly enriches the diversity by simulating random occlusions. For cross-modality metric learning, we design an enhanced channel-mixed learning strategy to simultaneously handle the intra-and cross-modality variations with squared difference for stronger discriminability. Besides, a channel-augmented joint learning strategy is further developed to explicitly optimize the outputs of augmented images. Extensive experiments with insightful analysis on two visible-infrared recognition tasks show that the proposed strategies consistently improve the accuracy. 
Without auxiliary information, it improves the state-of-the-art Rank-1/mAP by 14.59%/13.00% on the large-scale SYSU-MM01 dataset.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"46 1","pages":"13547-13556"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81160858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 70
ALL Snow Removed: Single Image Desnowing Algorithm Using Hierarchical Dual-tree Complex Wavelet Representation and Contradict Channel Loss
Pub Date : 2021-10-01 DOI: 10.1109/ICCV48922.2021.00416
Wei-Ting Chen, H. Fang, Cheng-Lin Hsieh, Cheng-Che Tsai, I-Hsiang Chen, Jianwei Ding, Sy-Yen Kuo
Snow is a highly complicated atmospheric phenomenon that usually contains snowflakes, snow streaks, and a veiling effect (similar to haze or mist). In this paper, we propose a single image desnowing algorithm to address the diversity of snow particles in shape and size. First, to better represent the complex snow shape, we apply the dual-tree wavelet transform and propose a complex wavelet loss in the network. Second, we propose a hierarchical decomposition paradigm in our network for better understanding the different sizes of snow particles. Last, we propose a novel feature called the contradict channel (CC) for the snow scenes. We find that the regions containing the snow particles tend to have higher intensity in the CC than that in the snow-free regions. We leverage this discriminative feature to construct the contradict channel loss for improving the performance of snow removal. Moreover, due to the limitation of existing snow datasets, to simulate the snow scenarios comprehensively, we propose a large-scale dataset called Comprehensive Snow Dataset (CSD). Experimental results show that the proposed method can favorably outperform existing methods in three synthetic datasets and real-world datasets. The code and dataset are released in https://github.com/weitingchen83/ICCV2021-Single-Image-Desnowing-HDCWNet.
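A minimal sketch of a contradict-channel-style feature, under the assumption that CC is the per-pixel maximum over color channels followed by local max pooling — i.e., the counterpart of the dark-channel prior with the min operation replaced by max. The paper's exact definition of CC may differ; this only illustrates why bright snow regions would score higher than snow-free regions.

```python
import numpy as np

def contradict_channel(img, patch=3):
    """Sketch (assumed form) of a contradict-channel map for an
    (H, W, 3) image with values in [0, 1]: take the brightest color
    channel at each pixel, then max-pool over a local patch."""
    h, w, _ = img.shape
    per_pixel_max = img.max(axis=2)          # brightest channel per pixel
    pad = patch // 2
    padded = np.pad(per_pixel_max, pad, mode="edge")
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # local max pooling over a patch x patch window
            out[i, j] = padded[i:i + patch, j:j + patch].max()
    return out
```

On a synthetic image with a white (snow-like) corner and a black background, the map is 1.0 over the bright region and 0.0 far from it, matching the abstract's observation that snowy regions have higher CC intensity.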
{"title":"ALL Snow Removed: Single Image Desnowing Algorithm Using Hierarchical Dual-tree Complex Wavelet Representation and Contradict Channel Loss","authors":"Wei-Ting Chen, H. Fang, Cheng-Lin Hsieh, Cheng-Che Tsai, I-Hsiang Chen, Jianwei Ding, Sy-Yen Kuo","doi":"10.1109/ICCV48922.2021.00416","DOIUrl":"https://doi.org/10.1109/ICCV48922.2021.00416","url":null,"abstract":"Snow is a highly complicated atmospheric phenomenon that usually contains snowflake, snow streak, and veiling effect (similar to the haze or the mist). In this literature, we propose a single image desnowing algorithm to address the diversity of snow particles in shape and size. First, to better represent the complex snow shape, we apply the dual-tree wavelet transform and propose a complex wavelet loss in the network. Second, we propose a hierarchical decomposition paradigm in our network for better under-standing the different sizes of snow particles. Last, we propose a novel feature called the contradict channel (CC) for the snow scenes. We find that the regions containing the snow particles tend to have higher intensity in the CC than that in the snow-free regions. We leverage this discriminative feature to construct the contradict channel loss for improving the performance of snow removal. Moreover, due to the limitation of existing snow datasets, to simulate the snow scenarios comprehensively, we propose a large-scale dataset called Comprehensive Snow Dataset (CSD). Experimental results show that the proposed method can favorably outperform existing methods in three synthetic datasets and real-world datasets. 
The code and dataset are released in https://github.com/weitingchen83/ICCV2021-Single-Image-Desnowing-HDCWNet.","PeriodicalId":6820,"journal":{"name":"2021 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"32 1","pages":"4176-4185"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89800029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 71
2021 IEEE/CVF International Conference on Computer Vision (ICCV)