
Latest publications from the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00739
Marvin Eisenberger, David Novotný, Gael Kerchenbaum, Patrick Labatut, N. Neverova, D. Cremers, A. Vedaldi
We present NeuroMorph, a new neural network architecture that takes as input two 3D shapes and produces in one go, i.e. in a single feed-forward pass, a smooth interpolation and point-to-point correspondences between them. The interpolation, expressed as a deformation field, changes the pose of the source shape to resemble the target, but leaves the object identity unchanged. NeuroMorph uses an elegant architecture combining graph convolutions with global feature pooling to extract local features. During training, the model is incentivized to create realistic deformations by approximating geodesics on the underlying shape space manifold. This strong geometric prior allows us to train our model end-to-end and in a fully unsupervised manner, without requiring any manual correspondence annotations. NeuroMorph works well for a large variety of input shapes, including non-isometric pairs from different object categories. It obtains state-of-the-art results for both shape correspondence and interpolation tasks, matching or surpassing the performance of recent unsupervised and supervised methods on multiple benchmarks.
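The two outputs described above — a deformation field and point-to-point correspondences — can be illustrated with a minimal sketch. The code below is not the paper's network; it assumes some feature extractor has already produced per-vertex features (random placeholders here) and only shows how a soft correspondence matrix and a linear deformation-field interpolation could be computed from them.

```python
import numpy as np

def soft_correspondence(feat_src, feat_tgt, tau=0.1):
    # Row-stochastic correspondence matrix: softmax over negative
    # squared feature distances between source and target vertices.
    d2 = ((feat_src[:, None, :] - feat_tgt[None, :, :]) ** 2).sum(-1)
    logits = -d2 / tau
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def interpolate(verts_src, displacement, t):
    # Deformation-field interpolation: move each source vertex a fraction t
    # of its predicted displacement; the shape's identity (connectivity) is untouched.
    return verts_src + t * displacement

rng = np.random.default_rng(0)
verts_src = rng.random((100, 3))                          # source mesh vertices
feat_src, feat_tgt = rng.random((100, 16)), rng.random((120, 16))
P = soft_correspondence(feat_src, feat_tgt)               # (100, 120), rows sum to 1
verts_tgt = rng.random((120, 3))                          # toy target vertices
displacement = P @ verts_tgt - verts_src                  # pull toward matched points
print(interpolate(verts_src, displacement, 0.5).shape)    # (100, 3)
```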
Citations: 48
Saliency-Guided Image Translation
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01624
Lai Jiang, Mai Xu, Xiaofei Wang, L. Sigal
In this paper, we propose a novel task for saliency-guided image translation, with the goal of image-to-image translation conditioned on the user-specified saliency map. To address this problem, we develop a novel Generative Adversarial Network (GAN)-based model, called SalG-GAN. Given the original image and target saliency map, SalG-GAN can generate a translated image that satisfies the target saliency map. In SalG-GAN, a disentangled representation framework is proposed to encourage the model to learn diverse translations for the same target saliency condition. A saliency-based attention module is introduced as a special attention mechanism for facilitating the developed structures of the saliency-guided generator, saliency cue encoder, and saliency-guided global and local discriminators. Furthermore, we build a synthetic dataset and a real-world dataset with labeled visual attention for training and evaluating our SalG-GAN. The experimental results over both datasets verify the effectiveness of our model for saliency-guided image translation.
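As a hedged illustration of the conditioning step only (not the SalG-GAN architecture — its discriminators, saliency cue encoder, and attention module are omitted), a generator can take the target saliency map as an extra channel concatenated with the RGB image; the layer sizes below are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class SaliencyConditionedGenerator(nn.Module):
    # Toy generator: 3 RGB channels + 1 saliency channel in, 3 channels out.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=3, padding=1), nn.Tanh(),
        )

    def forward(self, image, saliency):
        # Condition on the user-specified saliency map by channel concatenation.
        return self.net(torch.cat([image, saliency], dim=1))

g = SaliencyConditionedGenerator()
image = torch.rand(1, 3, 64, 64)
saliency = torch.rand(1, 1, 64, 64)
print(g(image, saliency).shape)   # torch.Size([1, 3, 64, 64])
```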
Citations: 22
Intelligent Carpet: Inferring 3D Human Pose from Tactile Signals
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01110
Yiyue Luo, Yunzhu Li, Michael Foshey, Wan Shou, Pratyusha Sharma, Tomás Palacios, A. Torralba, W. Matusik
Daily human activities, e.g., locomotion, exercises, and resting, are heavily guided by the tactile interactions between the human and the ground. In this work, leveraging such tactile interactions, we propose a 3D human pose estimation approach using the pressure maps recorded by a tactile carpet as input. We build a low-cost, high-density, large-scale intelligent carpet, which enables the real-time recordings of human-floor tactile interactions in a seamless manner. We collect a synchronized tactile and visual dataset on various human activities. Employing a state-of-the-art camera-based pose estimation model as supervision, we design and implement a deep neural network model to infer 3D human poses using only the tactile information. Our pipeline can be further scaled up to multi-person pose estimation. We evaluate our system and demonstrate its potential applications in diverse fields.
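A minimal sketch of the cross-modal supervision idea — regressing 3D joints from a pressure map while a vision-based estimate serves as the training target — is shown below. The network, the 21-joint layout, and the pseudo-label stub are all placeholders, not the paper's model.

```python
import torch
import torch.nn as nn

def visual_pseudo_labels(batch, joints=21):
    # Stand-in for the output of an off-the-shelf camera-based pose estimator.
    return torch.rand(batch, joints, 3)

class TactilePoseNet(nn.Module):
    # Toy regressor from a single-channel pressure map to 3D joint positions.
    def __init__(self, joints=21):
        super().__init__()
        self.joints = joints
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, joints * 3),
        )

    def forward(self, pressure):
        return self.net(pressure).view(-1, self.joints, 3)

model = TactilePoseNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
pressure = torch.rand(8, 1, 32, 32)       # a batch of tactile frames
target = visual_pseudo_labels(8)          # supervision from the camera stream
loss = nn.functional.mse_loss(model(pressure), target)
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```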
Citations: 21
Wide-Baseline Relative Camera Pose Estimation with Directional Learning
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00327
Kefan Chen, Noah Snavely, A. Makadia
Modern deep learning techniques that regress the relative camera pose between two images have difficulty dealing with challenging scenarios, such as large camera motions resulting in occlusions and significant changes in perspective that leave little overlap between images. These models continue to struggle even with the benefit of large supervised training datasets. To address the limitations of these models, we take inspiration from techniques that show regressing keypoint locations in 2D and 3D can be improved by estimating a discrete distribution over keypoint locations. Analogously, in this paper we explore improving camera pose regression by instead predicting a discrete distribution over camera poses. To realize this idea, we introduce DirectionNet, which estimates discrete distributions over the 5D relative pose space using a novel parameterization to make the estimation problem tractable. Specifically, DirectionNet factorizes relative camera pose, specified by a 3D rotation and a translation direction, into a set of 3D direction vectors. Since 3D directions can be identified with points on the sphere, DirectionNet estimates discrete distributions on the sphere as its output. We evaluate our model on challenging synthetic and real pose estimation datasets constructed from Matterport3D and InteriorNet. Promising results show a near 50% reduction in error over direct regression methods.
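To make the direction-vector factorization concrete: once a network has produced a few (possibly noisy) 3D direction estimates, a valid rotation can be recovered from two of them by Gram–Schmidt orthonormalization. The snippet below is a generic illustration of that readout, not DirectionNet's exact parameterization or its spherical-distribution estimation.

```python
import numpy as np

def rotation_from_directions(a, b):
    # Orthonormalize two noisy direction estimates; the third axis is their
    # cross product, giving a proper rotation matrix (det = +1).
    x = a / np.linalg.norm(a)
    y = b - np.dot(b, x) * x
    y = y / np.linalg.norm(y)
    z = np.cross(x, y)
    return np.stack([x, y, z], axis=1)

R = rotation_from_directions(np.array([1.0, 0.1, 0.0]),
                             np.array([0.0, 1.0, 0.05]))
print(np.allclose(R.T @ R, np.eye(3)), round(float(np.linalg.det(R)), 6))
# True 1.0
```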
Citations: 33
Clusformer: A Transformer based Clustering Approach to Unsupervised Large-scale Face and Visual Landmark Recognition
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01070
Xuan-Bac Nguyen, Duc Toan Bui, C. Duong, T. D. Bui, Khoa Luu
Research in automatic unsupervised visual clustering has received considerable attention over the last couple of years. It aims at explaining distributions of unlabeled visual images by clustering them via a parameterized model of appearance. Graph Convolutional Neural Networks (GCN) have recently been among the most popular clustering methods, but they have several limitations. Firstly, they are quite sensitive to hard or noisy samples. Secondly, they are hard to study with various deep network models because of their long training time. Finally, it is hard to design an end-to-end training model spanning deep feature extraction and GCN clustering. This work therefore presents Clusformer, a simple but new Transformer-based approach to automatic visual clustering via an unsupervised attention mechanism. The proposed method robustly handles noisy or hard samples. It is also flexible and effective in collaborating with different deep network models of various sizes in an end-to-end framework. The proposed method is evaluated on two popular large-scale visual databases, i.e. the Google Landmark dataset and the MS-Celeb1M face database, and outperforms prior unsupervised clustering methods. Code will be available at https://github.com/VinAIResearch/Clusformer
Citations: 25
COMPLETER: Incomplete Multi-view Clustering via Contrastive Prediction
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01102
Yijie Lin, Yuanbiao Gou, Zitao Liu, Boyun Li, Jiancheng Lv, Xi Peng
In this paper, we study two challenging problems in incomplete multi-view clustering analysis, namely, i) how to learn an informative and consistent representation among different views without the help of labels and ii) how to recover the missing views from data. To this end, we propose a novel objective that incorporates representation learning and data recovery into a unified framework from the view of information theory. To be specific, the informative and consistent representation is learned by maximizing the mutual information across different views through contrastive learning, and the missing views are recovered by minimizing the conditional entropy of different views through dual prediction. To the best of our knowledge, this could be the first work to provide a theoretical framework that unifies the consistent representation learning and cross-view data recovery. Extensive experimental results show the proposed method remarkably outperforms 10 competitive multi-view clustering methods on four challenging datasets. The code is available at https://pengxi.me.
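The abstract's information-theoretic objective can be written compactly. In the hedged sketch below, $Z^{1}$ and $Z^{2}$ denote the latent representations of the two views; the mutual-information term is what the contrastive loss approximates, and the conditional-entropy terms are what the dual prediction networks minimize. The trade-off weight $\lambda$ is an assumed placeholder, not a value from the paper.

```latex
\min_{\theta}\; \mathcal{L}(\theta)
  = -\,I\!\left(Z^{1}; Z^{2}\right)
  + \lambda \left[\, H\!\left(Z^{1}\mid Z^{2}\right) + H\!\left(Z^{2}\mid Z^{1}\right) \right]
```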
Citations: 129
Fine-Grained Shape-Appearance Mutual Learning for Cloth-Changing Person Re-Identification
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01037
Peixian Hong, Tao Wu, Ancong Wu, Xintong Han, Weishi Zheng
Recently, person re-identification (Re-ID) has achieved great progress. However, current methods largely depend on color appearance, which is not reliable when a person changes clothes. Cloth-changing Re-ID is challenging since pedestrian images with clothes change exhibit large intra-class variation and small inter-class variation. Some significant features for identification are embedded in unobvious body shape differences across pedestrians. To explore such body shape cues for cloth-changing Re-ID, we propose a Fine-grained Shape-Appearance Mutual learning framework (FSAM), a two-stream framework that learns fine-grained discriminative body shape knowledge in a shape stream and transfers it to an appearance stream to complement the cloth-unrelated knowledge in the appearance features. Specifically, in the shape stream, FSAM learns a fine-grained discriminative mask with the guidance of identities and extracts fine-grained body shape features by a pose-specific multi-branch network. To complement cloth-unrelated shape knowledge in the appearance stream, dense interactive mutual learning is performed across low-level and high-level features to transfer knowledge from the shape stream to the appearance stream, which enables the appearance stream to be deployed independently without extra computation for mask estimation. We evaluated our method on benchmark cloth-changing Re-ID datasets and achieved state-of-the-art performance.
Citations: 54
Controllable Image Restoration for Under-Display Camera in Smartphones
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00211
Ki-Won Kwon, Eunhee Kang, Sangwon Lee, Su-Jin Lee, Hyong-Euk Lee, ByungIn Yoo, Jae-Joon Han
Under-display camera (UDC) technology is essential for full-screen displays in smartphones and is achieved by eliminating the camera hole from the display. However, this causes inevitable image degradation in the form of spatially variant blur and noise because of the opaque display in front of the camera. To address spatially variant blur and noise in UDC images, we propose a novel controllable image restoration algorithm utilizing a pixel-wise UDC-specific kernel representation and a noise estimator. The kernel representation is derived from an elaborate optical model that reflects the effect of both normal and oblique light incidence. Also, noise-adaptive learning is introduced to control noise levels, which can be utilized to provide optimal results depending on user preferences. The experiments showed that the proposed method achieved superior quantitative performance as well as higher perceptual quality on both a real-world dataset and a monitor-based aligned dataset compared to conventional image restoration algorithms.
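The "pixel-wise kernel representation" amounts to filtering the image with a different small kernel at every location. The brute-force sketch below illustrates only that spatially variant operation on a toy grayscale image; the actual restoration network, optical model, and noise estimator are not represented.

```python
import numpy as np

def spatially_variant_filter(image, kernels):
    # Apply a different k-by-k kernel at every pixel.
    # image: (H, W) grayscale; kernels: (H, W, k, k) per-pixel kernels.
    H, W = image.shape
    k = kernels.shape[-1]
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image, dtype=np.float64)
    for y in range(H):
        for x in range(W):
            patch = padded[y:y + k, x:x + k]
            out[y, x] = np.sum(patch * kernels[y, x])
    return out

H, W, k = 16, 16, 3
image = np.random.rand(H, W)
kernels = np.full((H, W, k, k), 1.0 / (k * k))          # toy: uniform blur everywhere
print(spatially_variant_filter(image, kernels).shape)    # (16, 16)
```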
Citations: 14
Safe Local Motion Planning with Self-Supervised Freespace Forecasting
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.01254
Peiyun Hu, Aaron Huang, J. Dolan, David Held, Deva Ramanan
Safe local motion planning for autonomous driving in dynamic environments requires forecasting how the scene evolves. Practical autonomy stacks adopt a semantic object-centric representation of a dynamic scene and build object detection, tracking, and prediction modules to solve forecasting. However, training these modules comes at an enormous human cost of manually annotated objects across frames. In this work, we explore future freespace as an alternative representation to support motion planning. Our key intuition is that it is important to avoid straying into occupied space regardless of what is occupying it. Importantly, computing ground-truth future freespace is annotation-free. First, we explore freespace forecasting as a self-supervised learning task. We then demonstrate how to use forecasted freespace to identify collision-prone plans from off-the-shelf motion planners. Finally, we propose future freespace as an additional source of annotation-free supervision. We demonstrate how to integrate such supervision into the learning-based planners. Experimental results on nuScenes and CARLA suggest both approaches lead to a significant reduction in collision rates.
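A minimal sketch of how forecasted freespace could flag collision-prone plans: rasterize each future timestep as a binary freespace grid and count the waypoints of a candidate trajectory that fall outside it. The grid size, resolution, and origin below are arbitrary assumptions for illustration, not the paper's configuration.

```python
import numpy as np

def collision_cost(plan_xy, freespace, resolution=0.5, origin=(-25.0, -25.0)):
    # Score a planned trajectory against a forecasted freespace grid.
    # freespace[t, i, j] = 1 if cell (i, j) is predicted free at future step t.
    # Returns the number of waypoints that stray into non-free space.
    cost = 0
    for t, (x, y) in enumerate(plan_xy):
        i = int((x - origin[0]) / resolution)
        j = int((y - origin[1]) / resolution)
        inside = 0 <= i < freespace.shape[1] and 0 <= j < freespace.shape[2]
        if not inside or freespace[t, i, j] == 0:
            cost += 1
    return cost

T = 6
freespace = np.ones((T, 100, 100), dtype=np.uint8)   # toy forecast: all free
freespace[3, 55:60, 50:55] = 0                        # an occupied patch at t = 3
plan = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 1.5), (4.0, 2.0), (5.0, 2.5)]
print(collision_cost(plan, freespace))                # 1 (the t = 3 waypoint)
```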
Citations: 26
BABEL: Bodies, Action and Behavior with English Labels
Pub Date : 2021-06-01 DOI: 10.1109/CVPR46437.2021.00078
Abhinanda R. Punnakkal, Arjun Chandrasekaran, Nikos Athanasiou, Alejandra Quiros-Ramirez, Michael J. Black (Max Planck Institute for Intelligent Systems; Universität Konstanz)
Understanding the semantics of human movement – the what, how and why of the movement – is an important problem that requires datasets of human actions with semantic labels. Existing datasets take one of two approaches. Large-scale video datasets contain many action labels but do not contain ground-truth 3D human motion. Alternatively, motion-capture (mocap) datasets have precise body motions but are limited to a small number of actions. To address this, we present BABEL, a large dataset with language labels describing the actions being performed in mocap sequences. BABEL consists of language labels for over 43 hours of mocap sequences from AMASS, containing over 250 unique actions. Each action label in BABEL is precisely aligned with the duration of the corresponding action in the mocap sequence. BABEL also allows multiple actions to overlap, each possibly spanning a different duration. This results in a total of over 66000 action segments. The dense annotations can be leveraged for tasks like action recognition, temporal localization, motion synthesis, etc. To demonstrate the value of BABEL as a benchmark, we evaluate the performance of models on 3D action recognition. We demonstrate that BABEL poses interesting learning challenges that are applicable to real-world scenarios, and can serve as a useful benchmark for progress in 3D action recognition. The dataset, baseline methods, and evaluation code are available and supported for academic research purposes at https://babel.is.tue.mpg.de/.
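The dense, possibly overlapping span annotations described above suggest a simple data structure: each segment stores a label with its start and end time, and a query returns every label active at a given moment. The toy segments below are hypothetical examples, not real BABEL annotations.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ActionSegment:
    label: str      # free-form action label, e.g. "walk"
    start: float    # seconds into the mocap sequence
    end: float

def labels_at(segments: List[ActionSegment], t: float) -> List[str]:
    # Return all action labels active at time t; segments may overlap.
    return [s.label for s in segments if s.start <= t < s.end]

segs = [ActionSegment("walk", 0.0, 4.0),
        ActionSegment("wave", 2.5, 3.5),
        ActionSegment("stand", 4.0, 6.0)]
print(labels_at(segs, 3.0))   # ['walk', 'wave']
```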
Citations: 91