
Latest publications from the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

3D Human Mesh Regression With Dense Correspondence
Pub Date: 2020-06-01. DOI: 10.1109/CVPR42600.2020.00708
Wang Zeng, Wanli Ouyang, P. Luo, Wentao Liu, Xiaogang Wang
Estimating the 3D mesh of the human body from a single 2D image is an important task with many applications such as augmented reality and human-robot interaction. However, prior works reconstructed the 3D mesh from a global image feature extracted by a convolutional neural network (CNN), where the dense correspondences between the mesh surface and the image pixels are missing, leading to suboptimal solutions. This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space (i.e., a 2D space used for texture mapping of 3D meshes). DecoMR first predicts a pixel-to-surface dense correspondence map (i.e., an IUV image), with which we transfer local features from the image space to the UV space. Then the transferred local image features are processed in the UV space to regress a location map, which is well aligned with the transferred features. Finally, we reconstruct the 3D human mesh from the regressed location map with a predefined mapping function. We also observe that the existing discontinuous UV map is unfriendly to network learning. Therefore, we propose a novel UV map that maintains most of the neighboring relations on the original mesh surface. Experiments demonstrate that our proposed local feature alignment and continuous UV map outperform existing 3D-mesh-based methods on multiple public benchmarks. Code will be made available at https://github.com/zengwang430521/DecoMR.
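As an illustration of the feature-transfer step described above, the following is a minimal PyTorch sketch, not the authors' implementation: it scatters per-pixel image features onto a UV-space grid according to a predicted IUV map. The tensor shapes, the `uv_size` resolution, and the averaging scheme are all assumptions.

```python
import torch

def transfer_to_uv(feat, uv, uv_size=64):
    """feat: (B, C, H, W) local image features.
    uv:   (B, 2, H, W) predicted per-pixel UV coordinates in [0, 1].
    Returns (B, C, uv_size, uv_size) features resampled onto the UV plane."""
    B, C, H, W = feat.shape
    uv_feat = torch.zeros(B, C, uv_size, uv_size, device=feat.device)
    count = torch.zeros(B, 1, uv_size, uv_size, device=feat.device)
    # Discretize the predicted UV coordinates into texel indices.
    u = (uv[:, 0].clamp(0, 1) * (uv_size - 1)).long().reshape(B, -1)  # (B, H*W)
    v = (uv[:, 1].clamp(0, 1) * (uv_size - 1)).long().reshape(B, -1)
    flat = feat.reshape(B, C, -1)                                     # (B, C, H*W)
    for b in range(B):  # plain loop for clarity; real code would vectorize
        idx = v[b] * uv_size + u[b]
        uv_feat[b].view(C, -1).index_add_(1, idx, flat[b])
        count[b].view(1, -1).index_add_(1, idx, torch.ones(1, idx.numel(), device=feat.device))
    return uv_feat / count.clamp(min=1)  # average features that land on the same texel

# Example:
# feat, uv = torch.randn(2, 32, 56, 56), torch.rand(2, 2, 56, 56)
# print(transfer_to_uv(feat, uv).shape)  # torch.Size([2, 32, 64, 64])
```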
{"title":"3D Human Mesh Regression With Dense Correspondence","authors":"Wang Zeng, Wanli Ouyang, P. Luo, Wentao Liu, Xiaogang Wang","doi":"10.1109/CVPR42600.2020.00708","DOIUrl":"https://doi.org/10.1109/CVPR42600.2020.00708","url":null,"abstract":"Estimating 3D mesh of the human body from a single 2D image is an important task with many applications such as augmented reality and Human-Robot interaction. However, prior works reconstructed 3D mesh from global image feature extracted by using convolutional neural network (CNN), where the dense correspondences between the mesh surface and the image pixels are missing, leading to suboptimal solution. This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space (i.e. a 2D space used for texture mapping of 3D mesh). DecoMR first predicts pixel-to-surface dense correspondence map (i.e., IUV image), with which we transfer local features from the image space to the UV space. Then the transferred local image features are processed in the UV space to regress a location map, which is well aligned with transferred features. Finally we reconstruct 3D human mesh from the regressed location map with a predefined mapping function. We also observe that the existing discontinuous UV map are unfriendly to the learning of network. Therefore, we propose a novel UV map that maintains most of the neighboring relations on the original mesh surface. Experiments demonstrate that our proposed local feature alignment and continuous UV map outperforms existing 3D mesh based methods on multiple public benchmarks. Code will be made available at https: //github.com/zengwang430521/DecoMR.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"62 1","pages":"7052-7061"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77511763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 68
SpSequenceNet: Semantic Segmentation Network on 4D Point Clouds
Pub Date: 2020-06-01. DOI: 10.1109/cvpr42600.2020.00463
Hanyu Shi, Guosheng Lin, Hao Wang, Tzu-Yi Hung, Zhenhua Wang
Point clouds are useful in many applications, such as autonomous driving and robotics, as they provide natural 3D information about the surrounding environment. While there is extensive research on 3D point clouds, scene understanding on 4D point clouds, i.e., series of consecutive 3D point cloud frames, is an emerging and still under-investigated topic. With 4D point clouds (3D point cloud videos), robotic systems can enhance their robustness by leveraging temporal information from previous frames. However, existing semantic segmentation methods on 4D point clouds suffer from low precision due to the loss of spatial and temporal information in their network structures. In this paper, we propose SpSequenceNet to address this problem. The network is designed based on 3D sparse convolution, and we introduce two novel modules, a cross-frame global attention module and a cross-frame local interpolation module, to capture spatial and temporal information in 4D point clouds. We conduct extensive experiments on SemanticKITTI and achieve a state-of-the-art result of 43.1% mIoU, which is 1.5% higher than the previous best approach.
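The cross-frame modules are only named in the abstract; the snippet below is a hedged guess at the general shape of a cross-frame global attention block, written with dense voxel tensors rather than the sparse 3D convolutions SpSequenceNet actually uses. Module name and channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class CrossFrameGlobalAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // 2),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 2, channels),
            nn.Sigmoid(),
        )

    def forward(self, feat_prev, feat_curr):
        # feat_*: (B, C, D, H, W) voxel features of the previous / current frame.
        gate = self.mlp(feat_prev.mean(dim=(2, 3, 4)))    # (B, C) global context of previous frame
        return feat_curr * gate[:, :, None, None, None]   # reweight current-frame channels

# att = CrossFrameGlobalAttention(64)
# out = att(torch.randn(1, 64, 8, 8, 8), torch.randn(1, 64, 8, 8, 8))  # (1, 64, 8, 8, 8)
```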
{"title":"SpSequenceNet: Semantic Segmentation Network on 4D Point Clouds","authors":"Hanyu Shi, Guosheng Lin, Hao Wang, Tzu-Yi Hung, Zhenhua Wang","doi":"10.1109/cvpr42600.2020.00463","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00463","url":null,"abstract":"Point clouds are useful in many applications like autonomous driving and robotics as they provide natural 3D information of the surrounding environments. While there are extensive research on 3D point clouds, scene understanding on 4D point clouds, a series of consecutive 3D point clouds frames, is an emerging topic and yet under-investigated. With 4D point clouds (3D point cloud videos), robotic systems could enhance their robustness by leveraging the temporal information from previous frames. However, the existing semantic segmentation methods on 4D point clouds suffer from low precision due to the spatial and temporal information loss in their network structures. In this paper, we propose SpSequenceNet to address this problem. The network is designed based on 3D sparse convolution. And we introduce two novel modules, a cross-frame global attention module and a cross-frame local interpolation module, to capture spatial and temporal information in 4D point clouds. We conduct extensive experiments on SemanticKITTI, and achieve the state-of-the-art result of 43.1% on mIoU, which is 1.5% higher than the previous best approach.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"37 1","pages":"4573-4582"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79819596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 61
Online Depth Learning Against Forgetting in Monocular Videos
Pub Date: 2020-06-01. DOI: 10.1109/cvpr42600.2020.00455
Zhenyu Zhang, Stéphane Lathuilière, E. Ricci, N. Sebe, Yan Yan, Jian Yang
Online depth learning is the problem of consistently adapting a depth estimation model to handle a continuously changing environment. This problem is challenging because the network easily overfits to the current environment and forgets its past experiences. To address this problem, this paper presents a novel Learning to Prevent Forgetting (LPF) method for online mono-depth adaptation to new target domains in an unsupervised manner. Instead of updating the universal parameters, LPF learns adapter modules to efficiently adjust the feature representation and distribution without losing the pre-learned knowledge in the online setting. Specifically, to adapt to temporally continuous depth patterns in videos, we introduce a novel meta-learning approach that learns the adapter modules by incorporating the online adaptation process into the learning objective. To further avoid overfitting, we propose a novel temporally consistent regularization to harmonize the gradient-descent procedure at each online learning step. Extensive evaluations on real-world datasets demonstrate that the proposed method, with very limited parameters, significantly improves the estimation quality.
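To make the adapter idea concrete, here is a small sketch under stated assumptions (a 1x1-conv residual adapter, a toy backbone, and a placeholder loss); it is not the LPF architecture, but it shows how only adapter parameters would be updated during an online step while the pre-trained weights stay frozen.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(self.conv.weight)   # start as an identity mapping
        nn.init.zeros_(self.conv.bias)

    def forward(self, x):
        return x + self.conv(x)            # residual adjustment of features

# Toy backbone with an adapter inserted between frozen layers.
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), Adapter(16),
                         nn.Conv2d(16, 1, 3, padding=1))
for p in backbone.parameters():
    p.requires_grad = False
adapter_params = [p for m in backbone.modules() if isinstance(m, Adapter)
                  for p in m.parameters()]
for p in adapter_params:
    p.requires_grad = True
optimizer = torch.optim.Adam(adapter_params, lr=1e-4)

frame = torch.randn(1, 3, 64, 64)            # a new video frame from the target domain
loss = backbone(frame).abs().mean()          # placeholder for the unsupervised adaptation loss
loss.backward()
optimizer.step()                             # only adapter weights change
```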
{"title":"Online Depth Learning Against Forgetting in Monocular Videos","authors":"Zhenyu Zhang, Stéphane Lathuilière, E. Ricci, N. Sebe, Yan Yan, Jian Yang","doi":"10.1109/cvpr42600.2020.00455","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00455","url":null,"abstract":"Online depth learning is the problem of consistently adapting a depth estimation model to handle a continuously changing environment. This problem is challenging due to the network easily overfits on the current environment and forgets its past experiences. To address such problem, this paper presents a novel Learning to Prevent Forgetting (LPF) method for online mono-depth adaptation to new target domains in unsupervised manner. Instead of updating the universal parameters, LPF learns adapter modules to efficiently adjust the feature representation and distribution without losing the pre-learned knowledge in online condition. Specifically, to adapt temporal-continuous depth patterns in videos, we introduce a novel meta-learning approach to learn adapter modules by combining online adaptation process into the learning objective. To further avoid overfitting, we propose a novel temporal-consistent regularization to harmonize the gradient descent procedure at each online learning step. Extensive evaluations on real-world datasets demonstrate that the proposed method, with very limited parameters, significantly improves the estimation quality.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"20 1","pages":"4493-4502"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80092190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
DAVD-Net: Deep Audio-Aided Video Decompression of Talking Heads
Pub Date: 2020-06-01. DOI: 10.1109/CVPR42600.2020.01235
Xi Zhang, Xiaolin Wu, Xinliang Zhai, Xianye Ben, Chengjie Tu
Close-up talking heads are among the most common and salient objects in video content, such as face-to-face conversations in social media, teleconferences, news broadcasts, talk shows, etc. Due to the high sensitivity of the human visual system to faces, compression distortions in talking-head videos are highly visible and annoying. To address this problem, we present a novel deep convolutional neural network (DCNN) method for very low bit rate video reconstruction of talking heads. The key innovation is a new DCNN architecture that can exploit audio-video correlations to repair compression defects in the face region. We further improve reconstruction quality by embedding the encoder information of the video compression standards into our DCNN and introducing a constraining projection module in the network. Extensive experiments demonstrate that the proposed DCNN method outperforms the existing state-of-the-art methods on videos of talking heads.
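The audio-video fusion is described only at a high level; the block below is an illustrative sketch of one plausible fusion scheme, broadcasting an audio embedding over the spatial grid of decoded-frame features. The module name, channel sizes, and fusion choice are assumptions, not the DAVD-Net design.

```python
import torch
import torch.nn as nn

class AudioVideoFusion(nn.Module):
    def __init__(self, video_ch=64, audio_dim=128):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, video_ch)
        self.fuse = nn.Conv2d(2 * video_ch, video_ch, kernel_size=3, padding=1)

    def forward(self, video_feat, audio_feat):
        # video_feat: (B, C, H, W) features of the decoded (compressed) frame
        # audio_feat: (B, A) audio embedding temporally aligned with this frame
        B, C, H, W = video_feat.shape
        a = self.audio_proj(audio_feat).view(B, C, 1, 1).expand(B, C, H, W)
        return self.fuse(torch.cat([video_feat, a], dim=1))  # audio-conditioned features

# fusion = AudioVideoFusion()
# out = fusion(torch.randn(2, 64, 32, 32), torch.randn(2, 128))  # (2, 64, 32, 32)
```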
{"title":"DAVD-Net: Deep Audio-Aided Video Decompression of Talking Heads","authors":"Xi Zhang, Xiaolin Wu, Xinliang Zhai, Xianye Ben, Chengjie Tu","doi":"10.1109/CVPR42600.2020.01235","DOIUrl":"https://doi.org/10.1109/CVPR42600.2020.01235","url":null,"abstract":"Close-up talking heads are among the most common and salient object in video contents, such as face-to-face conversations in social media, teleconferences, news broadcasting, talk shows, etc. Due to the high sensitivity of human visual system to faces, compression distortions in talking heads videos are highly visible and annoying. To address this problem, we present a novel deep convolutional neural network (DCNN) method for very low bit rate video reconstruction of talking heads. The key innovation is a new DCNN architecture that can exploit the audio-video correlations to repair compression defects in the face region. We further improve reconstruction quality by embedding into our DCNN the encoder information of the video compression standards and introducing a constraining projection module in the network. Extensive experiments demonstrate that the proposed DCNN method outperforms the existing state-of-the-art methods on videos of talking heads.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"18 1","pages":"12332-12341"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82436270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Accurate Estimation of Body Height From a Single Depth Image via a Four-Stage Developing Network
Pub Date: 2020-06-01. DOI: 10.1109/cvpr42600.2020.00829
Fukun Yin, Shizhe Zhou
Non-contact measurement of human body height can be very difficult under some circumstances. In this paper we address the problem of accurately estimating the height of a person with arbitrary postures from a single depth image. By introducing a novel part-based intermediate representation plus a four-stage, increasingly complex deep neural network, we achieve significantly higher accuracy than previous methods. We first describe the human body in the form of a segmentation of the torso into four nearly rigid parts and then predict their lengths respectively with three CNNs. Instead of directly adding the lengths of these parts together, we further construct another independent developing CNN that combines the intermediate representation, part lengths, and depth information to finally predict the body height. Here we develop an increasingly complex network architecture and adopt a hybrid pooling to optimize the training process. To the best of our knowledge, this is the first method that estimates height from only a single depth image. In experiments our average accuracy reaches 99.1% for people in various positions and postures.
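A minimal sketch of the final combination stage, assuming an MLP that merges the four predicted part lengths with a pooled depth feature rather than naively summing them; the layer sizes and names are illustrative, not those of the paper.

```python
import torch
import torch.nn as nn

class HeightCombiner(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4 + feat_dim, 64),   # 4 predicted part lengths + global depth feature
            nn.ReLU(inplace=True),
            nn.Linear(64, 1),              # final body-height estimate
        )

    def forward(self, part_lengths, depth_feat):
        # part_lengths: (B, 4), depth_feat: (B, feat_dim)
        return self.mlp(torch.cat([part_lengths, depth_feat], dim=1))

# h = HeightCombiner()(torch.rand(2, 4), torch.randn(2, 256))  # (2, 1)
```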
{"title":"Accurate Estimation of Body Height From a Single Depth Image via a Four-Stage Developing Network","authors":"Fukun Yin, Shizhe Zhou","doi":"10.1109/cvpr42600.2020.00829","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00829","url":null,"abstract":"Non-contact measurement of human body height can be very difficult under some circumstances.In this paper we address the problem of accurately estimating the height of a person with arbitrary postures from a single depth image. By introducing a novel part-based intermediate representation plus a four-stage increasingly complex deep neural network, we manage to achieve significantly higher accuracy than previous methods. We first describe the human body in the form of a segmentation of human torso as four nearly rigid parts and then predict their lengths respectively by 3 CNNs. Instead of directly adding the lengths of these parts together, we further construct another independent developing CNN that combines the intermediate representation, part lengths and depth information together to finally predict the body height results.Here we develop an increasingly complex network architecture and adopt a hybrid pooling to optimize training process. To the best of our knowledge, this is the first method that estimates height only from a single depth image. In experiments our average accuracy reaches at 99.1% for people in various positions and postures.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"29 1","pages":"8264-8273"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81394264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Deep Facial Non-Rigid Multi-View Stereo
Pub Date: 2020-06-01. DOI: 10.1109/cvpr42600.2020.00589
Ziqian Bai, Zhaopeng Cui, Jamal Ahmed Rahim, Xiaoming Liu, P. Tan
We present a method for 3D face reconstruction from multi-view images with different expressions. We formulate this problem from the perspective of non-rigid multi-view stereo (NRMVS). Unlike previous learning-based methods, which often regress the face shape directly, our method optimizes the 3D face shape by explicitly enforcing multi-view appearance consistency, which conventional multi-view stereo methods have shown to be effective for recovering shape details. Furthermore, by estimating the face shape through optimization based on multi-view consistency, our method can potentially generalize better to unseen data. However, this optimization is challenging since each input image has a different expression. We facilitate it with a CNN that learns to regularize the non-rigid 3D face according to the input image and preliminary optimization results. Extensive experiments show that our method achieves state-of-the-art performance on various datasets and generalizes well to in-the-wild data.
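The appearance-consistency idea can be sketched as a simple loss: sample each vertex's color in every view and penalize disagreement. The function below assumes the per-view vertex projections are already available and is only an illustration of the principle, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def appearance_consistency(images, vertex_uv):
    """images:    (V, 3, H, W) the V input views.
    vertex_uv: (V, N, 2) projected vertex coordinates per view, normalized to [-1, 1].
    Returns a scalar loss; lower means the views agree on per-vertex colors."""
    grid = vertex_uv.unsqueeze(1)                              # (V, 1, N, 2)
    colors = F.grid_sample(images, grid, align_corners=True)   # (V, 3, 1, N)
    colors = colors.squeeze(2).permute(0, 2, 1)                # (V, N, 3)
    mean_color = colors.mean(dim=0, keepdim=True)              # consensus color per vertex
    return ((colors - mean_color) ** 2).mean()

# loss = appearance_consistency(torch.rand(3, 3, 128, 128), torch.rand(3, 500, 2) * 2 - 1)
```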
{"title":"Deep Facial Non-Rigid Multi-View Stereo","authors":"Ziqian Bai, Zhaopeng Cui, Jamal Ahmed Rahim, Xiaoming Liu, P. Tan","doi":"10.1109/cvpr42600.2020.00589","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00589","url":null,"abstract":"We present a method for 3D face reconstruction from multi-view images with different expressions. We formulate this problem from the perspective of non-rigid multi-view stereo (NRMVS). Unlike previous learning-based methods, which often regress the face shape directly, our method optimizes the 3D face shape by explicitly enforcing multi-view appearance consistency, which is known to be effective in recovering shape details according to conventional multi-view stereo methods. Furthermore, by estimating face shape through optimization based on multi-view consistency, our method can potentially have better generalization to unseen data. However, this optimization is challenging since each input image has a different expression. We facilitate it with a CNN network that learns to regularize the non-rigid 3D face according to the input image and preliminary optimization results. Extensive experiments show that our method achieves the state-of-the-art performance on various datasets and generalizes well to in-the-wild data.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"65 1","pages":"5849-5859"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78682524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 45
Reverse Perspective Network for Perspective-Aware Object Counting
Pub Date: 2020-06-01. DOI: 10.1109/cvpr42600.2020.00443
Yifan Yang, Guorong Li, Zhe Wu, Li Su, Qingming Huang, N. Sebe
One of the critical challenges of object counting is the dramatic scale variation introduced by arbitrary perspectives. We propose a reverse perspective network to resolve the scale variations of input images, instead of generating perspective maps to smooth the final outputs. The reverse perspective network explicitly evaluates the perspective distortions and efficiently corrects them by uniformly warping the input images. The network then delivers images with similar instance scales to the regressor, so the regression network does not need multi-scale receptive fields to match the various scales. Besides, to further solve the scale problem in more congested areas, we enhance the corresponding regions of the ground truth using the evaluation errors. Then we force the regressor to learn from the augmented ground truth via an adversarial process. Furthermore, to verify the proposed model, we collected a vehicle counting dataset based on unmanned aerial vehicles (UAVs). The proposed dataset has fierce scale variations. Extensive experimental results on four benchmark datasets show the improvements of our method over the state of the art.
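As a rough illustration of "uniformly warping the input images", the sketch below applies a hand-crafted, row-dependent horizontal scaling so that the upper (typically more distant) part of the image is magnified. In the actual method the warp is evaluated by the network; the fixed `strength` parameter here is purely an assumption.

```python
import torch
import torch.nn.functional as F

def reverse_perspective_warp(img, strength=0.5):
    """img: (B, 3, H, W); strength in [0, 1) controls how much the top rows are magnified."""
    B, _, H, W = img.shape
    ys = torch.linspace(-1, 1, H, device=img.device)   # output rows, top to bottom
    xs = torch.linspace(-1, 1, W, device=img.device)   # output columns
    # Rows near the top (ys = -1) sample a narrower source span, i.e. get magnified.
    scale = 1 - strength * (1 - ys) / 2                # (H,), ranges from 1 - strength to 1
    grid_x = xs[None, :] * scale[:, None]              # (H, W)
    grid_y = ys[:, None].expand(H, W)                  # (H, W)
    grid = torch.stack([grid_x, grid_y], dim=-1)       # (H, W, 2) in (x, y) order
    grid = grid.unsqueeze(0).repeat(B, 1, 1, 1)        # (B, H, W, 2)
    return F.grid_sample(img, grid, align_corners=True)

# warped = reverse_perspective_warp(torch.rand(1, 3, 240, 320))  # (1, 3, 240, 320)
```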
{"title":"Reverse Perspective Network for Perspective-Aware Object Counting","authors":"Yifan Yang, Guorong Li, Zhe Wu, Li Su, Qingming Huang, N. Sebe","doi":"10.1109/cvpr42600.2020.00443","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00443","url":null,"abstract":"One of the critical challenges of object counting is the dramatic scale variations, which is introduced by arbitrary perspectives. We propose a reverse perspective network to solve the scale variations of input images, instead of generating perspective maps to smooth final outputs. The reverse perspective network explicitly evaluates the perspective distortions, and efficiently corrects the distortions by uniformly warping the input images. Then the proposed network delivers images with similar instance scales to the regressor. Thus the regression network doesn't need multi-scale receptive fields to match the various scales. Besides, to further solve the scale problem of more congested areas, we enhance the corresponding regions of ground-truth with the evaluation errors. Then we force the regressor to learn from the augmented ground-truth via an adversarial process. Furthermore, to verify the proposed model, we collected a vehicle counting dataset based on Unmanned Aerial Vehicles (UAVs). The proposed dataset has fierce scale variations. Extensive experimental results on four benchmark datasets show the improvements of our method against the state-of-the-arts.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"96 1","pages":"4373-4382"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78733276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 96
Few-Shot Learning of Part-Specific Probability Space for 3D Shape Segmentation
Pub Date: 2020-06-01. DOI: 10.1109/cvpr42600.2020.00456
Lingjing Wang, Xiang Li, Yi Fang
Recently, deep neural networks have been introduced as supervised discriminative models for learning 3D point cloud segmentation. Most previous supervised methods require a large amount of training data with human-annotated part labels to guide the training process and ensure the model's generalization ability on test data. In comparison, we propose a novel 3D shape segmentation method that requires little labeled data for training. Given an input 3D shape, the training of our model starts with identifying a similar 3D shape with part annotations from a mini-pool of shape templates (e.g., 10 shapes). With the selected template shape, a novel Coherent Point Transformer is proposed to fully leverage the power of a deep neural network to smoothly morph the template shape towards the input shape. Then, based on the transformed template shapes with part labels, a newly proposed Part-specific Density Estimator is developed to learn a continuous part-specific probability distribution function over the entire 3D space with a batch-consistency regularization term. With the learned part-specific probability distribution, our model is able to predict the part labels of a new input 3D shape in an end-to-end manner. We demonstrate that our proposed method achieves remarkable segmentation results on the ShapeNet dataset with few shots, compared to previous supervised learning approaches.
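To give a concrete picture of a part-specific probability space, the snippet below substitutes a diagonal Gaussian per part for the learned estimator and labels each query point by the part with the highest density. This is a stand-in for intuition only, not the paper's Part-specific Density Estimator.

```python
import torch

def label_points(points, part_means, part_log_sigmas):
    """points: (N, 3) query points; part_means: (K, 3); part_log_sigmas: (K, 3) diagonal scales.
    Returns (N,) hard part labels (argmax of per-part log density)."""
    diff = points[:, None, :] - part_means[None, :, :]            # (N, K, 3)
    inv_var = torch.exp(-2 * part_log_sigmas)[None]               # (1, K, 3)
    # Diagonal-Gaussian log density, up to an additive constant.
    log_p = -0.5 * (diff ** 2 * inv_var).sum(-1) - part_log_sigmas.sum(-1)[None]
    return log_p.argmax(dim=1)

# labels = label_points(torch.randn(1024, 3), torch.randn(4, 3), torch.zeros(4, 3))  # (1024,)
```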
{"title":"Few-Shot Learning of Part-Specific Probability Space for 3D Shape Segmentation","authors":"Lingjing Wang, Xiang Li, Yi Fang","doi":"10.1109/cvpr42600.2020.00456","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00456","url":null,"abstract":"Recently, deep neural networks are introduced as supervised discriminative models for the learning of 3D point cloud segmentation. Most previous supervised methods require a large number of training data with human annotation part labels to guide the training process to ensure the model's generalization abilities on test data. In comparison, we propose a novel 3D shape segmentation method that requires few labeled data for training. Given an input 3D shape, the training of our model starts with identifying a similar 3D shape with part annotations from a mini-pool of shape templates (e.g. 10 shapes). With the selected template shape, a novel Coherent Point Transformer is proposed to fully leverage the power of a deep neural network to smoothly morph the template shape towards the input shape. Then, based on the transformed template shapes with part labels, a newly proposed Part-specific Density Estimator is developed to learn a continuous part-specific probability distribution function on the entire 3D space with a batch consistency regularization term. With the learned part-specific probability distribution, our model is able to predict the part labels of a new input 3D shape in an end-to-end manner. We demonstrate that our proposed method can achieve remarkable segmentation results on the ShapeNet dataset with few shots, compared to previous supervised learning approaches.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"110 1","pages":"4503-4512"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87930548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
ARShadowGAN: Shadow Generative Adversarial Network for Augmented Reality in Single Light Scenes
Pub Date: 2020-06-01. DOI: 10.1109/cvpr42600.2020.00816
Daquan Liu, Chengjiang Long, Hongpan Zhang, Hanning Yu, Xinzhi Dong, Chunxia Xiao
Generating virtual object shadows consistent with real-world environment shading effects is important but challenging in computer vision and augmented reality applications. To address this problem, we propose an end-to-end Generative Adversarial Network for shadow generation, named ARShadowGAN, for augmented reality in single-light scenes. Our ARShadowGAN makes full use of the attention mechanism and is able to directly model the mapping relation between the virtual object shadow and the real-world environment without any explicit estimation of the illumination and 3D geometric information. In addition, we collect an image set that provides rich clues for shadow generation and construct a dataset for training and evaluating our proposed ARShadowGAN. Extensive experimental results show that our proposed ARShadowGAN is capable of directly generating plausible virtual object shadows in single-light scenes. Our source code is available at https://github.com/ldq9526/ARShadowGAN.
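The attention mechanism is not specified in the abstract; the block below is a generic spatial-attention sketch of the kind such a shadow generator might contain, with all names and channel sizes assumed. It predicts a single-channel map from the features and uses it to gate them.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, channels // 2, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        a = self.attn(x)          # (B, 1, H, W) attention over likely shadow/occluder regions
        return x * a, a           # gated features plus the map (e.g. for extra supervision)

# feats, attn_map = SpatialAttention(64)(torch.randn(1, 64, 32, 32))
```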
{"title":"ARShadowGAN: Shadow Generative Adversarial Network for Augmented Reality in Single Light Scenes","authors":"Daquan Liu, Chengjiang Long, Hongpan Zhang, Hanning Yu, Xinzhi Dong, Chunxia Xiao","doi":"10.1109/cvpr42600.2020.00816","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00816","url":null,"abstract":"Generating virtual object shadows consistent with the real-world environment shading effects is important but challenging in computer vision and augmented reality applications. To address this problem, we propose an end-to-end Generative Adversarial Network for shadow generation named ARShadowGAN for augmented reality in single light scenes. Our ARShadowGAN makes full use of attention mechanism and is able to directly model the mapping relation between the virtual object shadow and the real-world environment without any explicit estimation of the illumination and 3D geometric information. In addition, we collect an image set which provides rich clues for shadow generation and construct a dataset for training and evaluating our proposed ARShadowGAN. The extensive experimental results show that our proposed ARShadowGAN is capable of directly generating plausible virtual object shadows in single light scenes. Our source code is available at https://github.com/ldq9526/ARShadowGAN.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"34 1","pages":"8136-8145"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86861866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 71
Fantastic Answers and Where to Find Them: Immersive Question-Directed Visual Attention
Pub Date: 2020-06-01. DOI: 10.1109/cvpr42600.2020.00305
Ming Jiang, Shi Chen, Jinhui Yang, Qi Zhao
While most visual attention studies focus on bottom-up attention with a restricted field of view, real-life situations are filled with embodied vision tasks. The role of attention is more significant in the latter due to the information overload, and attention to the most important regions is critical to the success of tasks. The effects of visual attention on task performance in this context have also been widely ignored. This research addresses a number of challenges to bridge this research gap, on both the data and model aspects. Specifically, we introduce the first dataset of top-down attention in immersive scenes. The Immersive Question-directed Visual Attention (IQVA) dataset features visual attention and corresponding task performance (i.e., answer correctness). It consists of 975 questions and answers collected from people viewing 360° videos in a head-mounted display. Analyses of the data demonstrate a significant correlation between people's task performance and their eye movements, suggesting the role of attention in task performance. Building on this, a neural network is developed to encode the differences between correct and incorrect attention and jointly predict the two. The proposed attention model for the first time takes answer correctness into account, and its outputs naturally distinguish important regions from distractions. This study with new data and features may enable new tasks that leverage attention and answer correctness, and inspire new research that reveals the process behind decision making in performing various tasks.
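A joint attention-and-correctness predictor could look roughly like the two-head network sketched below; the encoder, heads, and resolutions are assumptions meant only to illustrate the joint-prediction idea, not the paper's model.

```python
import torch
import torch.nn as nn

class AttentionCorrectnessNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.attn_head = nn.Conv2d(64, 1, 1)              # per-pixel attention logits
        self.correct_head = nn.Linear(64, 1)              # answer-correctness logit

    def forward(self, frame):
        f = self.encoder(frame)                           # (B, 64, H/4, W/4)
        attn = self.attn_head(f)                          # (B, 1, H/4, W/4)
        correct = self.correct_head(f.mean(dim=(2, 3)))   # (B, 1)
        return attn, correct

# attn, correct = AttentionCorrectnessNet()(torch.rand(2, 3, 128, 128))
```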
{"title":"Fantastic Answers and Where to Find Them: Immersive Question-Directed Visual Attention","authors":"Ming Jiang, Shi Chen, Jinhui Yang, Qi Zhao","doi":"10.1109/cvpr42600.2020.00305","DOIUrl":"https://doi.org/10.1109/cvpr42600.2020.00305","url":null,"abstract":"While most visual attention studies focus on bottom-up attention with restricted field-of-view, real-life situations are filled with embodied vision tasks. The role of attention is more significant in the latter due to the information overload, and attention to the most important regions is critical to the success of tasks. The effects of visual attention on task performance in this context have also been widely ignored. This research addresses a number of challenges to bridge this research gap, on both the data and model aspects. Specifically, we introduce the first dataset of top-down attention in immersive scenes. The Immersive Question-directed Visual Attention (IQVA) dataset features visual attention and corresponding task performance (i.e., answer correctness). It consists of 975 questions and answers collected from people viewing 360° videos in a head-mounted display. Analyses of the data demonstrate a significant correlation between people's task performance and their eye movements, suggesting the role of attention in task performance. With that, a neural network is developed to encode the differences of correct and incorrect attention and jointly predict the two. The proposed attention model for the first time takes into account answer correctness, whose outputs naturally distinguish important regions from distractions. This study with new data and features may enable new tasks that leverage attention and answer correctness, and inspire new research that reveals the process behind decision making in performing various tasks.","PeriodicalId":6715,"journal":{"name":"2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"179 1","pages":"2977-2986"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86225766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10