
Latest Publications: 2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Skeleton-Aware 3D Human Shape Reconstruction From Point Clouds
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00553
Haiyong Jiang, Jianfei Cai, Jianmin Zheng
This work addresses the problem of 3D human shape reconstruction from point clouds. Considering that human shapes are high-dimensional and highly articulated, we adopt the state-of-the-art parametric human body model, SMPL, to reduce the dimension of the learning space and generate smooth and valid reconstructions. However, SMPL parameters, especially pose parameters, are not easy to learn because of the ambiguity and locality of the pose representation. Thus, we propose to incorporate skeleton awareness into the deep-learning-based regression of SMPL parameters for 3D human shape reconstruction. Our basic idea is to use the state-of-the-art technique PointNet++ to extract point features, then map point features to skeleton joint features, and finally to SMPL parameters for the reconstruction from point clouds. In particular, we develop an end-to-end framework, where we propose a graph aggregation module to augment PointNet++ by extracting better point features, an attention module to better map unordered point features into ordered skeleton joint features, and a skeleton graph module to extract better joint features for SMPL parameter regression. The entire framework is first trained in an end-to-end manner on a synthesized dataset, and then fine-tuned online on unseen datasets with an unsupervised loss to bridge the gap between training and testing. Experiments on multiple datasets show that our method is on par with the state-of-the-art solutions.
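To make the point-to-joint mapping concrete, below is a minimal PyTorch sketch of one way to attend from unordered point features to a fixed set of ordered joint features and then regress SMPL pose/shape parameters. The module names, feature sizes, and attention formulation are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the point-feature -> joint-feature -> SMPL
# regression idea described above. All module names and sizes are assumptions.
import torch
import torch.nn as nn

class PointToJointAttention(nn.Module):
    """Maps unordered point features to a fixed set of ordered joint features."""
    def __init__(self, feat_dim=256, num_joints=24):
        super().__init__()
        # One learnable query per skeleton joint attends over all point features.
        self.joint_queries = nn.Parameter(torch.randn(num_joints, feat_dim))
        self.key_proj = nn.Linear(feat_dim, feat_dim)
        self.val_proj = nn.Linear(feat_dim, feat_dim)

    def forward(self, point_feats):             # (B, N, C) from a PointNet++-style backbone
        keys = self.key_proj(point_feats)        # (B, N, C)
        vals = self.val_proj(point_feats)        # (B, N, C)
        attn = torch.softmax(self.joint_queries @ keys.transpose(1, 2), dim=-1)  # (B, J, N)
        return attn @ vals                       # (B, J, C) ordered joint features

class SMPLRegressor(nn.Module):
    """Regresses SMPL pose (24 x 3 axis-angle) and shape (10) parameters from joint features."""
    def __init__(self, feat_dim=256, num_joints=24):
        super().__init__()
        self.attn = PointToJointAttention(feat_dim, num_joints)
        self.head = nn.Sequential(
            nn.Linear(num_joints * feat_dim, 512), nn.ReLU(),
            nn.Linear(512, num_joints * 3 + 10))

    def forward(self, point_feats):
        joint_feats = self.attn(point_feats)     # (B, J, C)
        out = self.head(joint_feats.flatten(1))  # (B, 24*3 + 10)
        pose, shape = out[:, :72], out[:, 72:]
        return pose, shape

# Usage: pose, shape = SMPLRegressor()(torch.randn(2, 1024, 256))
```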
Citations: 54
A Dataset of Multi-Illumination Images in the Wild
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00418
Lukas Murmann, Michaël Gharbi, M. Aittala, F. Durand
Collections of images under a single, uncontrolled illumination have enabled the rapid advancement of core computer vision tasks like classification, detection, and segmentation. But even with modern learning techniques, many inverse problems involving lighting and material understanding remain too severely ill-posed to be solved with single-illumination datasets. The data simply does not contain the necessary supervisory signals. Multi-illumination datasets are notoriously hard to capture, so the data is typically collected at small scale, in controlled environments, either using multiple light sources, or robotic gantries. This leads to image collections that are not representative of the variety and complexity of real world scenes. We introduce a new multi-illumination dataset of more than 1000 real scenes, each captured in high dynamic range and high resolution, under 25 lighting conditions. We demonstrate the richness of this dataset by training state-of-the-art models for three challenging applications: single-image illumination estimation, image relighting, and mixed-illuminant white balance.
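As a rough illustration of how such data might be consumed downstream, here is a hypothetical PyTorch loader. It assumes one directory per scene containing 25 HDR images named dir_0.exr through dir_24.exr; the released dataset's actual layout, naming, and file formats may well differ.

```python
# Hypothetical loader sketch for a multi-illumination dataset organized as
# <root>/<scene>/dir_<k>.exr for k in 0..24. Layout and formats are assumptions.
import glob
import os

import numpy as np
import torch
from torch.utils.data import Dataset
import imageio.v3 as iio  # EXR/HDR reading may require an extra imageio plugin


class MultiIlluminationScenes(Dataset):
    """Returns all 25 lighting conditions of one scene as a (25, 3, H, W) tensor."""

    def __init__(self, root, num_lights=25):
        self.num_lights = num_lights
        self.scenes = sorted(d for d in glob.glob(os.path.join(root, "*"))
                             if os.path.isdir(d))

    def __len__(self):
        return len(self.scenes)

    def __getitem__(self, idx):
        frames = []
        for k in range(self.num_lights):
            path = os.path.join(self.scenes[idx], f"dir_{k}.exr")
            img = iio.imread(path).astype(np.float32)           # HDR image, HWC
            frames.append(torch.from_numpy(img).permute(2, 0, 1))  # HWC -> CHW
        return torch.stack(frames)                              # (25, 3, H, W)
```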
Citations: 44
FW-GAN: Flow-Navigated Warping GAN for Video Virtual Try-On
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00125
Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bowen Wu, Bing-cheng Chen, Jian Yin
Beyond current image-based virtual try-on systems, which have attracted increasing attention, we move a step forward to developing a video virtual try-on system that precisely transfers clothes onto the person and generates visually realistic videos conditioned on arbitrary poses. Besides the challenges in image-based virtual try-on (e.g., clothes fidelity, image synthesis), video virtual try-on further requires spatiotemporal consistency. Directly adopting existing image-based approaches often fails to generate coherent video with natural and realistic textures. In this work, we propose the Flow-navigated Warping Generative Adversarial Network (FW-GAN), a novel framework that learns to synthesize the video of virtual try-on based on a person image, the desired clothes image, and a series of target poses. FW-GAN aims to synthesize a coherent and natural video while manipulating the pose and clothes. It consists of: (i) a flow-guided fusion module that warps the past frames to assist synthesis, which is also adopted in the discriminator to help enhance the coherence and quality of the synthesized video; (ii) a warping net that is designed to warp the clothes image for the refinement of clothes textures; (iii) a parsing constraint loss that alleviates the problem caused by the misalignment of segmentation maps from images with different poses and various clothes. Experiments on our newly collected dataset show that FW-GAN can synthesize high-quality videos of virtual try-on and significantly outperforms other methods both qualitatively and quantitatively.
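The flow-guided fusion idea relies on backward-warping past frames with a dense flow field. Below is a generic warping routine of that kind built on grid_sample; it is a standard building block rather than FW-GAN's actual fusion module, and the coordinate convention (flow in pixels, mapping current positions to past positions) is an assumption.

```python
# Generic flow-based backward warping of a past frame toward the current time step.
import torch
import torch.nn.functional as F

def flow_warp(past_frame, flow):
    """past_frame: (B, C, H, W); flow: (B, 2, H, W) in pixels, current -> past coordinates."""
    B, _, H, W = past_frame.shape
    # Base sampling grid in pixel coordinates (x first, then y).
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(past_frame.device)   # (2, H, W)
    coords = grid.unsqueeze(0) + flow                                   # (B, 2, H, W)
    # Normalize to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (W - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (H - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)             # (B, H, W, 2)
    return F.grid_sample(past_frame, sample_grid, align_corners=True)

# Usage: warped = flow_warp(torch.randn(1, 3, 256, 192), torch.zeros(1, 2, 256, 192))
```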
Citations: 72
Deep Single-Image Portrait Relighting
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00729
Hao Zhou, Sunil Hadap, Kalyan Sunkavalli, D. Jacobs
Conventional physically-based methods for relighting portrait images need to solve an inverse rendering problem, estimating face geometry, reflectance and lighting. However, the inaccurate estimation of face components can cause strong artifacts in relighting, leading to unsatisfactory results. In this work, we apply a physically-based portrait relighting method to generate a large-scale, high-quality, “in the wild” portrait relighting dataset (DPR). A deep Convolutional Neural Network (CNN) is then trained on this dataset to generate a relit portrait image, taking a source image and a target lighting as input. The training procedure regularizes the generated results, removing the artifacts caused by physically-based relighting methods. A GAN loss is further applied to improve the quality of the relit portrait image. Our trained network can relight portrait images with resolutions as high as 1024 × 1024. We evaluate the proposed method on the proposed DPR dataset, the Flickr portrait dataset and the Multi-PIE dataset both qualitatively and quantitatively. Our experiments demonstrate that the proposed method achieves state-of-the-art results. Please refer to https://zhhoper.github.io/dpr.html for the dataset and code.
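To illustrate the input/output structure described above, here is a toy encoder-decoder sketch that takes a portrait plus a target lighting vector (assumed here to be 9 spherical-harmonics coefficients, as is common for Lambertian lighting models) and returns a relit image together with the lighting estimated from the source. The architecture is a placeholder, not the paper's network.

```python
# Placeholder relighting network: encode the portrait, read off the source lighting,
# inject the target lighting, and decode a relit image. Sizes are arbitrary.
import torch
import torch.nn as nn

class RelightNet(nn.Module):
    def __init__(self, sh_dim=9, feat=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(feat, feat, 4, 2, 1), nn.ReLU())
        self.light_est = nn.Linear(feat, sh_dim)      # estimate source lighting
        self.light_in = nn.Linear(sh_dim, feat)       # inject target lighting
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat, feat, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(feat, 3, 4, 2, 1), nn.Sigmoid())

    def forward(self, img, target_sh):
        z = self.encoder(img)                         # (B, F, H/4, W/4)
        pooled = z.mean(dim=(2, 3))                   # global lighting code
        src_sh = self.light_est(pooled)               # estimated source lighting
        light_feat = self.light_in(target_sh)[:, :, None, None]
        relit = self.decoder(z + light_feat)          # decode with target lighting
        return relit, src_sh

# Usage: relit, src_sh = RelightNet()(torch.randn(1, 3, 256, 256), torch.randn(1, 9))
```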
Citations: 154
Meta-Learning to Detect Rare Objects
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.01002
Yu-Xiong Wang, Deva Ramanan, M. Hebert
Few-shot learning, i.e., learning novel concepts from few examples, is fundamental to practical visual recognition systems. While most existing work has focused on few-shot classification, we make a step towards few-shot object detection, a more challenging yet under-explored task. We develop a conceptually simple but powerful meta-learning based framework that simultaneously tackles few-shot classification and few-shot localization in a unified, coherent way. This framework leverages meta-level knowledge about "model parameter generation" from base classes with abundant data to facilitate the generation of a detector for novel classes. Our key insight is to disentangle the learning of category-agnostic and category-specific components in a CNN based detection model. In particular, we introduce a weight prediction meta-model that enables predicting the parameters of category-specific components from few examples. We systematically benchmark the performance of modern detectors in the small-sample-size regime. Experiments in a variety of realistic scenarios, including within-domain, cross-domain, and long-tailed settings, demonstrate the effectiveness and generality of our approach under different notions of novel classes.
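A toy version of the weight-prediction idea: a small meta-network turns the features of a handful of support examples of a novel class into that class's category-specific classifier weights, while the feature backbone stays category-agnostic. Dimensions, the prototype-averaging step, and all names are illustrative assumptions, not the authors' detector.

```python
# Sketch: predict per-category classifier weights from few support examples.
import torch
import torch.nn as nn

class WeightPredictor(nn.Module):
    def __init__(self, feat_dim=1024):
        super().__init__()
        # Predicts one weight vector (plus bias) for a novel category.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim + 1))

    def forward(self, support_feats):
        """support_feats: (K, D) features of K exemplars of one novel class."""
        proto = support_feats.mean(dim=0, keepdim=True)   # class prototype
        out = self.mlp(proto)                             # (1, D + 1)
        return out[:, :-1], out[:, -1]                    # weight (1, D), bias (1,)

def classify(query_feats, weight, bias):
    """Scores query region features (N, D) against the predicted category weights."""
    return query_feats @ weight.t() + bias                # (N, 1) logits

# Usage:
# w, b = WeightPredictor()(torch.randn(5, 1024))
# scores = classify(torch.randn(100, 1024), w, b)
```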
Citations: 223
Stochastic Exposure Coding for Handling Multi-ToF-Camera Interference
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00797
Jongho Lee, Mohit Gupta
As continuous-wave time-of-flight (C-ToF) cameras become popular in 3D imaging applications, they need to contend with the problem of multi-camera interference (MCI). In a multi-camera environment, a ToF camera may receive light from the sources of other cameras, resulting in large depth errors. In this paper, we propose stochastic exposure coding (SEC), a novel approach for mitigating MCI. SEC involves dividing a camera's integration time into multiple slots, and switching the camera off and on stochastically during each slot. This approach has two benefits. First, by appropriately choosing the on probability for each slot, the camera can effectively filter out both the AC and DC components of interfering signals, thereby mitigating depth errors while also maintaining a high signal-to-noise ratio. This enables high-accuracy depth recovery with low power consumption. Second, this approach can be implemented without modifying the C-ToF camera's coding functions, and thus can be used with a wide range of cameras with minimal changes. We demonstrate the performance benefits of SEC with theoretical analysis, simulations and real experiments, across a wide range of imaging scenarios.
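The slot on/off scheme can be illustrated with a short Monte-Carlo simulation: each camera independently keeps a slot with some probability, so the fraction of one camera's active slots that also receive another camera's light scales with the other camera's on-probability. The snippet below only demonstrates this counting argument; it is not the paper's noise or SNR analysis, and the probabilities are arbitrary.

```python
# Monte-Carlo sketch of stochastic slot selection and the resulting collision rate.
import numpy as np

def simulate_collisions(num_slots=10_000, p_self=0.3, p_other=0.3, seed=0):
    rng = np.random.default_rng(seed)
    on_self = rng.random(num_slots) < p_self      # slots in which our camera integrates
    on_other = rng.random(num_slots) < p_other    # slots in which the other camera emits
    active = on_self.sum()                        # usable exposure slots
    collided = (on_self & on_other).sum()         # active slots that also see interference
    return active / num_slots, collided / max(active, 1)

if __name__ == "__main__":
    duty, interfered = simulate_collisions()
    print(f"duty cycle ~ {duty:.2f}, interfered fraction of active slots ~ {interfered:.2f}")
```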
Citations: 8
On the Over-Smoothing Problem of CNN Based Disparity Estimation
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00909
Chuangrong Chen, Xiaozhi Chen, Hui Cheng
Currently, most deep learning based disparity estimation methods suffer from over-smoothing at boundaries, which is unfavorable for applications such as point cloud segmentation, mapping, etc. To address this problem, we first analyze the potential causes and observe that the estimated disparity at edge boundary pixels usually follows multimodal distributions, causing over-smoothed estimates. Based on this observation, we propose a single-modal weighted average operation on the probability distribution during inference, which alleviates the problem effectively. To integrate the constraint of this inference method into the training stage, we further analyze the characteristics of different loss functions and find that using cross entropy with a Gaussian distribution consistently improves performance further. For quantitative evaluation, we propose a novel metric that measures the disparity error in the local structure of edge boundaries. Experiments on various datasets using various networks show our method's effectiveness and general applicability. Code will be available at https://github.com/chenchr/otosp.
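A minimal sketch of a single-modal weighted average at inference time: rather than taking the expectation over the full (possibly multimodal) disparity distribution, it averages only over a small window around the dominant mode. The window radius and renormalization details are assumptions for illustration, not the paper's exact procedure.

```python
# Single-modal weighted average over a softmax disparity distribution.
import torch

def single_modal_disparity(prob, radius=2):
    """prob: (B, D, H, W) softmax over D disparity hypotheses. Returns (B, H, W)."""
    B, D, H, W = prob.shape
    disp_values = torch.arange(D, dtype=prob.dtype, device=prob.device).view(1, D, 1, 1)
    mode = prob.argmax(dim=1, keepdim=True)                  # (B, 1, H, W) dominant mode
    # Keep only hypotheses within `radius` of the mode, then renormalize.
    dist = (disp_values - mode.to(prob.dtype)).abs()         # (B, D, H, W)
    mask = (dist <= radius).to(prob.dtype)
    local = prob * mask
    local = local / local.sum(dim=1, keepdim=True).clamp_min(1e-8)
    return (local * disp_values).sum(dim=1)                  # weighted average near the mode

# Usage: disp = single_modal_disparity(torch.softmax(torch.randn(1, 64, 32, 32), dim=1))
```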
Citations: 29
Spatial Correspondence With Generative Adversarial Network: Learning Depth From Monocular Videos
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00759
Zhenyao Wu, Xinyi Wu, Xiaoping Zhang, Song Wang, L. Ju
Depth estimation from monocular videos has important applications in many areas such as autonomous driving and robot navigation. It is a very challenging problem without knowing the camera pose since errors in camera-pose estimation can significantly affect the video-based depth estimation accuracy. In this paper, we present a novel SC-GAN network with end-to-end adversarial training for depth estimation from monocular videos without estimating the camera pose and pose change over time. To exploit cross-frame relations, SC-GAN includes a spatial correspondence module which uses Smolyak sparse grids to efficiently match the features across adjacent frames, and an attention mechanism to learn the importance of features in different directions. Furthermore, the generator in SC-GAN learns to estimate depth from the input frames, while the discriminator learns to distinguish between the ground-truth and estimated depth map for the reference frame. Experiments on the KITTI and Cityscapes datasets show that the proposed SC-GAN can achieve much more accurate depth maps than many existing state-of-the-art methods on monocular videos.
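For intuition about what the spatial correspondence module matches, here is a generic dense correlation volume between adjacent-frame features. The paper's module uses Smolyak sparse grids to make this matching efficient; the dense sketch below does not attempt to replicate that and only shows what is being compared.

```python
# Dense cross-frame correlation volume over a small displacement window.
import torch
import torch.nn.functional as F

def correlation_volume(feat_t, feat_prev, max_disp=4):
    """feat_t, feat_prev: (B, C, H, W). Returns (B, (2*max_disp+1)**2, H, W) similarities."""
    B, C, H, W = feat_t.shape
    feat_prev = F.pad(feat_prev, [max_disp] * 4)             # pad left/right/top/bottom
    corrs = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = feat_prev[:, :, dy:dy + H, dx:dx + W]  # shifted previous-frame features
            corrs.append((feat_t * shifted).sum(dim=1, keepdim=True) / C)
    return torch.cat(corrs, dim=1)

# Usage: vol = correlation_volume(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```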
Citations: 20
Neural Turtle Graphics for Modeling City Road Layouts
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00462
Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina, Xinkai Wei, Ming-Yu Liu, A. Torralba, S. Fidler
We propose Neural Turtle Graphics (NTG), a novel generative model for spatial graphs, and demonstrate its applications in modeling city road layouts. Specifically, we represent the road layout using a graph where nodes represent control points and edges represent road segments. NTG is a sequential generative model parameterized by a neural network. It iteratively generates a new node and an edge connecting to an existing node, conditioned on the current graph. We train NTG on OpenStreetMap data and show it outperforms existing approaches using a set of diverse performance metrics. Moreover, our method allows users to control the style of generated road layouts to mimic existing cities, as well as to sketch a part of the city road layout to be synthesized. In addition to synthesis, the proposed NTG finds use in the analytical task of aerial road parsing. Experimental results show that it achieves state-of-the-art performance on the SpaceNet dataset.
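The sequential generation procedure can be sketched as a loop that repeatedly proposes a new control point and an edge to an existing node, conditioned on the graph built so far. The placeholder policy below stands in for the learned neural model and is not NTG itself.

```python
# Schematic node-and-edge generation loop for a sequential spatial-graph model.
import random

def propose_step(nodes):
    """Placeholder policy: extend a random existing node by a small random offset."""
    parent = random.randrange(len(nodes))
    px, py = nodes[parent]
    return (px + random.uniform(-1, 1), py + random.uniform(-1, 1)), parent

def generate_road_graph(num_steps=50, seed=7):
    random.seed(seed)
    nodes = [(0.0, 0.0)]                 # control points
    edges = []                           # road segments as (parent_idx, child_idx)
    for _ in range(num_steps):
        new_node, parent = propose_step(nodes)
        nodes.append(new_node)
        edges.append((parent, len(nodes) - 1))
    return nodes, edges

if __name__ == "__main__":
    nodes, edges = generate_road_graph()
    print(len(nodes), "nodes,", len(edges), "road segments")
```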
Citations: 53
Adaptive Activation Thresholding: Dynamic Routing Type Behavior for Interpretability in Convolutional Neural Networks
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00504
Yiyou Sun, Sathya Ravi, Vikas Singh
There is a growing interest in strategies that can help us understand or interpret neural networks -- that is, not merely provide a prediction, but also offer additional context explaining why and how. While many current methods offer tools to perform this analysis post hoc for a given (trained) network, recent results (especially on capsule networks) suggest that when classes map to a few high-level ``concepts'' in the preceding layers of the network, the behavior of the network is easier to interpret or explain. Such training may be accomplished via dynamic/EM routing, where the network ``routes'' for individual classes (or subsets of images) are dynamic and involve few nodes even if the full network may not be sparse. In this paper, we show how a simple modification of the SGD scheme can help provide dynamic/EM routing type behavior in convolutional neural networks. Through extensive experiments, we evaluate the effect of this idea on interpretability, where we obtain promising results, while also showing that no compromise in attainable accuracy is involved. Further, although the modification is seemingly ad hoc, we show that the new algorithm can be analyzed by an approximate method which provably matches known rates for SGD.
Citations: 14