
2019 IEEE/CVF International Conference on Computer Vision (ICCV): Latest Publications

View Independent Generative Adversarial Network for Novel View Synthesis
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00788
Xiaogang Xu, Ying-Cong Chen, Jiaya Jia
Synthesizing novel views from a 2D image requires inferring 3D structure and projecting it back to 2D from a new viewpoint. In this paper, we propose an encoder-decoder based generative adversarial network, VI-GAN, to tackle this problem. Our method lets the network, after seeing many images of objects of the same category from different views, acquire essential knowledge of the objects' intrinsic properties. To this end, an encoder is designed to extract a view-independent feature that characterizes the intrinsic properties of the input image, including 3D structure, color, texture, etc. The decoder then hallucinates the image of a novel view from the extracted feature and an arbitrary user-specified camera pose. Extensive experiments demonstrate that our model can synthesize high-quality images of different views with continuous camera poses, and is general enough for various applications.
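As a rough illustration of the architecture described above, here is a minimal PyTorch sketch (not the authors' code): an encoder compresses the input image into a view-independent feature, and a decoder conditioned on that feature plus a target camera pose hallucinates the novel view. The layer widths, the 64x64 resolution, and the 6-D pose vector are illustrative assumptions; the adversarial discriminator and losses are omitted.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps an input image to a view-independent feature vector."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(256, feat_dim)

    def forward(self, img):
        return self.fc(self.conv(img).flatten(1))

class Decoder(nn.Module):
    """Hallucinates a novel view from the feature and a target camera pose."""
    def __init__(self, feat_dim=256, pose_dim=6):
        super().__init__()
        self.fc = nn.Linear(feat_dim + pose_dim, 256 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, feat, pose):
        x = self.fc(torch.cat([feat, pose], dim=1)).view(-1, 256, 8, 8)
        return self.deconv(x)

enc, dec = Encoder(), Decoder()
img = torch.randn(1, 3, 64, 64)     # source view
pose = torch.randn(1, 6)            # target camera pose (e.g., rotation + translation)
novel_view = dec(enc(img), pose)    # (1, 3, 64, 64)
```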
{"title":"View Independent Generative Adversarial Network for Novel View Synthesis","authors":"Xiaogang Xu, Ying-Cong Chen, Jiaya Jia","doi":"10.1109/ICCV.2019.00788","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00788","url":null,"abstract":"Synthesizing novel views from a 2D image requires to infer 3D structure and project it back to 2D from a new viewpoint. In this paper, we propose an encoder-decoder based generative adversarial network VI-GAN to tackle this problem. Our method is to let the network, after seeing many images of objects belonging to the same category in different views, obtain essential knowledge of intrinsic properties of the objects. To this end, an encoder is designed to extract view-independent feature that characterizes intrinsic properties of the input image, which includes 3D structure, color, texture etc. We also make the decoder hallucinate the image of a novel view based on the extracted feature and an arbitrary user-specific camera pose. Extensive experiments demonstrate that our model can synthesize high-quality images in different views with continuous camera poses, and is general for various applications.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"5 1","pages":"7790-7799"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73300554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
Probabilistic Deep Ordinal Regression Based on Gaussian Processes
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00540
Yanzhu Liu, Fan Wang, A. Kong
With excellent representation power for complex data, deep neural network (DNN) based approaches are state-of-the-art for the ordinal regression problem, which aims to classify instances into ordinal categories. However, DNNs cannot capture uncertainties or produce probabilistic interpretations. Gaussian Processes (GPs), on the other hand, are a probabilistic model that offers uncertainty information, but they lack scalability to large datasets. This paper adapts traditional GP regression to the ordinal regression problem using both conjugate and non-conjugate ordinal likelihoods. Based on that, it proposes a deep neural network with a GP layer on top, trained end-to-end by stochastic gradient descent over both the neural network parameters and the GP parameters. The parameters of the ordinal likelihood function are learned as neural network parameters, so the proposed framework can produce fitted likelihood functions for training sets and make probabilistic predictions for test points. Experimental results on three real-world benchmarks -- image aesthetics rating, historical image grading and age group estimation -- demonstrate that, in terms of mean absolute error, the proposed approach outperforms state-of-the-art ordinal regression approaches while providing confidence for its predictions.
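The key ingredient is the ordinal likelihood placed on top of the latent GP output. Below is a minimal sketch of a probit-style ordinal likelihood; the threshold values, noise scale, and the use of plain tensors instead of a full GP layer are assumptions for illustration.

```python
import torch

def ordinal_likelihood(f, thresholds, noise_std=1.0):
    """Probit-style ordinal likelihood:
    P(y = k | f) = Phi((b_k - f)/sigma) - Phi((b_{k-1} - f)/sigma),
    with b_0 = -inf and b_K = +inf (approximated here by +/-1e8)."""
    normal = torch.distributions.Normal(0.0, 1.0)
    b = torch.cat([torch.tensor([-1e8]), thresholds, torch.tensor([1e8])])
    upper = normal.cdf((b[1:] - f.unsqueeze(-1)) / noise_std)
    lower = normal.cdf((b[:-1] - f.unsqueeze(-1)) / noise_std)
    return upper - lower    # (N, K) class probabilities

f = torch.randn(4)                            # latent function values for 4 inputs
thresholds = torch.tensor([-1.0, 0.0, 1.0])   # learnable cut-points for 4 ordinal classes
probs = ordinal_likelihood(f, thresholds)
print(probs.sum(dim=-1))                      # each row sums to 1
```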
{"title":"Probabilistic Deep Ordinal Regression Based on Gaussian Processes","authors":"Yanzhu Liu, Fan Wang, A. Kong","doi":"10.1109/ICCV.2019.00540","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00540","url":null,"abstract":"With excellent representation power for complex data, deep neural networks (DNNs) based approaches are state-of-the-art for ordinal regression problem which aims to classify instances into ordinal categories. However, DNNs are not able to capture uncertainties and produce probabilistic interpretations. As a probabilistic model, Gaussian Processes (GPs) on the other hand offers uncertainty information, which is nonetheless lack of scalability for large datasets. This paper adapts traditional GPs regression for ordinal regression problem by using both conjugate and non-conjugate ordinal likelihood. Based on that, it proposes a deep neural network with a GPs layer on the top, which is trained end-to-end by the stochastic gradient descent method for both neural network parameters and GPs parameters. The parameters in the ordinal likelihood function are learned as neural network parameters so that the proposed framework is able to produce fitted likelihood functions for training sets and make probabilistic predictions for test points. Experimental results on three real-world benchmarks -- image aesthetics rating, historical image grading and age group estimation -- demonstrate that in terms of mean absolute error, the proposed approach outperforms state-of-the-art ordinal regression approaches and provides the confidence for predictions.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"48 1","pages":"5300-5308"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82134279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
Joint Prediction for Kinematic Trajectories in Vehicle-Pedestrian-Mixed Scenes
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.01048
Huikun Bi, Zhong Fang, Tianlu Mao, Zhaoqi Wang, Z. Deng
Trajectory prediction for objects is challenging and critical for various applications (e.g., autonomous driving and anomaly detection). Most existing methods focus on homogeneous pedestrian trajectory prediction, where pedestrians are treated as particles without size. However, they fall short of handling crowded vehicle-pedestrian-mixed scenes directly, since vehicles, which are constrained by kinematics in reality, should ideally be treated as rigid, non-particle objects. In this paper, we tackle this problem using separate LSTMs for heterogeneous vehicles and pedestrians. Specifically, we represent each vehicle with an oriented bounding box, computed from its position and orientation, to denote its kinematic trajectory. We then propose a framework called VP-LSTM to predict the kinematic trajectories of vehicles and pedestrians simultaneously. To evaluate our model, a large dataset containing the trajectories of both vehicles and pedestrians in vehicle-pedestrian-mixed scenes is specially built. Through comparisons between our method and state-of-the-art approaches, we show the effectiveness and advantages of our method for kinematic trajectory prediction in vehicle-pedestrian-mixed scenes.
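A minimal sketch, assuming two type-specific LSTM encoders whose hidden states are fused before prediction, of how vehicles (oriented boxes) and pedestrians (points) can be handled jointly. This is not the paper's VP-LSTM; the 5-D box parameterization (x, y, w, l, theta), the fusion layer, and the single-agent setup are illustrative simplifications.

```python
import torch
import torch.nn as nn

class VPLSTMSketch(nn.Module):
    """Two type-specific LSTMs; vehicles use oriented boxes (x, y, w, l, theta),
    pedestrians use points (x, y). Final hidden states are concatenated and fused."""
    def __init__(self, hidden=64):
        super().__init__()
        self.veh_lstm = nn.LSTM(input_size=5, hidden_size=hidden, batch_first=True)
        self.ped_lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.fuse = nn.Linear(2 * hidden, hidden)
        self.veh_head = nn.Linear(hidden, 5)   # next oriented box
        self.ped_head = nn.Linear(hidden, 2)   # next position

    def forward(self, veh_seq, ped_seq):
        _, (hv, _) = self.veh_lstm(veh_seq)
        _, (hp, _) = self.ped_lstm(ped_seq)
        joint = torch.relu(self.fuse(torch.cat([hv[-1], hp[-1]], dim=-1)))
        return self.veh_head(joint), self.ped_head(joint)

model = VPLSTMSketch()
veh_hist = torch.randn(1, 8, 5)   # 8 past frames of one vehicle's oriented box
ped_hist = torch.randn(1, 8, 2)   # 8 past frames of one pedestrian's position
next_box, next_pos = model(veh_hist, ped_hist)
```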
{"title":"Joint Prediction for Kinematic Trajectories in Vehicle-Pedestrian-Mixed Scenes","authors":"Huikun Bi, Zhong Fang, Tianlu Mao, Zhaoqi Wang, Z. Deng","doi":"10.1109/ICCV.2019.01048","DOIUrl":"https://doi.org/10.1109/ICCV.2019.01048","url":null,"abstract":"Trajectory prediction for objects is challenging and critical for various applications (e.g., autonomous driving, and anomaly detection). Most of the existing methods focus on homogeneous pedestrian trajectories prediction, where pedestrians are treated as particles without size. However, they fall short of handling crowded vehicle-pedestrian-mixed scenes directly since vehicles, limited with kinematics in reality, should be treated as rigid, non-particle objects ideally. In this paper, we tackle this problem using separate LSTMs for heterogeneous vehicles and pedestrians. Specifically, we use an oriented bounding box to represent each vehicle, calculated based on its position and orientation, to denote its kinematic trajectories. We then propose a framework called VP-LSTM to predict the kinematic trajectories of both vehicles and pedestrians simultaneously. In order to evaluate our model, a large dataset containing the trajectories of both vehicles and pedestrians in vehicle-pedestrian-mixed scenes is specially built. Through comparisons between our method with state-of-the-art approaches, we show the effectiveness and advantages of our method on kinematic trajectories prediction in vehicle-pedestrian-mixed scenes.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"51 1","pages":"10382-10391"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82267191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
GEOBIT: A Geodesic-Based Binary Descriptor Invariant to Non-Rigid Deformations for RGB-D Images
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.01010
Erickson R. Nascimento, Guilherme A. Potje, Renato Martins, Felipe C. Chamone, M. Campos, R. Bajcsy
At the core of most three-dimensional alignment and tracking tasks resides the critical problem of point correspondence. In this context, the design of descriptors that efficiently and uniquely identify the keypoints to be matched is of central importance. Numerous descriptors have been developed for dealing with affine/perspective warps, but few can also handle non-rigid deformations. In this paper, we introduce a novel binary RGB-D descriptor that is invariant to isometric deformations. Our method uses geodesic isocurves on smooth textured manifolds. It combines appearance and geometric information from RGB-D images to tackle non-rigid transformations. We used our descriptor to track multiple textured depth maps and demonstrate that it produces reliable feature descriptors even in the presence of strong non-rigid deformations and depth noise. The experiments show that our descriptor outperforms different state-of-the-art descriptors in both precision-recall and recognition rate metrics. We also provide to the community a new dataset composed of annotated RGB-D images of different objects (shirts, cloths, paintings, bags) subjected to strong non-rigid deformations, to evaluate point correspondence algorithms.
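To make the binary-descriptor idea concrete, the toy sketch below builds a BRIEF-like descriptor from pairwise intensity tests and matches with Hamming distance. GEOBIT itself places the sample points along geodesic isocurves estimated from the depth channel, which is what buys invariance to isometric deformations; the fixed image offsets used here are a deliberate simplification.

```python
import numpy as np

def binary_descriptor(gray, kp, offsets_a, offsets_b):
    """Toy binary descriptor: one bit per intensity comparison between two
    sample points around the keypoint. GEOBIT instead samples along geodesic
    isocurves computed from the depth channel; fixed offsets are used here
    only for brevity."""
    y, x = kp
    bits = []
    for (dy1, dx1), (dy2, dx2) in zip(offsets_a, offsets_b):
        bits.append(1 if gray[y + dy1, x + dx1] < gray[y + dy2, x + dx2] else 0)
    return np.packbits(bits)

def hamming(d1, d2):
    """Descriptor distance = number of differing bits."""
    return np.unpackbits(np.bitwise_xor(d1, d2)).sum()

rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(100, 100), dtype=np.uint8)
offsets_a = [tuple(p) for p in rng.integers(-8, 9, size=(32, 2))]
offsets_b = [tuple(p) for p in rng.integers(-8, 9, size=(32, 2))]
d1 = binary_descriptor(gray, (50, 50), offsets_a, offsets_b)
d2 = binary_descriptor(gray, (52, 50), offsets_a, offsets_b)
print(hamming(d1, d2))
```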
{"title":"GEOBIT: A Geodesic-Based Binary Descriptor Invariant to Non-Rigid Deformations for RGB-D Images","authors":"Erickson R. Nascimento, Guilherme A. Potje, Renato Martins, Felipe C. Chamone, M. Campos, R. Bajcsy","doi":"10.1109/ICCV.2019.01010","DOIUrl":"https://doi.org/10.1109/ICCV.2019.01010","url":null,"abstract":"At the core of most three-dimensional alignment and tracking tasks resides the critical problem of point correspondence. In this context, the design of descriptors that efficiently and uniquely identifies keypoints, to be matched, is of central importance. Numerous descriptors have been developed for dealing with affine/perspective warps, but few can also handle non-rigid deformations. In this paper, we introduce a novel binary RGB-D descriptor invariant to isometric deformations. Our method uses geodesic isocurves on smooth textured manifolds. It combines appearance and geometric information from RGB-D images to tackle non-rigid transformations. We used our descriptor to track multiple textured depth maps and demonstrate that it produces reliable feature descriptors even in the presence of strong non-rigid deformations and depth noise. The experiments show that our descriptor outperforms different state-of-the-art descriptors in both precision-recall and recognition rate metrics. We also provide to the community a new dataset composed of annotated RGB-D images of different objects (shirts, cloths, paintings, bags), subjected to strong non-rigid deformations, to evaluate point correspondence algorithms.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"85 1","pages":"10003-10011"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76050172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Agile Depth Sensing Using Triangulation Light Curtains
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00799
Joseph R. Bartels, Jian Wang, W. Whittaker, S. Narasimhan
Depth sensors like LIDARs and the Kinect use a fixed depth acquisition strategy that is independent of the scene of interest. Due to the low spatial and temporal resolution of these sensors, this strategy can undersample parts of the scene that are important (small or fast-moving objects), or oversample areas that are not informative for the task at hand (a fixed planar wall). In this paper, we present an approach and system to dynamically and adaptively sample the depths of a scene using the principle of triangulation light curtains. The approach directly detects the presence or absence of objects along specified 3D lines. These 3D lines can be sampled sparsely, non-uniformly, or densely only at specified regions. The depth sampling can be varied in real time, enabling quick object discovery or detailed exploration of areas of interest. These results are achieved using a novel prototype light curtain system based on a 2D rolling shutter camera with higher light efficiency, a longer working range, and faster adaptation than previous work, making it broadly useful for autonomous navigation and exploration.
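The following sketch shows only the triangulation geometry behind a programmable light curtain: given the curtain points the user wants to probe, it computes the camera-ray and laser-steering angles whose intersection lies on each point. The 2D setup, the 0.2 m baseline, and the absence of rolling-shutter timing and calibration are assumptions for illustration.

```python
import numpy as np

def curtain_angles(xs, zs, baseline=0.2):
    """2D triangulation light curtain geometry: camera pinhole at the origin,
    line laser at (baseline, 0), both looking along +z. For each requested
    curtain point (x, z) return the camera-ray angle and the laser steering
    angle whose intersection lands on that point."""
    cam_angles = np.arctan2(xs, zs)               # ray through each curtain point
    laser_angles = np.arctan2(xs - baseline, zs)  # laser direction to the same point
    return cam_angles, laser_angles

# A curtain that hugs a planned robot path: 2 m ahead, sweeping left to right.
xs = np.linspace(-1.0, 1.0, 11)
zs = np.full_like(xs, 2.0)
cam, laser = curtain_angles(xs, zs)
print(np.degrees(laser))
```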
{"title":"Agile Depth Sensing Using Triangulation Light Curtains","authors":"Joseph R. Bartels, Jian Wang, W. Whittaker, S. Narasimhan","doi":"10.1109/ICCV.2019.00799","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00799","url":null,"abstract":"Depth sensors like LIDARs and Kinect use a fixed depth acquisition strategy that is independent of the scene of interest. Due to the low spatial and temporal resolution of these sensors, this strategy can undersample parts of the scene that are important (small or fast moving objects), or oversample areas that are not informative for the task at hand (a fixed planar wall). In this paper, we present an approach and system to dynamically and adaptively sample the depths of a scene using the principle of triangulation light curtains. The approach directly detects the presence or absence of objects at specified 3D lines. These 3D lines can be sampled sparsely, non-uniformly, or densely only at specified regions. The depth sampling can be varied in real-time, enabling quick object discovery or detailed exploration of areas of interest. These results are achieved using a novel prototype light curtain system that is based on a 2D rolling shutter camera with higher light efficiency, working range, and faster adaptation than previous work, making it useful broadly for autonomous navigation and exploration.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"8 1","pages":"7899-7907"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87530700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Stacked Cross Refinement Network for Edge-Aware Salient Object Detection
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00736
Zhe Wu, Li Su, Qingming Huang
Salient object detection is a fundamental computer vision task. The majority of existing algorithms focus on aggregating multi-level features of pre-trained convolutional neural networks. Moreover, some researchers attempt to utilize edge information for auxiliary training. However, existing edge-aware models design unidirectional frameworks that only use edge features to improve the segmentation features. Motivated by the logical interrelations between binary segmentation and edge maps, we propose a novel Stacked Cross Refinement Network (SCRN) for salient object detection. Our framework aims to simultaneously refine multi-level features of salient object detection and edge detection by stacking Cross Refinement Units (CRUs). According to the logical interrelations, the CRU employs two direction-specific integration operations and passes messages bidirectionally between the two tasks. Incorporating the refined edge-preserving features into a typical U-Net, our model detects salient objects accurately. Extensive experiments conducted on six benchmark datasets demonstrate that our method outperforms existing state-of-the-art algorithms in both accuracy and efficiency. Besides, the attribute-based performance on the SOC dataset shows that the proposed model ranks first in the majority of challenging scenes. Code can be found at https://github.com/wuzhe71/SCAN.
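A minimal sketch of the bidirectional idea behind a Cross Refinement Unit: saliency features are refined by a transform of the edge features and vice versa, and several units are stacked. The additive fusion and channel sizes here are stand-ins; the paper's exact direction-specific integration operations differ.

```python
import torch
import torch.nn as nn

class CrossRefinementUnit(nn.Module):
    """Bidirectional refinement between salient-object features and edge
    features: each branch is updated using a transform of the other."""
    def __init__(self, channels=64):
        super().__init__()
        self.edge_to_sal = nn.Conv2d(channels, channels, 3, padding=1)
        self.sal_to_edge = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, f_sal, f_edge):
        f_sal_new = torch.relu(f_sal + self.edge_to_sal(f_edge))   # refine saliency with edge cues
        f_edge_new = torch.relu(f_edge + self.sal_to_edge(f_sal))  # refine edges with saliency cues
        return f_sal_new, f_edge_new

# Stacking several units refines both tasks jointly.
units = nn.ModuleList(CrossRefinementUnit() for _ in range(3))
f_sal = torch.randn(1, 64, 56, 56)
f_edge = torch.randn(1, 64, 56, 56)
for cru in units:
    f_sal, f_edge = cru(f_sal, f_edge)
```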
{"title":"Stacked Cross Refinement Network for Edge-Aware Salient Object Detection","authors":"Zhe Wu, Li Su, Qingming Huang","doi":"10.1109/ICCV.2019.00736","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00736","url":null,"abstract":"Salient object detection is a fundamental computer vision task. The majority of existing algorithms focus on aggregating multi-level features of pre-trained convolutional neural networks. Moreover, some researchers attempt to utilize edge information for auxiliary training. However, existing edge-aware models design unidirectional frameworks which only use edge features to improve the segmentation features. Motivated by the logical interrelations between binary segmentation and edge maps, we propose a novel Stacked Cross Refinement Network (SCRN) for salient object detection in this paper. Our framework aims to simultaneously refine multi-level features of salient object detection and edge detection by stacking Cross Refinement Unit (CRU). According to the logical interrelations, the CRU designs two direction-specific integration operations, and bidirectionally passes messages between the two tasks. Incorporating the refined edge-preserving features with the typical U-Net, our model detects salient objects accurately. Extensive experiments conducted on six benchmark datasets demonstrate that our method outperforms existing state-of-the-art algorithms in both accuracy and efficiency. Besides, the attribute-based performance on the SOC dataset show that the proposed model ranks first in the majority of challenging scenes. Code can be found at https://github.com/wuzhe71/SCAN.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"9 1","pages":"7263-7272"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88062889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 277
Attribute Manipulation Generative Adversarial Networks for Fashion Images
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.01064
Kenan E. Ak, A. Kassim, Joo-Hwee Lim, J. Y. Tham
Recent advances in Generative Adversarial Networks (GANs) have made it possible to conduct multi-domain image-to-image translation using a single generative network. While recent methods such as GANimation and SaGAN are able to conduct translations on attribute-relevant regions using attention, they do not perform well as the number of attributes increases, since the training of attention masks mostly relies on classification losses. To address this and other limitations, we introduce Attribute Manipulation Generative Adversarial Networks (AMGAN) for fashion images. While AMGAN's generator network uses class activation maps (CAMs) to empower its attention mechanism, it also exploits perceptual losses by assigning reference (target) images based on attribute similarities. AMGAN incorporates an additional discriminator network that focuses on attribute-relevant regions to detect unrealistic translations. Additionally, AMGAN can be controlled to perform attribute manipulations on specific regions such as the sleeve or torso. Experiments show that AMGAN outperforms state-of-the-art methods using traditional evaluation metrics as well as an alternative metric based on image retrieval.
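Since AMGAN's attention is driven by class activation maps, the sketch below shows how a CAM is computed from a tiny attribute classifier: the classifier's final feature maps are weighted by the fully connected weights of the chosen attribute and summed over channels, giving a soft mask over attribute-relevant regions. The network sizes and the sigmoid normalization are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CAMClassifier(nn.Module):
    """Tiny attribute classifier whose class activation maps (CAMs) localize
    attribute-relevant regions, the kind of signal a generator can use as attention."""
    def __init__(self, num_attrs=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(64, num_attrs)   # applied after global average pooling

    def forward(self, x):
        fmap = self.features(x)              # (N, 64, H, W)
        logits = self.fc(F.adaptive_avg_pool2d(fmap, 1).flatten(1))
        return logits, fmap

    def cam(self, fmap, attr_idx):
        # CAM = class-specific weighted sum of the final feature maps.
        w = self.fc.weight[attr_idx]         # (64,)
        return torch.einsum('c,nchw->nhw', w, fmap)

model = CAMClassifier()
img = torch.randn(1, 3, 64, 64)
logits, fmap = model(img)
attention = torch.sigmoid(model.cam(fmap, attr_idx=3))   # soft attribute mask
```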
{"title":"Attribute Manipulation Generative Adversarial Networks for Fashion Images","authors":"Kenan E. Ak, A. Kassim, Joo-Hwee Lim, J. Y. Tham","doi":"10.1109/ICCV.2019.01064","DOIUrl":"https://doi.org/10.1109/ICCV.2019.01064","url":null,"abstract":"Recent advances in Generative Adversarial Networks (GANs) have made it possible to conduct multi-domain image-to-image translation using a single generative network. While recent methods such as Ganimation and SaGAN are able to conduct translations on attribute-relevant regions using attention, they do not perform well when the number of attributes increases as the training of attention masks mostly rely on classification losses. To address this and other limitations, we introduce Attribute Manipulation Generative Adversarial Networks (AMGAN) for fashion images. While AMGAN's generator network uses class activation maps (CAMs) to empower its attention mechanism, it also exploits perceptual losses by assigning reference (target) images based on attribute similarities. AMGAN incorporates an additional discriminator network that focuses on attribute-relevant regions to detect unrealistic translations. Additionally, AMGAN can be controlled to perform attribute manipulations on specific regions such as the sleeve or torso regions. Experiments show that AMGAN outperforms state-of-the-art methods using traditional evaluation metrics as well as an alternative one that is based on image retrieval.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"24 1","pages":"10540-10549"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86518855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 69
Learning Joint 2D-3D Representations for Depth Completion
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.01012
Yuxiang Chen, Binh Yang, Ming Liang, R. Urtasun
In this paper, we tackle the problem of depth completion from RGBD data. Towards this goal, we design a simple yet effective neural network block that learns to extract joint 2D and 3D features. Specifically, the block consists of two domain-specific sub-networks that apply 2D convolution on image pixels and continuous convolution on 3D points, with their output features fused in image space. We build the depth completion network simply by stacking the proposed block, which has the advantage of learning hierarchical representations that are fully fused between 2D and 3D spaces at multiple levels. We demonstrate the effectiveness of our approach on the challenging KITTI depth completion benchmark and show that our approach outperforms the state-of-the-art.
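A simplified stand-in for such a block is sketched below: a 2D convolution branch operates on image-space features, while a per-pixel MLP over back-projected 3D points stands in for the continuous convolution over point neighborhoods, and the two are fused by addition in image space. The channel counts, the dense point map, and the MLP substitution are assumptions made for brevity.

```python
import torch
import torch.nn as nn

class Joint2D3DBlock(nn.Module):
    """Simplified stand-in for a joint 2D-3D block: a 2D convolution branch on
    image-space features plus a per-point MLP on back-projected 3D points
    (standing in for continuous convolution), fused by addition in image space."""
    def __init__(self, channels=32):
        super().__init__()
        self.branch2d = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
        self.branch3d = nn.Sequential(
            nn.Linear(3, channels), nn.ReLU(), nn.Linear(channels, channels))

    def forward(self, feat, points):
        # feat: (N, C, H, W) image features; points: (N, H, W, 3) 3D point per pixel.
        f2d = self.branch2d(feat)
        f3d = self.branch3d(points).permute(0, 3, 1, 2)   # back to (N, C, H, W)
        return torch.relu(f2d + f3d)                      # fuse in image space

block = Joint2D3DBlock()
feat = torch.randn(1, 32, 48, 160)
points = torch.randn(1, 48, 160, 3)   # back-projected from sparse LiDAR depth
out = block(feat, points)             # stacking such blocks forms the network
```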
{"title":"Learning Joint 2D-3D Representations for Depth Completion","authors":"Yuxiang Chen, Binh Yang, Ming Liang, R. Urtasun","doi":"10.1109/ICCV.2019.01012","DOIUrl":"https://doi.org/10.1109/ICCV.2019.01012","url":null,"abstract":"In this paper, we tackle the problem of depth completion from RGBD data. Towards this goal, we design a simple yet effective neural network block that learns to extract joint 2D and 3D features. Specifically, the block consists of two domain-specific sub-networks that apply 2D convolution on image pixels and continuous convolution on 3D points, with their output features fused in image space. We build the depth completion network simply by stacking the proposed block, which has the advantage of learning hierarchical representations that are fully fused between 2D and 3D spaces at multiple levels. We demonstrate the effectiveness of our approach on the challenging KITTI depth completion benchmark and show that our approach outperforms the state-of-the-art.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"21 1","pages":"10022-10031"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83955185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 127
On the Global Optima of Kernelized Adversarial Representation Learning
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00806
Bashir Sadeghi, R. Yu, Vishnu Naresh Boddeti
Adversarial representation learning is a promising paradigm for obtaining data representations that are invariant to certain sensitive attributes while retaining the information necessary for predicting target attributes. Existing approaches solve this problem through iterative adversarial minimax optimization and lack theoretical guarantees. In this paper, we first study the "linear" form of this problem, i.e., the setting where all the players are linear functions. We show that the resulting optimization problem is both non-convex and non-differentiable. We obtain an exact closed-form expression for its global optima through spectral learning and provide performance guarantees in terms of analytical bounds on the achievable utility and invariance. We then extend this solution and analysis to non-linear functions through a kernel representation. Numerical experiments on the UCI, Extended Yale B, and CIFAR-100 datasets indicate that (a) practically, our solution is ideal for "imparting" provable invariance to any biased pre-trained data representation, and (b) the global optima of the "kernel" form provide a trade-off between utility and invariance comparable to iterative minimax optimization of existing deep neural network based approaches, but with provable guarantees.
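To make the "linear" setting concrete, the sketch below evaluates the outer objective when the encoder, target predictor, and adversary are all linear: for a fixed encoder, both inner players have closed-form least-squares solutions. The paper's contribution, a closed-form global optimum over the encoder obtained via spectral learning, is not reproduced here; the ridge term and trade-off weight are illustrative.

```python
import numpy as np

def outer_objective(X, Y, S, W, lam=1.0, eps=1e-6):
    """Linear form of adversarial representation learning: encode Z = X W,
    then the target predictor and the adversary are linear least-squares
    players with closed-form solutions. Returns the outer objective
    (target loss minus lam * adversary loss) for a given encoder W."""
    Z = X @ W
    ridge = eps * np.eye(Z.shape[1])
    theta = np.linalg.solve(Z.T @ Z + ridge, Z.T @ Y)   # best target predictor
    phi = np.linalg.solve(Z.T @ Z + ridge, Z.T @ S)     # best adversary
    target_loss = np.mean((Y - Z @ theta) ** 2)
    adversary_loss = np.mean((S - Z @ phi) ** 2)
    return target_loss - lam * adversary_loss

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))      # features
Y = X @ rng.normal(size=(10, 1))    # target attribute
S = X @ rng.normal(size=(10, 1))    # sensitive attribute
W = rng.normal(size=(10, 3))        # a candidate linear encoder
print(outer_objective(X, Y, S, W))
```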
{"title":"On the Global Optima of Kernelized Adversarial Representation Learning","authors":"Bashir Sadeghi, R. Yu, Vishnu Naresh Boddeti","doi":"10.1109/ICCV.2019.00806","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00806","url":null,"abstract":"Adversarial representation learning is a promising paradigm for obtaining data representations that are invariant to certain sensitive attributes while retaining the information necessary for predicting target attributes. Existing approaches solve this problem through iterative adversarial minimax optimization and lack theoretical guarantees. In this paper, we first study the ``linear\" form of this problem i.e., the setting where all the players are linear functions. We show that the resulting optimization problem is both non-convex and non-differentiable. We obtain an exact closed-form expression for its global optima through spectral learning and provide performance guarantees in terms of analytical bounds on the achievable utility and invariance. We then extend this solution and analysis to non-linear functions through kernel representation. Numerical experiments on UCI, Extended Yale B and CIFAR-100 datasets indicate that, (a) practically, our solution is ideal for ``imparting\" provable invariance to any biased pre-trained data representation, and (b) the global optima of the ``kernel\" form can provide a comparable trade-off between utility and invariance in comparison to iterative minimax optimization of existing deep neural network based approaches, but with provable guarantees.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"22 8 1","pages":"7970-7978"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82923046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
Deep Supervised Hashing With Anchor Graph
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00989
Yudong Chen, Zhihui Lai, Yujuan Ding, Kaiyi Lin, W. Wong
Recently, a series of deep supervised hashing methods have been proposed for binary code learning. However, due to the high computation cost and limited hardware memory, these methods first select a subset from the training set and then form mini-batches of data to update the network in each iteration. Therefore, the remaining labeled data cannot be fully utilized, and the model cannot directly obtain the binary codes of the entire training set for retrieval. To address these problems, this paper proposes a regularized deep model that seamlessly integrates the advantages of deep hashing and efficient binary code learning by using an anchor graph. As such, the deep features and the label matrix can be jointly used to optimize the binary codes, and the network can obtain more discriminative feedback from linear combinations of the learned bits. Moreover, we also analyze the mechanism of the algorithm and its computational essence. Experiments on three large-scale datasets indicate that the proposed method achieves better retrieval performance with less training time compared to previous deep hashing methods.
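For context, the sketch below builds a standard anchor graph (in the style of anchor graph hashing): each sample is connected to its few nearest anchors with Gaussian weights, and the full affinity matrix is approximated in low rank. How the paper couples this graph with the deep hashing network is not reproduced; the anchor count, neighborhood size, and bandwidth are illustrative.

```python
import numpy as np

def anchor_graph(X, anchors, s=3, sigma=1.0):
    """Build the anchor-graph affinity matrix Z (n x m): each sample is
    connected to its s nearest anchors with Gaussian weights, rows
    normalized to sum to 1. The full n x n affinity is then approximated
    as Z diag(Z.sum(0))^{-1} Z^T, which stays tractable for the whole
    training set."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)   # (n, m) squared distances
    Z = np.zeros_like(d2)
    idx = np.argsort(d2, axis=1)[:, :s]                         # s nearest anchors per sample
    rows = np.arange(X.shape[0])[:, None]
    Z[rows, idx] = np.exp(-d2[rows, idx] / (2 * sigma ** 2))
    Z /= Z.sum(axis=1, keepdims=True)
    return Z

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 16))    # deep features of the training set
anchors = X[rng.choice(1000, size=50, replace=False)]
Z = anchor_graph(X, anchors)
approx_affinity = Z @ np.diag(1.0 / Z.sum(axis=0)) @ Z.T        # (n, n), low-rank
```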
{"title":"Deep Supervised Hashing With Anchor Graph","authors":"Yudong Chen, Zhihui Lai, Yujuan Ding, Kaiyi Lin, W. Wong","doi":"10.1109/ICCV.2019.00989","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00989","url":null,"abstract":"Recently, a series of deep supervised hashing methods were proposed for binary code learning. However, due to the high computation cost and the limited hardware's memory, these methods will first select a subset from the training set, and then form a mini-batch data to update the network in each iteration. Therefore, the remaining labeled data cannot be fully utilized and the model cannot directly obtain the binary codes of the entire training set for retrieval. To address these problems, this paper proposes an interesting regularized deep model to seamlessly integrate the advantages of deep hashing and efficient binary code learning by using the anchor graph. As such, the deep features and label matrix can be jointly used to optimize the binary codes, and the network can obtain more discriminative feedback from the linear combinations of the learned bits. Moreover, we also reveal the algorithm mechanism and its computation essence. Experiments on three large-scale datasets indicate that the proposed method achieves better retrieval performance with less training time compared to previous deep hashing methods.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"89 1","pages":"9795-9803"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88997215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 48