
2019 IEEE/CVF International Conference on Computer Vision (ICCV): Latest Publications

Stochastic Exposure Coding for Handling Multi-ToF-Camera Interference
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00797
Jongho Lee, Mohit Gupta
As continuous-wave time-of-flight (C-ToF) cameras become popular in 3D imaging applications, they need to contend with the problem of multi-camera interference (MCI). In a multi-camera environment, a ToF camera may receive light from the sources of other cameras, resulting in large depth errors. In this paper, we propose stochastic exposure coding (SEC), a novel approach for mitigating MCI. SEC involves dividing a camera's integration time into multiple slots, and switching the camera off and on stochastically during each slot. This approach has two benefits. First, by appropriately choosing the on probability for each slot, the camera can effectively filter out both the AC and DC components of interfering signals, thereby mitigating depth errors while also maintaining high signal-to-noise ratio. This enables high accuracy depth recovery with low power consumption. Second, this approach can be implemented without modifying the C-ToF camera's coding functions, and thus, can be used with a wide range of cameras with minimal changes. We demonstrate the performance benefits of SEC with theoretical analysis, simulations and real experiments, across a wide range of imaging scenarios.
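As a rough illustration of the slot-wise on/off idea described above, the following sketch simulates one SEC exposure with a Bernoulli on-probability per slot; the sinusoidal interference term and all parameter values are assumptions for the toy example, not the paper's signal model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sec_exposure(num_slots=200, p_on=0.3, own_amp=1.0, interf_amp=1.0):
    """Toy slot-level simulation of stochastic exposure coding (SEC).

    The integration time is split into num_slots slots and the camera is
    switched on with probability p_on in each slot.  The camera's own
    correlation signal adds coherently over the on-slots, while an
    interfering camera's AC component arrives with an arbitrary phase per
    slot and therefore tends to cancel out.
    """
    on = rng.random(num_slots) < p_on
    own = own_amp * on                                      # coherent per-slot signal
    interf_phase = rng.uniform(0.0, 2.0 * np.pi, num_slots)
    interference = interf_amp * np.cos(interf_phase) * on   # incoherent AC term
    return own.sum(), interference.sum(), int(on.sum())

own, residual, n_on = sec_exposure()
print(f"on-slots: {n_on}, accumulated signal: {own:.1f}, "
      f"residual interference: {residual:.1f}")
```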
{"title":"Stochastic Exposure Coding for Handling Multi-ToF-Camera Interference","authors":"Jongho Lee, Mohit Gupta","doi":"10.1109/ICCV.2019.00797","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00797","url":null,"abstract":"As continuous-wave time-of-flight (C-ToF) cameras become popular in 3D imaging applications, they need to contend with the problem of multi-camera interference (MCI). In a multi-camera environment, a ToF camera may receive light from the sources of other cameras, resulting in large depth errors. In this paper, we propose stochastic exposure coding (SEC), a novel approach for mitigating. SEC involves dividing a camera's integration time into multiple slots, and switching the camera off and on stochastically during each slot. This approach has two benefits. First, by appropriately choosing the on probability for each slot, the camera can effectively filter out both the AC and DC components of interfering signals, thereby mitigating depth errors while also maintaining high signal-to-noise ratio. This enables high accuracy depth recovery with low power consumption. Second, this approach can be implemented without modifying the C-ToF camera's coding functions, and thus, can be used with a wide range of cameras with minimal changes. We demonstrate the performance benefits of SEC with theoretical analysis, simulations and real experiments, across a wide range of imaging scenarios.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"201 1","pages":"7879-7887"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88864205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00091
Nitin Saini, E. Price, Rahul Tallamraju, R. Enficiaud, R. Ludwig, Igor Martinovic, Aamir Ahmad, Michael J. Black
Capturing human motion in natural scenarios means moving motion capture out of the lab and into the wild. Typical approaches rely on fixed, calibrated, cameras and reflective markers on the body, significantly limiting the motions that can be captured. To make motion capture truly unconstrained, we describe the first fully autonomous outdoor capture system based on flying vehicles. We use multiple micro-aerial-vehicles(MAVs), each equipped with a monocular RGB camera, an IMU, and a GPS receiver module. These detect the person, optimize their position, and localize themselves approximately. We then develop a markerless motion capture method that is suitable for this challenging scenario with a distant subject, viewed from above, with approximately calibrated and moving cameras. We combine multiple state-of-the-art 2D joint detectors with a 3D human body model and a powerful prior on human pose. We jointly optimize for 3D body pose and camera pose to robustly fit the 2D measurements. To our knowledge, this is the first successful demonstration of outdoor, full-body, markerless motion capture from autonomous flying vehicles.
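The full joint fit of body pose and camera poses is beyond a short snippet, but the reprojection objective such a fit minimizes can be sketched. This hedged example refines a single camera's pose against 2D joint detections using OpenCV's pinhole projection; the intrinsics, joint count and noise levels are made-up placeholders, not the paper's setup.

```python
import numpy as np
import cv2
from scipy.optimize import least_squares

def reprojection_residuals(params, joints_3d, joints_2d, K):
    """Residuals for refining one camera's pose (axis-angle rvec + tvec)
    against 2D joint detections observed by that camera."""
    rvec, tvec = params[:3], params[3:6]
    proj, _ = cv2.projectPoints(joints_3d, rvec, tvec, K, None)
    return (proj.reshape(-1, 2) - joints_2d).ravel()

# Synthetic setup: 14 body joints, a known pose, and slightly noisy detections.
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
joints_3d = np.random.rand(14, 3) * 2.0
rvec_gt, tvec_gt = np.zeros(3), np.array([0.0, 0.0, 5.0])
proj_gt, _ = cv2.projectPoints(joints_3d, rvec_gt, tvec_gt, K, None)
joints_2d = proj_gt.reshape(-1, 2) + np.random.randn(14, 2) * 0.5

x0 = np.concatenate([rvec_gt + 0.05, tvec_gt + 0.2])   # perturbed initial pose
fit = least_squares(reprojection_residuals, x0, args=(joints_3d, joints_2d, K))
print("refined rvec:", fit.x[:3], "refined tvec:", fit.x[3:])
```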
{"title":"Markerless Outdoor Human Motion Capture Using Multiple Autonomous Micro Aerial Vehicles","authors":"Nitin Saini, E. Price, Rahul Tallamraju, R. Enficiaud, R. Ludwig, Igor Martinovic, Aamir Ahmad, Michael J. Black","doi":"10.1109/ICCV.2019.00091","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00091","url":null,"abstract":"Capturing human motion in natural scenarios means moving motion capture out of the lab and into the wild. Typical approaches rely on fixed, calibrated, cameras and reflective markers on the body, significantly limiting the motions that can be captured. To make motion capture truly unconstrained, we describe the first fully autonomous outdoor capture system based on flying vehicles. We use multiple micro-aerial-vehicles(MAVs), each equipped with a monocular RGB camera, an IMU, and a GPS receiver module. These detect the person, optimize their position, and localize themselves approximately. We then develop a markerless motion capture method that is suitable for this challenging scenario with a distant subject, viewed from above, with approximately calibrated and moving cameras. We combine multiple state-of-the-art 2D joint detectors with a 3D human body model and a powerful prior on human pose. We jointly optimize for 3D body pose and camera pose to robustly fit the 2D measurements. To our knowledge, this is the first successful demonstration of outdoor, full-body, markerless motion capture from autonomous flying vehicles.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"26 1","pages":"823-832"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87004476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 30
Deep Learning for Light Field Saliency Detection
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00893
Tiantian Wang, Yongri Piao, Huchuan Lu, Xiao Li, Lihe Zhang
Recent research in 4D saliency detection is limited by the lack of a large-scale 4D light field dataset. To address this, we introduce a new dataset to assist subsequent research in 4D light field saliency detection. To the best of our knowledge, this is to date the largest light field dataset: it provides 1465 all-focus images with human-labeled ground-truth masks and the corresponding focal stack for every light field image. To verify the effectiveness of the light field data, we first introduce a fusion framework with two CNN streams, where the focal stacks and all-focus images serve as the input. The focal stack stream utilizes a recurrent attention mechanism to adaptively learn to integrate every slice in the focal stack, benefiting from the features extracted from the good slices. Its output is then combined with the map generated by the all-focus stream to make the saliency prediction. In addition, we introduce adversarial examples, created by intentionally adding noise to images, to help train the deep network and improve its robustness. The noise is designed by the user; it is imperceptible but can fool CNNs into making wrong predictions. Extensive experiments show the effectiveness and superiority of the proposed model on the popular evaluation metrics. The proposed method performs favorably compared with existing 2D, 3D and 4D saliency detection methods on the proposed dataset and the existing LFSD light field dataset. The code and results can be found at https://github.com/OIPLab-DUT/ICCV2019_Deeplightfield_Saliency. Moreover, to facilitate research in this field, all images we collected are shared in a ready-to-use manner.
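The abstract leaves the noise design to the user; as a familiar stand-in, the sketch below crafts imperceptible adversarial noise with a single FGSM step against a placeholder saliency network. The tiny convolutional model, image sizes and epsilon are assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm_perturb(model, images, masks, epsilon=2.0 / 255.0):
    """One gradient-sign step producing small adversarial noise.

    `model` is assumed to output per-pixel saliency logits of shape
    (B, 1, H, W); FGSM is only a stand-in for the user-designed noise
    described in the abstract.
    """
    images = images.clone().detach().requires_grad_(True)
    loss = F.binary_cross_entropy_with_logits(model(images), masks)
    loss.backward()
    adv = images + epsilon * images.grad.sign()
    return adv.clamp(0.0, 1.0).detach()

# Toy usage with a placeholder all-focus stream.
toy_model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
images = torch.rand(2, 3, 64, 64)
masks = torch.randint(0, 2, (2, 1, 64, 64)).float()
adv_images = fgsm_perturb(toy_model, images, masks)
# Training would then mix (images, masks) and (adv_images, masks) batches.
```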
{"title":"Deep Learning for Light Field Saliency Detection","authors":"Tiantian Wang, Yongri Piao, Huchuan Lu, Xiao Li, Lihe Zhang","doi":"10.1109/ICCV.2019.00893","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00893","url":null,"abstract":"Recent research in 4D saliency detection is limited by the deficiency of a large-scale 4D light field dataset. To address this, we introduce a new dataset to assist the subsequent research in 4D light field saliency detection. To the best of our knowledge, this is to date the largest light field dataset in which the dataset provides 1465 all-focus images with human-labeled ground truth masks and the corresponding focal stacks for every light field image. To verify the effectiveness of the light field data, we first introduce a fusion framework which includes two CNN streams where the focal stacks and all-focus images serve as the input. The focal stack stream utilizes a recurrent attention mechanism to adaptively learn to integrate every slice in the focal stack, which benefits from the extracted features of the good slices. Then it is incorporated with the output map generated by the all-focus stream to make the saliency prediction. In addition, we introduce adversarial examples by adding noise intentionally into images to help train the deep network, which can improve the robustness of the proposed network. The noise is designed by users, which is imperceptible but can fool the CNNs to make the wrong prediction. Extensive experiments show the effectiveness and superiority of the proposed model on the popular evaluation metrics. The proposed method performs favorably compared with the existing 2D, 3D and 4D saliency detection methods on the proposed dataset and existing LFSD light field dataset. The code and results can be found at https://github.com/OIPLab-DUT/ ICCV2019_Deeplightfield_Saliency. Moreover, to facilitate research in this field, all images we collected are shared in a ready-to-use manner.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"22 1","pages":"8837-8847"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87386933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 79
On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00228
Zhi Li, Xuan Wang, Fei Wang, Peilin Jiang
Training an accurate 3D human pose estimation network presupposes a huge amount of richly annotated training data. Nonetheless, manually obtaining rich and accurate annotations is, if not impossible, tedious and slow. In this paper, we propose to exploit monocular videos to complement the training dataset for single-image 3D human pose estimation tasks. First, a baseline model is trained with a small set of annotations. By fixing some reliable estimates produced by the resulting model, our method automatically collects annotations across the entire video by solving a 3D trajectory completion problem. Then, the baseline model is further trained with the collected annotations to learn the new poses. We evaluate our method on the broadly adopted Human3.6M and MPI-INF-3DHP datasets. As illustrated in the experiments, given only a small set of annotations, our method successfully enables the model to learn new poses from unlabelled monocular videos, improving the accuracy of the baseline model by about 10%. In contrast with previous approaches, our method relies on neither multi-view imagery nor any explicit 2D keypoint annotations.
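The trajectory completion step can be pictured with a much simpler stand-in: keep the reliable per-frame estimates fixed and fill the remaining frames, here by plain linear interpolation rather than the paper's optimization. The shapes and the reliability mask below are arbitrary.

```python
import numpy as np

def complete_trajectory(poses, reliable):
    """Fill in unreliable frames by per-coordinate linear interpolation.

    poses: (T, J, 3) per-frame 3D joint estimates from the baseline model.
    reliable: (T,) boolean mask of frames whose estimates are kept fixed.
    A crude stand-in for the paper's 3D trajectory completion step.
    """
    T, J, _ = poses.shape
    t = np.arange(T)
    completed = poses.copy()
    for j in range(J):
        for c in range(3):
            completed[~reliable, j, c] = np.interp(
                t[~reliable], t[reliable], poses[reliable, j, c])
    return completed

# Toy usage: 50 frames, 17 joints, roughly 60% of frames deemed reliable.
rng = np.random.default_rng(0)
poses = rng.normal(size=(50, 17, 3))
reliable = rng.random(50) < 0.6
pseudo_labels = complete_trajectory(poses, reliable)  # used to retrain the baseline
```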
{"title":"On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos","authors":"Zhi Li, Xuan Wang, Fei Wang, Peilin Jiang","doi":"10.1109/ICCV.2019.00228","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00228","url":null,"abstract":"The premise of training an accurate 3D human pose estimation network is the possession of huge amount of richly annotated training data. Nonetheless, manually obtaining rich and accurate annotations is, even not impossible, tedious and slow. In this paper, we propose to exploit monocular videos to complement the training dataset for the single-image 3D human pose estimation tasks. At the beginning, a baseline model is trained with a small set of annotations. By fixing some reliable estimations produced by the resulting model, our method automatically collects the annotations across the entire video as solving the 3D trajectory completion problem. Then, the baseline model is further trained with the collected annotations to learn the new poses. We evaluate our method on the broadly-adopted Human3.6M and MPI-INF-3DHP datasets. As illustrated in experiments, given only a small set of annotations, our method successfully makes the model to learn new poses from unlabelled monocular videos, promoting the accuracies of the baseline model by about 10%. By contrast with previous approaches, our method does not rely on either multi-view imagery or any explicit 2D keypoint annotations.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"45 1","pages":"2192-2201"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87726684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34
Convex Shape Prior for Multi-Object Segmentation Using a Single Level Set Function
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00070
Shousheng Luo, X. Tai, Limei Huo, Yang Wang, R. Glowinski
Many objects in the real world have convex shapes. It is difficult to represent convex shapes in a way that admits good and fast numerical solutions. This paper proposes a method to incorporate a convex shape prior into multi-object segmentation using the level set method. The relationship between the convexity of the segmented objects and the signed distance function corresponding to their union is analyzed theoretically. This result is combined with the Gaussian mixture method for multi-object segmentation with a convexity shape prior. The alternating direction method of multipliers (ADMM) is adopted to solve the proposed model. Special boundary conditions are also imposed to obtain efficient algorithms for the fourth-order partial differential equations arising in one step of the ADMM algorithm. In addition, our method needs only one level set function regardless of the number of objects, so an increase in the number of objects does not increase the model or algorithm complexity. Various numerical experiments illustrate the performance and advantages of the proposed method.
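The quantity at the heart of the analysis, the signed distance function of the union of the segmented objects, is easy to compute for a toy example. The sketch below uses a Euclidean distance transform and a negative-inside sign convention, which is an assumption; only the construction of the function is illustrated, not the convexity analysis or the ADMM solver.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(mask):
    """Signed distance function of a binary region: negative inside,
    positive outside (sign convention chosen here for illustration)."""
    inside = distance_transform_edt(mask)
    outside = distance_transform_edt(~mask)
    return outside - inside

# Toy example: union of two discs on a 128 x 128 grid.
yy, xx = np.mgrid[0:128, 0:128]
mask = ((xx - 40) ** 2 + (yy - 64) ** 2 < 20 ** 2) | \
       ((xx - 85) ** 2 + (yy - 64) ** 2 < 25 ** 2)
phi = signed_distance(mask)
print(phi.min(), phi.max())   # negative inside the union, positive outside
```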
{"title":"Convex Shape Prior for Multi-Object Segmentation Using a Single Level Set Function","authors":"Shousheng Luo, X. Tai, Limei Huo, Yang Wang, R. Glowinski","doi":"10.1109/ICCV.2019.00070","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00070","url":null,"abstract":"Many objects in real world have convex shapes. It is a difficult task to have representations for convex shapes with good and fast numerical solutions. This paper proposes a method to incorporate convex shape prior for multi-object segmentation using level set method. The relationship between the convexity of the segmented objects and the signed distance function corresponding to their union is analyzed theoretically. This result is combined with Gaussian mixture method for the multiple objects segmentation with convexity shape prior. Alternating direction method of multiplier (ADMM) is adopted to solve the proposed model. Special boundary conditions are also imposed to obtain efficient algorithms for 4th order partial differential equations in one step of ADMM algorithm. In addition, our method only needs one level set function regardless of the number of objects. So the increase in the number of objects does not result in the increase of model and algorithm complexity. Various numerical experiments are illustrated to show the performance and advantages of the proposed method.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"1 1","pages":"613-621"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90250099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
SANet: Scene Agnostic Network for Camera Localization
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00013
Luwei Yang, Ziqian Bai, Chengzhou Tang, Honghua Li, Yasutaka Furukawa, P. Tan
This paper presents a scene agnostic neural architecture for camera localization, where model parameters and scenes are independent from each other. Despite recent advancement in learning based methods, most approaches require training for each scene one by one, which is not applicable to online applications such as SLAM and robotic navigation, where a model must be built on-the-fly. Our approach learns to build a hierarchical scene representation and predicts a dense scene coordinate map of a query RGB image on-the-fly given an arbitrary scene. The 6D camera pose of the query image can be estimated with the predicted scene coordinate map. Additionally, the dense prediction can be used for other online robotic and AR applications such as obstacle avoidance. We demonstrate the effectiveness and efficiency of our method on both indoor and outdoor benchmarks, achieving state-of-the-art performance.
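Given a predicted scene coordinate map, recovering the 6D pose is a standard PnP problem. The sketch below uses synthetic pixel/scene-coordinate correspondences generated from a known pose (so the result can be sanity-checked) and OpenCV's RANSAC PnP solver; the intrinsics and noise levels are placeholders, not SANet's.

```python
import numpy as np
import cv2

# Stand-ins for (predicted scene coordinate, pixel location) pairs: 3D points
# are projected with a known pose, then perturbed to mimic prediction noise.
K = np.array([[525.0, 0.0, 320.0], [0.0, 525.0, 240.0], [0.0, 0.0, 1.0]])
rvec_gt = np.array([0.1, -0.2, 0.05])
tvec_gt = np.array([0.3, -0.1, 2.0])

pts_3d = np.random.rand(200, 3) * 4.0 - 2.0     # "predicted" scene coordinates
pts_3d[:, 2] += 4.0                             # keep points in front of the camera
pts_2d, _ = cv2.projectPoints(pts_3d, rvec_gt, tvec_gt, K, None)
pts_2d = pts_2d.reshape(-1, 2) + np.random.randn(200, 2) * 0.5

ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts_3d, pts_2d, K, None,
                                             reprojectionError=3.0)
print(ok, rvec.ravel(), tvec.ravel())           # should be close to the known pose
```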
{"title":"SANet: Scene Agnostic Network for Camera Localization","authors":"Luwei Yang, Ziqian Bai, Chengzhou Tang, Honghua Li, Yasutaka Furukawa, P. Tan","doi":"10.1109/ICCV.2019.00013","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00013","url":null,"abstract":"This paper presents a scene agnostic neural architecture for camera localization, where model parameters and scenes are independent from each other.Despite recent advancement in learning based methods, most approaches require training for each scene one by one, not applicable for online applications such as SLAM and robotic navigation, where a model must be built on-the-fly.Our approach learns to build a hierarchical scene representation and predicts a dense scene coordinate map of a query RGB image on-the-fly given an arbitrary scene. The 6D camera pose of the query image can be estimated with the predicted scene coordinate map. Additionally, the dense prediction can be used for other online robotic and AR applications such as obstacle avoidance. We demonstrate the effectiveness and efficiency of our method on both indoor and outdoor benchmarks, achieving state-of-the-art performance.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"21 1","pages":"42-51"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85166649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 57
Neural Turtle Graphics for Modeling City Road Layouts
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00462
Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina, Xinkai Wei, Ming-Yu Liu, A. Torralba, S. Fidler
We propose Neural Turtle Graphics (NTG), a novel generative model for spatial graphs, and demonstrate its applications in modeling city road layouts. Specifically, we represent the road layout using a graph where nodes in the graph represent control points and edges in the graph represent road segments. NTG is a sequential generative model parameterized by a neural network. It iteratively generates a new node and an edge connecting to an existing node conditioned on the current graph. We train NTG on Open Street Map data and show it outperforms existing approaches using a set of diverse performance metrics. Moreover, our method allows users to control styles of generated road layouts mimicking existing cities as well as to sketch a part of the city road layout to be synthesized. In addition to synthesis, the proposed NTG finds uses in an analytical task of aerial road parsing. Experimental results show that it achieves state-of-the-art performance on the SpaceNet dataset.
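The sequential interface the abstract describes, growing the graph one node and one edge at a time, can be sketched with a random proposal standing in for the neural policy; the step length and grid-like directions are arbitrary choices for the toy example.

```python
import random

random.seed(0)

nodes = [(0.0, 0.0)]    # control points
edges = []              # road segments as (node_index, node_index)

def propose_extension(graph_nodes):
    """Placeholder policy: pick an existing node and grow a segment from it.
    In NTG this proposal is produced by a neural network conditioned on the
    current graph."""
    src = random.randrange(len(graph_nodes))
    x, y = graph_nodes[src]
    dx, dy = random.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
    return src, (x + dx * 50.0, y + dy * 50.0)

for _ in range(20):
    src, new_pt = propose_extension(nodes)
    nodes.append(new_pt)
    edges.append((src, len(nodes) - 1))

print(len(nodes), "control points,", len(edges), "road segments")
```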
{"title":"Neural Turtle Graphics for Modeling City Road Layouts","authors":"Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina, Xinkai Wei, Ming-Yu Liu, A. Torralba, S. Fidler","doi":"10.1109/ICCV.2019.00462","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00462","url":null,"abstract":"We propose Neural Turtle Graphics (NTG), a novel generative model for spatial graphs, and demonstrate its applications in modeling city road layouts. Specifically, we represent the road layout using a graph where nodes in the graph represent control points and edges in the graph represents road segments. NTG is a sequential generative model parameterized by a neural network. It iteratively generates a new node and an edge connecting to an existing node conditioned on the current graph. We train NTG on Open Street Map data and show it outperforms existing approaches using a set of diverse performance metrics. Moreover, our method allows users to control styles of generated road layouts mimicking existing cities as well as to sketch a part of the city road layout to be synthesized. In addition to synthesis, the proposed NTG finds uses in an analytical task of aerial road parsing. Experimental results show that it achieves state-of-the-art performance on the SpaceNet dataset.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"47 1","pages":"4521-4529"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86399993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 53
Spatial Correspondence With Generative Adversarial Network: Learning Depth From Monocular Videos
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00759
Zhenyao Wu, Xinyi Wu, Xiaoping Zhang, Song Wang, L. Ju
Depth estimation from monocular videos has important applications in many areas such as autonomous driving and robot navigation. The problem is very challenging when the camera pose is unknown, since errors in camera-pose estimation can significantly affect the video-based depth estimation accuracy. In this paper, we present a novel SC-GAN network with end-to-end adversarial training for depth estimation from monocular videos without estimating the camera pose and pose change over time. To exploit cross-frame relations, SC-GAN includes a spatial correspondence module which uses Smolyak sparse grids to efficiently match the features across adjacent frames, and an attention mechanism to learn the importance of features in different directions. Furthermore, the generator in SC-GAN learns to estimate depth from the input frames, while the discriminator learns to distinguish between the ground-truth and estimated depth map for the reference frame. Experiments on the KITTI and Cityscapes datasets show that the proposed SC-GAN can achieve much more accurate depth maps than many existing state-of-the-art methods on monocular videos.
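Only the adversarial part of the training signal is easy to sketch; the spatial correspondence module, Smolyak grids and attention are omitted below, and both networks are tiny placeholders rather than the paper's architectures. The supervised L1 term is an added assumption for the toy example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Sequential(nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 1, 3, padding=1))            # frames -> depth
D = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 1, 3, stride=2, padding=1))  # depth -> real/fake logits
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

frames = torch.rand(2, 6, 64, 64)      # reference + adjacent frame, channel-stacked
gt_depth = torch.rand(2, 1, 64, 64)    # ground-truth depth for the reference frame

# Discriminator: separate ground-truth depth from the generator's estimate.
fake_depth = G(frames).detach()
real_logits, fake_logits = D(gt_depth), D(fake_depth)
d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
          + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator: fool the discriminator while staying close to the ground truth.
pred_depth = G(frames)
adv_logits = D(pred_depth)
g_loss = (F.binary_cross_entropy_with_logits(adv_logits, torch.ones_like(adv_logits))
          + F.l1_loss(pred_depth, gt_depth))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```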
{"title":"Spatial Correspondence With Generative Adversarial Network: Learning Depth From Monocular Videos","authors":"Zhenyao Wu, Xinyi Wu, Xiaoping Zhang, Song Wang, L. Ju","doi":"10.1109/ICCV.2019.00759","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00759","url":null,"abstract":"Depth estimation from monocular videos has important applications in many areas such as autonomous driving and robot navigation. It is a very challenging problem without knowing the camera pose since errors in camera-pose estimation can significantly affect the video-based depth estimation accuracy. In this paper, we present a novel SC-GAN network with end-to-end adversarial training for depth estimation from monocular videos without estimating the camera pose and pose change over time. To exploit cross-frame relations, SC-GAN includes a spatial correspondence module which uses Smolyak sparse grids to efficiently match the features across adjacent frames, and an attention mechanism to learn the importance of features in different directions. Furthermore, the generator in SC-GAN learns to estimate depth from the input frames, while the discriminator learns to distinguish between the ground-truth and estimated depth map for the reference frame. Experiments on the KITTI and Cityscapes datasets show that the proposed SC-GAN can achieve much more accurate depth maps than many existing state-of-the-art methods on monocular videos.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"55 1 1","pages":"7493-7503"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86070249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
Deep Blind Hyperspectral Image Fusion
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00425
Wu Wang, Weihong Zeng, Yue Huang, Xinghao Ding, J. Paisley
Hyperspectral image fusion (HIF) reconstructs high spatial resolution hyperspectral images from low spatial resolution hyperspectral images and high spatial resolution multispectral images. Previous works usually assume that the linear mapping between the point spread functions of the hyperspectral camera and the spectral response functions of the conventional camera is known. This is unrealistic in many scenarios. We propose a method for the blind HIF problem based on deep learning, where the estimation of the observation model and the fusion process are optimized iteratively and alternately during the super-resolution reconstruction. In addition, the proposed framework enforces simultaneous spatial and spectral accuracy. Using three public datasets, the experimental results demonstrate that the proposed algorithm outperforms existing blind and non-blind methods.
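A common way to write the linear observation model that blind HIF must estimate is sketched below; the box-blur, block downsampling and random spectral-response matrix are simple stand-ins, and the notation is a conventional choice rather than the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
L, l, H, W, d = 31, 3, 64, 64, 4   # HS bands, MS bands, spatial size, downsampling

X = rng.random((L, H, W))          # target high-resolution hyperspectral image
R = rng.random((l, L))
R /= R.sum(axis=1, keepdims=True)  # stand-in spectral response of the RGB camera

def blur_and_downsample(img, d):
    """Average over d x d blocks then keep one sample per block: a crude
    stand-in for the PSF blur and downsampling operators."""
    bands, h, w = img.shape
    return img.reshape(bands, h // d, d, w // d, d).mean(axis=(2, 4))

Y_h = blur_and_downsample(X, d)            # low-res hyperspectral observation
Y_m = np.tensordot(R, X, axes=1)           # high-res multispectral observation
print(Y_h.shape, Y_m.shape)                # (31, 16, 16) and (3, 64, 64)
# "Blind" fusion means the blur/downsampling and R are unknown and must be
# estimated jointly with X from Y_h and Y_m.
```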
{"title":"Deep Blind Hyperspectral Image Fusion","authors":"Wu Wang, Weihong Zeng, Yue Huang, Xinghao Ding, J. Paisley","doi":"10.1109/ICCV.2019.00425","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00425","url":null,"abstract":"Hyperspectral image fusion (HIF) reconstructs high spatial resolution hyperspectral images from low spatial resolution hyperspectral images and high spatial resolution multispectral images. Previous works usually assume that the linear mapping between the point spread functions of the hyperspectral camera and the spectral response functions of the conventional camera is known. This is unrealistic in many scenarios. We propose a method for blind HIF problem based on deep learning, where the estimation of the observation model and fusion process are optimized iteratively and alternatingly during the super-resolution reconstruction. In addition, the proposed framework enforces simultaneous spatial and spectral accuracy. Using three public datasets, the experimental results demonstrate that the proposed algorithm outperforms existing blind and non-blind methods.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"14 1","pages":"4149-4158"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86615594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 52
Discriminative Feature Transformation for Occluded Pedestrian Detection
Pub Date : 2019-10-01 DOI: 10.1109/ICCV.2019.00965
Chunluan Zhou, Ming Yang, Junsong Yuan
Despite promising performance achieved by deep convolutional neural networks for non-occluded pedestrian detection, it remains a great challenge to detect partially occluded pedestrians. Compared with non-occluded pedestrian examples, it is generally more difficult to distinguish occluded pedestrian examples from background in feature space due to the missing occluded parts. In this paper, we propose a discriminative feature transformation which enforces feature separability of pedestrian and non-pedestrian examples to handle occlusions for pedestrian detection. Specifically, in feature space it makes pedestrian examples approach the centroid of easily classified non-occluded pedestrian examples and pushes non-pedestrian examples close to the centroid of easily classified non-pedestrian examples. Such a feature transformation partially compensates for the missing contribution of occluded parts in feature space, therefore improving the performance for occluded pedestrian detection. We implement our approach in the Fast R-CNN framework by adding one transformation network branch. We validate the proposed approach on two widely used pedestrian detection datasets: Caltech and CityPersons. Experimental results show that our approach achieves promising performance for both non-occluded and occluded pedestrian detection.
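A simplified reading of the transformation objective, pulling pedestrian features toward the centroid of easily classified non-occluded pedestrian examples and non-pedestrian features toward the non-pedestrian centroid, can be sketched as a loss term; the feature dimensionality and centroids below are placeholders, and the real method learns a transformation branch inside Fast R-CNN.

```python
import torch

def centroid_pull_loss(features, labels, ped_centroid, bg_centroid):
    """Pull each example's feature toward its class centroid.

    features: (N, D) region features, labels: (N,) with 1 = pedestrian.
    The centroids would be computed from confidently classified examples.
    """
    loss = features.new_zeros(())
    ped = features[labels == 1]
    bg = features[labels == 0]
    if ped.numel() > 0:
        loss = loss + ((ped - ped_centroid) ** 2).sum(dim=1).mean()
    if bg.numel() > 0:
        loss = loss + ((bg - bg_centroid) ** 2).sum(dim=1).mean()
    return loss

# Toy usage with random 128-D features and arbitrary centroids.
feats = torch.randn(16, 128, requires_grad=True)
labels = torch.randint(0, 2, (16,))
loss = centroid_pull_loss(feats, labels, torch.zeros(128), torch.ones(128))
loss.backward()
```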
{"title":"Discriminative Feature Transformation for Occluded Pedestrian Detection","authors":"Chunluan Zhou, Ming Yang, Junsong Yuan","doi":"10.1109/ICCV.2019.00965","DOIUrl":"https://doi.org/10.1109/ICCV.2019.00965","url":null,"abstract":"Despite promising performance achieved by deep con- volutional neural networks for non-occluded pedestrian de- tection, it remains a great challenge to detect partially oc- cluded pedestrians. Compared with non-occluded pedes- trian examples, it is generally more difficult to distinguish occluded pedestrian examples from background in featue space due to the missing of occluded parts. In this paper, we propose a discriminative feature transformation which en- forces feature separability of pedestrian and non-pedestrian examples to handle occlusions for pedestrian detection. Specifically, in feature space it makes pedestrian exam- ples approach the centroid of easily classified non-occluded pedestrian examples and pushes non-pedestrian examples close to the centroid of easily classified non-pedestrian ex- amples. Such a feature transformation partially compen- sates the missing contribution of occluded parts in feature space, therefore improving the performance for occluded pedestrian detection. We implement our approach in the Fast R-CNN framework by adding one transformation net- work branch. We validate the proposed approach on two widely used pedestrian detection datasets: Caltech and CityPersons. Experimental results show that our approach achieves promising performance for both non-occluded and occluded pedestrian detection.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"327 1","pages":"9556-9565"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86778241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 41