
2017 IEEE International Conference on Computer Vision (ICCV): Latest Publications

AOD-Net: All-in-One Dehazing Network
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.511
Boyi Li, Xiulian Peng, Zhangyang Wang, Jizheng Xu, Dan Feng
This paper proposes an image dehazing model built with a convolutional neural network (CNN), called the All-in-One Dehazing Network (AOD-Net). It is designed based on a re-formulated atmospheric scattering model. Instead of estimating the transmission matrix and the atmospheric light separately, as most previous models do, AOD-Net directly generates the clean image through a lightweight CNN. This novel end-to-end design makes it easy to embed AOD-Net into other deep models, e.g., Faster R-CNN, to improve high-level tasks on hazy images. Experimental results on both synthesized and natural hazy image datasets demonstrate performance superior to the state of the art in terms of PSNR, SSIM, and subjective visual quality. Furthermore, when concatenating AOD-Net with Faster R-CNN, we observe a large improvement in object detection performance on hazy images.
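For context, the "re-formulated atmospheric scattering model" refers to the standard haze formation model shown first below; the unified variable K(x) that follows is a sketch, reconstructed from memory, of how such a reformulation folds transmission and atmospheric light into a single quantity the network can estimate, so treat its exact form as illustrative rather than authoritative.

```latex
% Standard atmospheric scattering model: hazy image I, scene radiance J,
% transmission t, global atmospheric light A.
I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr)

% Re-arranged so the clean image is an affine function of I(x) through one
% unified variable K(x) (illustrative reconstruction, b is a constant bias):
J(x) = K(x)\,I(x) - K(x) + b,
\qquad
K(x) = \frac{\tfrac{1}{t(x)}\bigl(I(x) - A\bigr) + (A - b)}{I(x) - 1}
```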
Citations: 1096
Generative Modeling of Audible Shapes for Object Perception
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.141
Zhoutong Zhang, Jiajun Wu, Qiujia Li, Zhengjia Huang, James Traer, Josh H. McDermott, J. Tenenbaum, W. Freeman
Humans infer rich knowledge of objects from both auditory and visual cues. Building a machine with such competency, however, is very challenging because of the difficulty of capturing large-scale, clean data that pairs objects' appearance with the sounds they make. In this paper, we present a novel, open-source pipeline that generates audiovisual data purely from 3D object shapes and their physical properties. Through comparison with audio recordings and human behavioral studies, we validate the accuracy of the sounds it generates. Using this generative model, we construct a synthetic audio-visual dataset, namely Sound-20K, for object perception tasks. We demonstrate that auditory and visual information play complementary roles in object perception and, further, that representations learned on synthetic audio-visual data can transfer to real-world scenarios.
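A common physics-based way to turn shape and material into sound, and a reasonable mental model for this kind of pipeline, is modal synthesis: an impact sound is a sum of exponentially damped sinusoids whose frequencies and decay rates come from the object's vibration modes. The sketch below is a generic illustration under that assumption with made-up mode parameters; it is not the paper's actual simulator.

```python
import numpy as np

def modal_impact_sound(freqs_hz, decays, amps, duration_s=1.0, sr=44100):
    """Synthesize an impact sound as a sum of damped sinusoids (modal synthesis)."""
    t = np.arange(int(duration_s * sr)) / sr
    sound = np.zeros_like(t)
    for f, d, a in zip(freqs_hz, decays, amps):
        sound += a * np.exp(-d * t) * np.sin(2 * np.pi * f * t)
    # Normalize to [-1, 1] for playback or saving as audio.
    return sound / (np.max(np.abs(sound)) + 1e-8)

# Hypothetical modes for a small metallic object (illustrative values only).
audio = modal_impact_sound(freqs_hz=[523.0, 1370.0, 2580.0],
                           decays=[3.0, 6.0, 9.0],
                           amps=[1.0, 0.6, 0.3])
```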
Citations: 35
BodyFusion: Real-Time Capture of Human Motion and Surface Geometry Using a Single Depth Camera
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.104
Tao Yu, Kaiwen Guo, F. Xu, Yuan Dong, Zhaoqi Su, Jianhui Zhao, Jianguo Li, Qionghai Dai, Yebin Liu
We propose BodyFusion, a novel real-time geometry fusion method that can track and reconstruct the non-rigid surface motion of a human performance using a single consumer-grade depth camera. To reduce the ambiguities of the non-rigid deformation parameterization on the surface graph nodes, we take advantage of the internal articulated motion prior of human performance and contribute a skeleton-embedded surface fusion (SSF) method. The key feature of our method is that it jointly solves for both the skeleton and graph-node deformations based on the attachments between the skeleton and the graph nodes. The attachments are also updated frame by frame based on the fused surface geometry and the computed deformations. Overall, our method enables increasingly denoised, detailed, and complete surface reconstruction, as well as updating of the skeleton and attachments, as the temporal depth frames are fused. Experimental results show that our method exhibits substantially improved non-rigid motion fusion performance and tracking robustness compared with previous state-of-the-art fusion methods. We also contribute a dataset for the quantitative evaluation of fusion-based dynamic scene reconstruction algorithms using a single depth camera.
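A rough way to read "jointly solves for both the skeleton and graph-node deformations": per depth frame, methods in this family typically minimize a single energy over the skeleton pose and the node-graph deformation together. The decomposition below is only a schematic sketch of that idea with hypothetical term names and weights; the paper defines its own energy terms.

```latex
% Schematic per-frame objective over skeleton pose \theta and node graph G
% (illustrative, not the paper's exact energy):
E(\theta, G) \;=\;
E_{\text{data}}(\theta, G)                                % fit the fused surface to the current depth frame
\;+\; \lambda_{\text{bind}}\, E_{\text{bind}}(\theta, G)  % skeleton-driven and node-driven motion agree at the attachments
\;+\; \lambda_{\text{reg}}\, E_{\text{reg}}(G)            % as-rigid-as-possible smoothness of the node graph
```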
Citations: 160
A Coarse-Fine Network for Keypoint Localization
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.329
Shaoli Huang, Mingming Gong, D. Tao
We propose a coarse-fine network (CFN) that exploits multi-level supervision for keypoint localization. Recently, convolutional neural network (CNN)-based methods have achieved great success due to the powerful hierarchical features in CNNs. These methods typically use confidence maps generated from ground-truth keypoint locations as supervisory signals. However, while some keypoints can be easily located with high accuracy, many of them are hard to localize due to appearance ambiguity, so strict supervision often fails to detect keypoints that are difficult to locate accurately. To target this problem, we develop a keypoint localization network composed of several coarse detector branches, each of which is built on top of a feature layer in a CNN, and a fine detector branch built on top of multiple feature layers. We supervise each branch by a specified label map to explicate a certain supervision strictness level. All the branches are then unified to produce the final accurate keypoint locations. We demonstrate the efficacy, efficiency, and generality of our method on several benchmarks for multiple tasks, including bird part localization and human body pose estimation. In particular, our method achieves 72.2% AP on the 2016 COCO Keypoints Challenge dataset, an 18% improvement over the winning entry.
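The "supervision strictness levels" can be pictured as ground-truth confidence maps of different sharpness: a narrow Gaussian around the keypoint is strict supervision, a wide one is lenient. The sketch below generates such maps; the map size and sigma values are hypothetical choices for illustration, not the paper's settings.

```python
import numpy as np

def keypoint_confidence_map(height, width, kx, ky, sigma):
    """Gaussian confidence map centered at keypoint (kx, ky); smaller sigma = stricter supervision."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - kx) ** 2 + (ys - ky) ** 2) / (2.0 * sigma ** 2))

# One target per supervision level for the same keypoint (sigmas are illustrative).
h, w, kx, ky = 64, 64, 40, 22
targets = {name: keypoint_confidence_map(h, w, kx, ky, s)
           for name, s in [("strict", 1.5), ("medium", 3.0), ("lenient", 6.0)]}
```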
Citations: 146
A Joint Intrinsic-Extrinsic Prior Model for Retinex
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.431
Bolun Cai, Xian-shun Xu, K. Guo, K. Jia, B. Hu, D. Tao
We propose a joint intrinsic-extrinsic prior model to estimate both illumination and reflectance from an observed image. The 2D image formed from a 3D object in the scene is affected by the intrinsic properties (shape and texture) and the extrinsic property (illumination). Based on a novel structure-preserving measure called local variation deviation, a joint intrinsic-extrinsic prior model is proposed for better representation. Compared with conventional Retinex models, the proposed model can preserve structure information via the shape prior, estimate the reflectance with fine detail via the texture prior, and capture the luminous source via the illumination prior. Experimental results demonstrate the effectiveness of the proposed method on simulated and real data. Compared with other Retinex algorithms and state-of-the-art algorithms, the proposed model yields better results on both subjective and objective assessments.
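For reference, Retinex methods assume the observed image S is the element-wise product of reflectance R (intrinsic) and illumination L (extrinsic), and estimate both by minimizing a prior-regularized objective. The first line below is that standard model; the objective is only a schematic sketch in which the shape, texture, and illumination priors are named illustratively rather than reproduced from the paper.

```latex
% Retinex image formation: observed image S is the element-wise product of
% reflectance R (intrinsic) and illumination L (extrinsic).
S = R \circ L

% Schematic estimation objective (prior terms named illustratively):
\min_{R,\,L}\;\|R \circ L - S\|_2^2
 \;+\; \alpha\,\Phi_{\text{shape}}(L)
 \;+\; \beta\,\Phi_{\text{texture}}(R)
 \;+\; \gamma\,\Phi_{\text{illum}}(L)
```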
Citations: 173
Orientation Invariant Feature Embedding and Spatial Temporal Regularization for Vehicle Re-identification
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.49
Zhongdao Wang, Luming Tang, Xihui Liu, Zhuliang Yao, Shuai Yi, Jing Shao, Junjie Yan, Shengjin Wang, Hongsheng Li, Xiaogang Wang
In this paper, we tackle the vehicle re-identification (ReID) problem, which is of great importance in urban surveillance and can be used for multiple applications. In our vehicle ReID framework, an orientation-invariant feature embedding module and a spatial-temporal regularization module are proposed. With orientation-invariant feature embedding, local region features of different orientations can be extracted based on 20 keypoint locations and can be well aligned and combined. With spatial-temporal regularization, a log-normal distribution is adopted to model the spatial-temporal constraints, and the retrieval results can be refined. Experiments are conducted on public vehicle ReID datasets, and our proposed method achieves state-of-the-art performance. Further investigations of the proposed framework are conducted, including the landmark regressor and comparisons with an attention mechanism. Both the orientation-invariant feature embedding and the spatio-temporal regularization achieve considerable improvements.
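One way to read the spatial-temporal regularization: model the transit time between a pair of cameras with a log-normal distribution and use its likelihood to down-weight appearance matches whose timing is implausible. The sketch below is a generic illustration of that idea; the distribution parameters and the fusion rule between the two scores are hypothetical, not taken from the paper.

```python
import numpy as np

def lognormal_pdf(t, mu, sigma):
    """Log-normal density of a transit time t (seconds), t > 0."""
    t = np.maximum(t, 1e-6)
    return np.exp(-(np.log(t) - mu) ** 2 / (2 * sigma ** 2)) / (t * sigma * np.sqrt(2 * np.pi))

def st_regularized_score(appearance_sim, delta_t, mu=5.0, sigma=0.8, eps=1e-3):
    """Fuse appearance similarity with a spatio-temporal plausibility term (illustrative fusion rule)."""
    st_likelihood = lognormal_pdf(delta_t, mu, sigma)
    return appearance_sim * np.log1p(st_likelihood / eps)

# Two gallery candidates with equal appearance similarity but different time gaps.
print(st_regularized_score(0.9, delta_t=150.0))    # plausible transit time -> higher score
print(st_regularized_score(0.9, delta_t=30000.0))  # implausible transit time -> heavily down-weighted
```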
Citations: 314
Rolling-Shutter-Aware Differential SfM and Image Rectification
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.108
Bingbing Zhuang, L. Cheong, Gim Hee Lee
In this paper, we develop a modified differential Structure from Motion (SfM) algorithm that can estimate relative pose from two consecutive frames despite Rolling Shutter (RS) artifacts. In particular, we show that under a constant-velocity assumption, the errors induced by the rolling shutter effect can be easily rectified by a linear scaling operation on each optical flow. We further propose a 9-point algorithm to recover the relative pose of a rolling shutter camera that undergoes constant-acceleration motion. We demonstrate that the dense depth maps recovered from the relative pose of the RS camera can be used in an RS-aware warping for image rectification to recover high-quality Global Shutter (GS) images. Experiments on both synthetic and real RS images show that our RS-aware differential SfM algorithm produces more accurate results on relative pose estimation and 3D reconstruction from images distorted by the RS effect than standard SfM algorithms that assume a GS camera model. We also demonstrate that our RS-aware warping for image rectification outperforms state-of-the-art commercial software products, i.e., Adobe After Effects and Apple iMovie, at removing RS artifacts.
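A rough way to picture the "linear scaling operation on each optical flow": with a rolling shutter, a pixel on row y1 of frame 1 and its match on row y2 of frame 2 are captured T + (y2 - y1) * tau apart (tau = per-row readout time) rather than T apart, so under constant velocity the observed flow can be rescaled by T / (T + (y2 - y1) * tau) to approximate its global-shutter value. The sketch below implements only that intuition with a made-up readout ratio; the paper's rectification is derived inside its differential SfM formulation.

```python
import numpy as np

def rs_rescale_flow(flow, readout_ratio=0.03):
    """Rescale rolling-shutter optical flow toward its global-shutter equivalent.

    flow          : (H, W, 2) array of (u, v) displacements between consecutive frames.
    readout_ratio : per-row readout time as a fraction of the inter-frame interval
                    (illustrative value; the true ratio depends on the sensor).
    """
    row_delta = flow[..., 1]                        # y2 - y1, i.e. vertical displacement at each pixel
    scale = 1.0 / (1.0 + readout_ratio * row_delta)
    return flow * scale[..., None]

# Example: a synthetic flow field with a constant 10-pixel downward motion.
flow = np.zeros((480, 640, 2))
flow[..., 1] = 10.0
gs_like_flow = rs_rescale_flow(flow)
```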
Citations: 57
Estimating Defocus Blur via Rank of Local Patches
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.574
Guodong Xu, Yuhui Quan, Hui Ji
This paper addresses the problem of defocus map estimation from a single image. We present a fast yet effective approach to estimate the spatially varying amounts of defocus blur at edge locations, based on the maximum rank of the corresponding local patches with different orientations in the gradient domain. The approach is motivated by a theoretical analysis which reveals the connection between the rank of a local patch blurred by a defocus-blur kernel and the amount of blur introduced by the kernel. After the amounts of defocus blur at edge locations are obtained, a complete defocus map is generated by a standard propagation procedure. The proposed method is extensively evaluated on real image datasets, and the experimental results show its superior performance to existing approaches.
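The core measurement can be sketched directly: take a local patch of the image gradient around an edge pixel, rotate it to several orientations, and record the maximum numerical rank; sharper (less defocused) patches tend to have higher rank. The window size, orientations, and SVD tolerance below are hypothetical choices for illustration, not the paper's.

```python
import numpy as np
from scipy.ndimage import rotate

def max_patch_rank(gray, cx, cy, half=10, angles=(0, 30, 60, 90, 120, 150), tol=1e-2):
    """Maximum numerical rank of a gradient-domain patch over several orientations.

    Assumes (cx, cy) lies at least `half` pixels inside the grayscale image `gray`.
    """
    gy, gx = np.gradient(gray.astype(np.float64))
    grad_mag = np.hypot(gx, gy)
    patch = grad_mag[cy - half:cy + half + 1, cx - half:cx + half + 1]
    ranks = []
    for a in angles:
        rotated = rotate(patch, a, reshape=False, order=1)
        s = np.linalg.svd(rotated, compute_uv=False)
        ranks.append(int(np.sum(s > tol * s[0])))  # numerical rank with a relative tolerance
    return max(ranks)
```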
Citations: 38
Moving Object Detection in Time-Lapse or Motion Trigger Image Sequences Using Low-Rank and Invariant Sparse Decomposition
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.548
M. Shakeri, Hong Zhang
Low-rank and sparse representation based methods have attracted wide attention in background subtraction and moving object detection, where moving objects in the scene are modeled as pixel-wise sparse outliers. Since moving objects in real scenarios are also structurally sparse, researchers have recently attempted to extract them using structured sparse outliers. Although existing methods with structured sparsity-inducing norms produce promising results, they remain vulnerable to the various illumination changes that frequently occur in real environments, especially in time-lapse image sequences where assumptions about sparsity between images, such as group sparsity, do not hold. In this paper, we first introduce a prior map obtained from an illumination-invariant representation of the images. We then propose a low-rank and invariant sparse decomposition that uses the prior map to detect moving objects under significant illumination changes. Experiments on challenging benchmark datasets demonstrate the superior performance of our proposed method under complex illumination changes.
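For background, the low-rank plus sparse model this line of work builds on stacks vectorized frames into a matrix D and decomposes it as D ≈ L + S, with a low-rank background L and a sparse foreground S. Below is a minimal sketch of the classic robust PCA solver (inexact augmented Lagrangian); the paper's actual method additionally uses the illumination-invariant prior map and a structured-sparsity term, which are not reproduced here.

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca(D, max_iter=200, tol=1e-7):
    """Decompose D (pixels x frames) into low-rank L + sparse S via inexact ALM."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))
    norm_two = np.linalg.norm(D, 2)
    Y = D / max(norm_two, np.max(np.abs(D)) / lam)   # common dual-variable initialization
    mu, rho = 1.25 / norm_two, 1.5
    L, S = np.zeros_like(D), np.zeros_like(D)
    for _ in range(max_iter):
        # Low-rank update: singular value thresholding.
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # Sparse update: element-wise soft thresholding.
        S = soft_threshold(D - L + Y / mu, lam / mu)
        residual = D - L - S
        Y = Y + mu * residual
        mu = min(mu * rho, 1e7)
        if np.linalg.norm(residual, 'fro') <= tol * np.linalg.norm(D, 'fro'):
            break
    return L, S  # background model, candidate moving-object component (threshold |S| for a mask)
```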
Citations: 20
The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.534
Gerhard Neuhold, Tobias Ollmann, S. R. Bulò, P. Kontschieder
The Mapillary Vistas Dataset is a novel, large-scale street-level image dataset containing 25,000 high-resolution images annotated into 66 object categories, with additional instance-specific labels for 37 classes. Annotation is performed in a dense and fine-grained style by using polygons to delineate individual objects. Our dataset is 5× larger than the total amount of fine annotations in Cityscapes and contains images from all around the world, captured under varying conditions of weather, season, and time of day. Images come from different imaging devices (mobile phones, tablets, action cameras, professional capturing rigs) and from photographers with different levels of experience. The dataset has thus been designed and compiled to cover diversity, richness of detail, and geographic extent. As default benchmark tasks, we define semantic image segmentation and instance-specific image segmentation, aiming to significantly further the development of state-of-the-art methods for visual road-scene understanding.
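Polygon annotations of this kind are typically consumed by rasterizing each polygon into a per-pixel label map before training a segmentation model. The sketch below shows that generic step with PIL; the class IDs, image size, and annotation structure are hypothetical and do not reflect the dataset's actual file format or official tooling.

```python
import numpy as np
from PIL import Image, ImageDraw

def rasterize_polygons(height, width, annotations, ignore_id=255):
    """Render a list of polygon annotations into a per-pixel semantic label map.

    annotations: iterable of (class_id, [(x1, y1), (x2, y2), ...]) tuples
                 (hypothetical structure, for illustration only).
    """
    canvas = Image.new("L", (width, height), color=ignore_id)
    draw = ImageDraw.Draw(canvas)
    for class_id, polygon in annotations:
        draw.polygon(polygon, fill=class_id)  # later polygons overwrite earlier ones
    return np.array(canvas, dtype=np.uint8)

# Hypothetical example: one 'road' polygon (class 7) and one 'car' polygon (class 26).
label_map = rasterize_polygons(1024, 2048, [
    (7,  [(0, 700), (2047, 700), (2047, 1023), (0, 1023)]),
    (26, [(900, 650), (1200, 650), (1200, 820), (900, 820)]),
])
```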
Citations: 984