
Latest publications from the 2017 IEEE International Conference on Computer Vision (ICCV)

Robust Hand Pose Estimation during the Interaction with an Unknown Object
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.339
Chiho Choi, S. Yoon, China Chen, K. Ramani
This paper proposes a robust solution for accurate 3D hand pose estimation in the presence of an external object interacting with hands. Our main insight is that the shape of an object causes a configuration of the hand in the form of a hand grasp. Along this line, we simultaneously train deep neural networks using paired depth images. The object-oriented network learns functional grasps from an object perspective, whereas the hand-oriented network explores the details of hand configurations from a hand perspective. The two networks share intermediate observations produced from different perspectives to create a more informed representation. Our system then collaboratively classifies the grasp types and orientation of the hand and further constrains a pose space using these estimates. Finally, we collectively refine the unknown pose parameters to reconstruct the final hand pose. To this end, we conduct extensive evaluations to validate the efficacy of the proposed collaborative learning approach by comparing it with self-generated baselines and the state-of-the-art method.
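A minimal sketch (not the authors' implementation) of the collaborative idea described above: two depth-image branches produce intermediate observations that are shared before classifying grasp type and hand orientation. The layer sizes, class counts, and input crops are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """Small conv encoder producing an intermediate observation from a depth crop."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def forward(self, depth):
        return self.encode(depth)            # (B, 32) feature vector

class CollaborativeNet(nn.Module):
    def __init__(self, n_grasp_types=10, n_orientations=8):
        super().__init__()
        self.hand_branch = Branch()          # hand-centric depth view
        self.object_branch = Branch()        # object-centric depth view
        self.grasp_head = nn.Linear(64, n_grasp_types)
        self.orientation_head = nn.Linear(64, n_orientations)

    def forward(self, hand_depth, object_depth):
        # Both perspectives contribute to a shared representation, which then
        # informs both predictions used later to constrain the pose space.
        shared = torch.cat([self.hand_branch(hand_depth),
                            self.object_branch(object_depth)], dim=1)
        return self.grasp_head(shared), self.orientation_head(shared)

grasp_logits, orient_logits = CollaborativeNet()(
    torch.randn(2, 1, 96, 96), torch.randn(2, 1, 96, 96))
print(grasp_logits.shape, orient_logits.shape)  # torch.Size([2, 10]) torch.Size([2, 8])
```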
Citations: 53
Monocular Free-Head 3D Gaze Tracking with Deep Learning and Geometry Constraints
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.341
Haoping Deng, Wangjiang Zhu
Free-head 3D gaze tracking outputs both the eye location and the gaze vector in 3D space, and it has wide applications in scenarios such as driver monitoring, advertisement analysis and surveillance. A reliable and low-cost monocular solution is critical for pervasive usage in these areas. Noticing that a gaze vector is a composition of head pose and eyeball movement in a geometrically deterministic way, we propose a novel gaze transform layer to connect separate head pose and eyeball movement models. The proposed decomposition does not suffer from head-gaze correlation overfitting and makes it possible to use datasets existing for other tasks. To add stronger supervision for better network training, we propose a two-step training strategy, which first trains sub-tasks with rough labels and then jointly trains with accurate gaze labels. To enable good cross-subject performance under various conditions, we collect a large dataset which has full coverage of head poses and eyeball movements, contains 200 subjects, and has diverse illumination conditions. Our deep solution achieves state-of-the-art gaze tracking accuracy, reaching 5.6° cross-subject prediction error using a small network running at 1000 fps on a single CPU (excluding face alignment time) and 4.3° cross-subject error with a deeper network.
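The geometric decomposition behind the gaze transform layer can be illustrated in a few lines of NumPy: a gaze direction in camera coordinates is the head rotation applied to the eyeball's gaze direction expressed in head coordinates. The pitch/yaw parameterization below (roll ignored) is an assumption for illustration, not necessarily the paper's exact convention.

```python
import numpy as np

def angles_to_vec(pitch, yaw):
    # Unit gaze direction from pitch (up/down) and yaw (left/right), in radians.
    return np.array([np.cos(pitch) * np.sin(yaw),
                     np.sin(pitch),
                     np.cos(pitch) * np.cos(yaw)])

def head_rotation(pitch, yaw):
    # Rotation matrix of the head pose (roll ignored for brevity).
    cp, sp, cy, sy = np.cos(pitch), np.sin(pitch), np.cos(yaw), np.sin(yaw)
    rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    return ry @ rx

def gaze_transform(head_pitch, head_yaw, eye_pitch, eye_yaw):
    # Gaze in camera coordinates = head rotation applied to eye-in-head gaze.
    return head_rotation(head_pitch, head_yaw) @ angles_to_vec(eye_pitch, eye_yaw)

print(gaze_transform(0.1, -0.2, 0.05, 0.3))
```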
Citations: 107
Surface Registration via Foliation
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.107
Xiaopeng Zheng, Chengfeng Wen, Na Lei, Ming Ma, X. Gu
This work introduces a novel surface registration method based on foliation. A foliation decomposes the surface into a family of closed loops, such that the decomposition has a local tensor product structure. By projecting each loop to a point, the surface is collapsed into a graph. Two homeomorphic surfaces with consistent foliations can be registered by first matching their foliation graphs and then matching the corresponding leaves. This foliation-based method is capable of handling surfaces with complicated topologies and large non-isometric deformations; it is rigorous, with a solid theoretical foundation, easy to implement, and robust to compute. The resulting mapping is diffeomorphic. Our experimental results show the efficiency and efficacy of the proposed method.
Citations: 8
PolyFit: Polygonal Surface Reconstruction from Point Clouds
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.258
L. Nan, Peter Wonka
We propose a novel framework for reconstructing lightweight polygonal surfaces from point clouds. Unlike traditional methods that focus on either extracting good geometric primitives or obtaining proper arrangements of primitives, the emphasis of this work lies in intersecting the primitives (planes only) and seeking an appropriate combination of them to obtain a manifold polygonal surface model without boundary. We show that reconstruction from point clouds can be cast as a binary labeling problem. Our method is based on a hypothesizing-and-selection strategy. We first generate a reasonably large set of face candidates by intersecting the extracted planar primitives. Then an optimal subset of the candidate faces is selected through optimization. Our optimization is based on a binary linear programming formulation under hard constraints that enforce the final polygonal surface model to be manifold and watertight. Experiments on point clouds from various sources demonstrate that our method can generate lightweight polygonal surface models of arbitrary piecewise planar objects. Besides, our method is capable of recovering sharp features and is robust to noise, outliers, and missing data.
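A toy illustration of the selection step: enumerate binary labelings of a handful of candidate faces and keep the best-scoring labeling that passes a watertightness check (every edge shared by exactly two selected faces). The scores and edge sets are made up, and PolyFit itself solves this as a binary linear program under manifold and watertight hard constraints rather than by enumeration.

```python
from itertools import product

faces = {                       # candidate face -> (data-fitting score, edges)
    "A": (0.9, {"e1", "e2"}),
    "B": (0.8, {"e2", "e3"}),
    "C": (0.2, {"e1", "e3"}),
    "D": (0.7, {"e1", "e4"}),
}

def watertight(selected):
    # Every edge of the selection must be shared by exactly two selected faces.
    counts = {}
    for name in selected:
        for e in faces[name][1]:
            counts[e] = counts.get(e, 0) + 1
    return all(c == 2 for c in counts.values())

best, best_score = None, -1.0
for labels in product([0, 1], repeat=len(faces)):
    selected = [n for n, keep in zip(faces, labels) if keep]
    if selected and watertight(selected):
        score = sum(faces[n][0] for n in selected)
        if score > best_score:
            best, best_score = selected, score

print(best, round(best_score, 2))   # ['A', 'B', 'C'] 1.9 (the only watertight subset here)
```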
Citations: 152
Shadow Detection with Conditional Generative Adversarial Networks
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.483
Vu Nguyen, Tomas F. Yago Vicente, Maozheng Zhao, Minh Hoai, D. Samaras
We introduce scGAN, a novel extension of conditional Generative Adversarial Networks (GAN) tailored for the challenging problem of shadow detection in images. Previous methods for shadow detection focus on learning the local appearance of shadow regions, while using limited local context reasoning in the form of pairwise potentials in a Conditional Random Field. In contrast, the proposed adversarial approach is able to model higher level relationships and global scene characteristics. We train a shadow detector that corresponds to the generator of a conditional GAN, and augment its shadow accuracy by combining the typical GAN loss with a data loss term. Due to the unbalanced distribution of the shadow labels, we use weighted cross entropy. With the standard GAN architecture, properly setting the weight for the cross entropy would require training multiple GANs, a computationally expensive grid procedure. In scGAN, we introduce an additional sensitivity parameter w to the generator. The proposed approach effectively parameterizes the loss of the trained detector. The resulting shadow detector is a single network that can generate shadow maps corresponding to different sensitivity levels, obviating the need for multiple models and a costly training procedure. We evaluate our method on the large-scale SBU and UCF shadow datasets, and observe up to 17% error reduction with respect to the previous state-of-the-art method.
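A minimal sketch of the generator objective described above: a weighted cross-entropy data term that up-weights the rarer shadow pixels, plus an adversarial term from the discriminator. The weighting scheme, the balance factor, and the tensor shapes are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def generator_loss(pred_logits, target_mask, disc_score_on_fake,
                   shadow_weight=3.0, adv_lambda=0.1):
    # Up-weight the (rarer) shadow pixels in the cross-entropy data term.
    pix_weight = torch.where(target_mask > 0.5,
                             torch.full_like(target_mask, shadow_weight),
                             torch.ones_like(target_mask))
    data_loss = F.binary_cross_entropy_with_logits(
        pred_logits, target_mask, weight=pix_weight)
    # Non-saturating adversarial term: push the discriminator to label fakes as real.
    adv_loss = F.binary_cross_entropy_with_logits(
        disc_score_on_fake, torch.ones_like(disc_score_on_fake))
    return data_loss + adv_lambda * adv_loss

loss = generator_loss(torch.randn(2, 1, 64, 64),
                      torch.randint(0, 2, (2, 1, 64, 64)).float(),
                      torch.randn(2, 1))
print(loss.item())
```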
Citations: 156
Transformed Low-Rank Model for Line Pattern Noise Removal
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.191
Yi Chang, Luxin Yan, Sheng Zhong
This paper addresses the problem of line pattern noise removal from a single image, such as rain streaks, hyperspectral stripes, and so on. Most previous methods model the line pattern noise in the original image domain and fail to explicitly exploit its directional characteristic, resulting in a redundant subspace with poor representation ability for line pattern noise. To achieve a compact subspace for the line pattern structure, in this work we incorporate a transformation into the image decomposition model that maps the input image to a domain where the line pattern appearance has an extremely distinct low-rank structure, which naturally allows us to enforce a low-rank prior to extract the line pattern streaks/stripes from the noisy image. Moreover, random noise is usually mixed with the line pattern noise, which makes the problem even more challenging. While previous methods resort to the spectral or temporal correlation of multiple images, we give a detailed analysis of the noisy and clean images in both the local gradient and the nonlocal domain, and propose a compositional directional total variation and low-rank prior for the image layer, thus simultaneously accommodating both types of noise. The proposed method has been evaluated on two different tasks, mixed random-stripe noise removal in remote sensing images and rain streak removal, and obtains very impressive performance on both.
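The low-rank prior at the heart of the model can be illustrated with singular value thresholding (the proximal operator of the nuclear norm) applied to an image in which the line pattern is already axis-aligned; in the paper this operates on the transformed domain and is combined with directional total variation. The synthetic stripes, noise level, and threshold below are assumptions.

```python
import numpy as np

def singular_value_threshold(matrix, tau):
    # Soft-threshold the singular values: proximal operator of tau * nuclear norm.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (u * s) @ vt

# Synthetic example: vertical stripes (a rank-1 pattern) plus random noise.
rng = np.random.default_rng(0)
stripes = np.outer(np.ones(64), np.sin(np.linspace(0, 8 * np.pi, 64)))
noisy = stripes + 0.3 * rng.standard_normal((64, 64))

low_rank_layer = singular_value_threshold(noisy, tau=6.0)
# The recovered layer has much lower rank than the noisy input (typically 1 here):
print(np.linalg.matrix_rank(low_rank_layer))
```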
Citations: 125
Image Super-Resolution Using Dense Skip Connections
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.514
T. Tong, Gen Li, Xiejie Liu, Qinquan Gao
Recent studies have shown that the performance of single-image super-resolution methods can be significantly boosted by using deep convolutional neural networks. In this study, we present a novel single-image super-resolution method by introducing dense skip connections in a very deep network. In the proposed network, the feature maps of each layer are propagated into all subsequent layers, providing an effective way to combine the low-level features and high-level features to boost the reconstruction performance. In addition, the dense skip connections in the network enable short paths to be built directly from the output to each layer, alleviating the vanishing-gradient problem of very deep networks. Moreover, deconvolution layers are integrated into the network to learn the upsampling filters and to speedup the reconstruction process. Further, the proposed method substantially reduces the number of parameters, enhancing the computational efficiency. We evaluate the proposed method using images from four benchmark datasets and set a new state of the art.
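A minimal PyTorch sketch of the two ingredients described above: a dense block in which every layer receives the concatenated feature maps of all preceding layers, and a transposed-convolution (deconvolution) layer that learns the upsampling. The channel counts, depth, and scale factor are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels=16, growth=16, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_channels + i * growth, growth, 3, padding=1),
                nn.ReLU(inplace=True))
            for i in range(n_layers))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Dense skip connections: each layer sees everything produced so far.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

class TinySR(nn.Module):
    def __init__(self, scale=2):
        super().__init__()
        self.head = nn.Conv2d(3, 16, 3, padding=1)
        self.dense = DenseBlock()
        # Deconvolution layer learns the upsampling filters.
        self.up = nn.ConvTranspose2d(16 + 4 * 16, 16, 4, stride=scale, padding=1)
        self.tail = nn.Conv2d(16, 3, 3, padding=1)

    def forward(self, lr):
        return self.tail(self.up(self.dense(self.head(lr))))

print(TinySR()(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 3, 64, 64])
```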
Citations: 989
High Order Tensor Formulation for Convolutional Sparse Coding
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.197
Adel Bibi, Bernard Ghanem
Convolutional sparse coding (CSC) has gained attention for its successful role as a reconstruction and classification tool in the computer vision and machine learning community. Current CSC methods can only reconstruct single-feature 2D images independently. However, learning multidimensional dictionaries and sparse codes for the reconstruction of multi-dimensional data is very important, as it jointly examines the correlations among all the data. This gives the learned dictionaries more capacity to better reconstruct data. In this paper, we propose a generic and novel formulation for the CSC problem that can handle an arbitrary-order tensor of data. Backed by experimental results, our proposed formulation can not only tackle applications that are not possible with standard CSC solvers, including colored video reconstruction (5D tensors), but it also performs favorably in reconstruction with far fewer parameters compared to naive extensions of standard CSC to multiple features/channels.
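For context, the standard single-feature CSC synthesis model that this work generalizes to higher-order tensors can be written in a few lines: an image is approximated as a sum of small dictionary filters convolved with sparse coefficient maps. The filters and sparse maps below are toy values.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
filters = [rng.standard_normal((5, 5)) for _ in range(3)]       # dictionary atoms
codes = []
for _ in filters:                                               # sparse coefficient maps
    z = np.zeros((32, 32))
    idx = rng.integers(0, 32, size=(4, 2))
    z[idx[:, 0], idx[:, 1]] = rng.standard_normal(4)
    codes.append(z)

# Synthesis model: image is approximately sum_k (d_k convolved with z_k).
image = sum(convolve2d(z, d, mode="same") for d, z in zip(filters, codes))
print(image.shape, "nonzeros per map:", [int((z != 0).sum()) for z in codes])
```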
Citations: 22
TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.619
Hongyuan Zhu, Romain Vial, Shijian Lu
Given a video clip, action proposal aims to quickly generate a number of spatio-temporal tubes that enclose candidate human activities. Recently, regression-based networks and the long-term recurrent convolutional network (L-RCN) have demonstrated superior performance in object detection and action recognition. However, regression-based detectors perform inference without considering the temporal context among neighboring frames, and the L-RCN, which uses global visual percepts, lacks the capability to capture local temporal dynamics. In this paper, we present a novel framework called TORNADO for human action proposal detection in untrimmed video clips. Specifically, we propose a spatio-temporal convolutional network that combines the advantages of regression-based detectors and the L-RCN by endowing Convolutional LSTM with regression capability. Our approach consists of a temporal convolutional regression network (T-CRN) and a spatial regression network (S-CRN), which are trained end-to-end on both RGB and optical flow streams. They fuse appearance, motion, and temporal contexts to regress the bounding boxes of candidate human actions simultaneously at 28 FPS. The action proposals are constructed by solving a dynamic program with peak trimming of the generated action boxes. Extensive experiments on the challenging UCF-101 and UCF-Sports datasets show that our method achieves superior performance compared with the state of the art.
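A minimal sketch of the recurrent building block described above: a ConvLSTM cell whose gates are convolutions, so the hidden state keeps its spatial layout, with a 1x1 convolution head regressing per-location box parameters from the final state. The channel sizes, sequence length, and box parameterization are illustrative assumptions, not the TORNADO architecture.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        # All four gates computed by one convolution over [input, hidden state].
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

cell = ConvLSTMCell(in_ch=8, hid_ch=16)
box_head = nn.Conv2d(16, 4, 1)                 # per-location (x, y, w, h) offsets
h = torch.zeros(1, 16, 24, 24)
c = torch.zeros(1, 16, 24, 24)
for frame_feat in torch.randn(5, 1, 8, 24, 24):   # a 5-frame feature sequence
    h, c = cell(frame_feat, (h, c))
boxes = box_head(h)                            # regression from the final state
print(boxes.shape)                             # torch.Size([1, 4, 24, 24])
```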
Citations: 56
Space-Time Localization and Mapping
Pub Date: 2017-10-01 | DOI: 10.1109/ICCV.2017.422
Minhaeng Lee, Charless C. Fowlkes
This paper addresses the problem of building a spatiotemporal model of the world from a stream of time-stamped data. Unlike traditional models for simultaneous localization and mapping (SLAM) and structure-from-motion (SfM) which focus on recovering a single rigid 3D model, we tackle the problem of mapping scenes in which dynamic components appear, move and disappear independently of each other over time. We introduce a simple generative probabilistic model of 4D structure which specifies location, spatial and temporal extent of rigid surface patches by local Gaussian mixtures. We fit this model to a time-stamped stream of input data using expectation-maximization to estimate the model structure parameters (mapping) and the alignment of the input data to the model (localization). By explicitly representing the temporal extent and observability of surfaces in a scene, our method yields superior localization and reconstruction relative to baselines that assume a static 3D scene. We carry out experiments on both synthetic RGB-D data streams as well as challenging real-world datasets, tracking scene dynamics in a human workspace over the course of several weeks.
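The core fitting step can be illustrated with a standard EM-fitted Gaussian mixture over time-stamped 3D points, so that each component captures both the spatial location and the temporal extent of a surface patch. The toy data, component count, and diagonal covariances are assumptions; the paper's model additionally estimates the alignment of the input stream (localization).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# A "static" patch observed throughout, and a "transient" patch seen only late.
static_pts = np.column_stack([rng.normal([0, 0, 0], 0.05, (200, 3)),
                              rng.uniform(0, 10, 200)])       # t in [0, 10]
transient_pts = np.column_stack([rng.normal([1, 0, 0], 0.05, (200, 3)),
                                 rng.uniform(8, 10, 200)])    # t in [8, 10]
points_xyzt = np.vstack([static_pts, transient_pts])

# EM estimates each patch's spatial center and temporal extent jointly.
gmm = GaussianMixture(n_components=2, covariance_type="diag").fit(points_xyzt)
for mean, var in zip(gmm.means_, gmm.covariances_):
    print("center xyz:", mean[:3].round(2),
          "temporal extent (std):", np.sqrt(var[3]).round(2))
```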
Citations: 4