
Latest publications: 2017 IEEE International Conference on Computer Vision (ICCV)

Robust Hand Pose Estimation during the Interaction with an Unknown Object
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.339
Chiho Choi, S. Yoon, China Chen, K. Ramani
This paper proposes a robust solution for accurate 3D hand pose estimation in the presence of an external object interacting with the hand. Our main insight is that the shape of an object induces a configuration of the hand in the form of a hand grasp. Along this line, we simultaneously train deep neural networks using paired depth images. The object-oriented network learns functional grasps from an object perspective, whereas the hand-oriented network explores the details of hand configurations from a hand perspective. The two networks share intermediate observations produced from different perspectives to create a more informed representation. Our system then collaboratively classifies the grasp type and orientation of the hand and further constrains the pose space using these estimates. Finally, we collectively refine the unknown pose parameters to reconstruct the final hand pose. We validate the efficacy of the proposed collaborative learning approach through extensive evaluations against self-generated baselines and the state-of-the-art method.
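The collaborative two-network idea lends itself to a compact sketch. Below is a minimal PyTorch illustration (not the authors' implementation) of two depth-image streams that exchange intermediate features before jointly predicting grasp type and hand orientation; all layer sizes and names are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class TwoStreamGraspNet(nn.Module):
    """Minimal sketch: hand- and object-oriented CNNs over paired depth
    images share intermediate observations before classifying the grasp
    type and hand orientation."""
    def __init__(self, num_grasps=10, num_orients=8):
        super().__init__()
        def encoder():  # small depth-image encoder; sizes are illustrative
            return nn.Sequential(
                nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.hand_enc, self.obj_enc = encoder(), encoder()
        self.grasp_head = nn.Linear(128, num_grasps)    # sees both streams
        self.orient_head = nn.Linear(128, num_orients)

    def forward(self, hand_depth, obj_depth):
        h = self.hand_enc(hand_depth)                   # hand perspective
        o = self.obj_enc(obj_depth)                     # object perspective
        shared = torch.cat([h, o], dim=1)  # shared intermediate observations
        return self.grasp_head(shared), self.orient_head(shared)
```

The predicted grasp type and orientation would then prune the pose space before the final refinement step the abstract describes.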
Citations: 53
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.590
Zhaofan Qiu, Ting Yao, Tao Mei
Convolutional Neural Networks (CNN) have been regarded as a powerful class of models for image recognition problems. Nevertheless, it is not trivial to utilize a CNN for learning spatio-temporal video representation. A few studies have shown that performing 3D convolutions is a rewarding approach to capturing both the spatial and temporal dimensions of videos. However, developing a very deep 3D CNN from scratch incurs expensive computational cost and memory demand. A valid question is why not recycle off-the-shelf 2D networks for a 3D CNN. In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating 3 × 3 × 3 convolutions with 1 × 3 × 3 convolutional filters in the spatial domain (equivalent to a 2D CNN) plus 3 × 1 × 1 convolutions that construct temporal connections across adjacent feature maps. Furthermore, we propose a new architecture, named Pseudo-3D Residual Net (P3D ResNet), that exploits all the variants of blocks, composing each at a different placement within the ResNet, following the philosophy that enhancing structural diversity while going deep can improve the power of neural networks. Our P3D ResNet achieves clear improvements on the Sports-1M video classification dataset over 3D CNN and frame-based 2D CNN by 5.3% and 1.8%, respectively. We further examine the generalization performance of the video representation produced by our pre-trained P3D ResNet on five different benchmarks and three different tasks, demonstrating superior performance over several state-of-the-art techniques.
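The decoupled convolution at the heart of P3D is easy to picture in code. The PyTorch sketch below shows one bottleneck variant in the spirit of the paper's serial spatial-then-temporal cascade; channel sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class P3DBlockA(nn.Module):
    """One pseudo-3D residual bottleneck: a 3 × 3 × 3 convolution is
    simulated by a 1 × 3 × 3 spatial convolution followed by a 3 × 1 × 1
    temporal convolution (the serial, "P3D-A"-style cascade)."""
    def __init__(self, channels=256, bottleneck=64):
        super().__init__()
        self.reduce = nn.Conv3d(channels, bottleneck, kernel_size=1)
        self.spatial = nn.Conv3d(bottleneck, bottleneck,
                                 kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(bottleneck, bottleneck,
                                  kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.expand = nn.Conv3d(bottleneck, channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                     # x: (batch, C, frames, H, W)
        y = self.relu(self.reduce(x))
        y = self.relu(self.spatial(y))        # 2D-like filtering per frame
        y = self.relu(self.temporal(y))       # connects adjacent frames
        return self.relu(x + self.expand(y))  # residual shortcut

# e.g. P3DBlockA()(torch.randn(2, 256, 16, 14, 14)) keeps the input shape
```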
Citations: 1420
Personalized Image Aesthetics
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.76
Jian Ren, Xiaohui Shen, Zhe L. Lin, R. Mech, D. Foran
Automatic image aesthetics rating has received growing interest with the recent breakthroughs in deep learning. Although many studies exist on learning a generic or universal aesthetics model, investigation of aesthetics models that incorporate individual users' preferences is quite limited. We address this personalized aesthetics problem by showing that an individual's aesthetic preferences exhibit strong correlations with content and aesthetic attributes, and hence the deviation of an individual's perception from generic image aesthetics is predictable. To support our study, we first collect two distinct datasets: a large image dataset from Flickr annotated via Amazon Mechanical Turk, and a small dataset of real personal albums rated by their owners. We then propose a new approach to personalized aesthetics learning that can be trained even with a small set of annotated images from a user. The approach is based on a residual-based model adaptation scheme that learns an offset to compensate the generic aesthetics score. Finally, we introduce an active learning algorithm to optimize personalized aesthetics prediction for real-world application scenarios. Experiments demonstrate that our approach effectively learns personalized aesthetics preferences and outperforms existing methods in quantitative comparisons.
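A hedged sketch of the residual adaptation scheme: a frozen generic scorer plus a small user-specific offset head trained on the user's few ratings. The assumption that the generic model returns its penultimate features alongside the score is ours, made for illustration.

```python
import torch.nn as nn

class PersonalizedAesthetics(nn.Module):
    """Sketch of residual-based model adaptation: the generic aesthetics
    model stays frozen; only a tiny offset head is fit to one user."""
    def __init__(self, generic_model, feat_dim=512):
        super().__init__()
        self.generic = generic_model          # pretrained, frozen below
        for p in self.generic.parameters():
            p.requires_grad = False
        self.offset = nn.Linear(feat_dim, 1)  # learns the user's deviation

    def forward(self, image):
        feats, base = self.generic(image)     # assumed: (features, score)
        return base + self.offset(feats)      # personalized score
```

Freezing the generic model keeps the adaptation well-posed even when a user supplies only a handful of rated images.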
Citations: 86
Going Unconstrained with Rolling Shutter Deblurring
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.432
Mahesh Mohan M. R., A. Rajagopalan
Most present-day imaging devices are equipped with CMOS sensors, and motion blur is a common artifact in handheld cameras. Because CMOS sensors mostly employ a rolling shutter (RS), the motion deblurring problem takes on a new dimension. Although a few works have recently addressed this problem, they suffer from many constraints, including heavy computational cost, the need for precise sensor information, and an inability to deal with wide-angle systems (which most cell-phone and drone cameras are) and irregular camera trajectories. In this work, we propose a model for RS blind motion deblurring that mitigates these issues significantly. Comprehensive comparisons with state-of-the-art methods reveal that our approach not only exhibits significant computational gains and unconstrained functionality but also leads to improved deblurring performance.
Citations: 13
A Microfacet-Based Reflectance Model for Photometric Stereo with Highly Specular Surfaces
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.343
Lixiong Chen, Yinqiang Zheng, Boxin Shi, Art Subpa-Asa, Imari Sato
A precise, stable and invertible model of surface reflectance is the key to the success of photometric stereo with real-world materials. Recent developments in the field have enabled shape recovery techniques for surfaces of various types, but an effective solution for directly estimating the surface normal in the presence of highly specular reflectance remains elusive. In this paper, we derive an analytical isotropic microfacet-based reflectance model, based on which a physically interpretable approximation is tailored for highly specular surfaces. With this approximation, we identify the equivalence between the surface recovery problem and the ellipsoid-of-revolution fitting problem, where the latter can be described as a system of polynomials. Additionally, we devise a fast, non-iterative and globally optimal solver for this problem. Experimental results on both synthetic and real images validate our model and demonstrate that our solution can stably deliver superior performance in its targeted application domain.
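The reduction to "a system of polynomials" can be illustrated with a generic algebraic quadric fit; note that this plain least-squares version is a stand-in for exposition, not the paper's specialized, globally optimal ellipsoid-of-revolution solver.

```python
import numpy as np

def fit_quadric(points):
    """Fit a general quadric a1*x^2 + ... + a10 = 0 to 3D points by
    minimizing the algebraic error ||D q|| subject to ||q|| = 1."""
    x, y, z = np.asarray(points, dtype=float).T
    D = np.column_stack([x*x, y*y, z*z, x*y, x*z, y*z,
                         x, y, z, np.ones_like(x)])  # monomial design matrix
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[-1]   # coefficients = smallest right singular vector
```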
Citations: 18
Surface Registration via Foliation
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.107
Xiaopeng Zheng, Chengfeng Wen, Na Lei, Ming Ma, X. Gu
This work introduces a novel surface registration method based on foliation. A foliation decomposes the surface into a family of closed loops, such that the decomposition has a local tensor-product structure. By projecting each loop to a point, the surface is collapsed into a graph. Two homeomorphic surfaces with consistent foliations can be registered by first matching their foliation graphs and then matching the corresponding leaves. This foliation-based method can handle surfaces with complicated topologies and large non-isometric deformations; it is rigorous, with a solid theoretical foundation, easy to implement, and robust to compute. The resulting mapping is diffeomorphic. Our experimental results show the efficiency and efficacy of the proposed method.
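The collapse of a foliation into a graph can be sketched directly from the description above: each leaf (closed loop) becomes a node, and adjacent leaves are joined by an edge. The input format below (leaf-to-vertex sets plus a vertex adjacency map) is an assumption for illustration.

```python
def foliation_graph(loops, adjacency):
    """Collapse a foliated surface to its leaf graph. `loops` maps a leaf id
    to the set of mesh vertices on that loop; `adjacency` maps a vertex to
    its neighbouring vertices on the surface."""
    leaf_of = {v: lid for lid, verts in loops.items() for v in verts}
    edges = set()
    for v, nbrs in adjacency.items():
        for w in nbrs:
            a, b = leaf_of[v], leaf_of[w]
            if a != b:                        # neighbours on different leaves
                edges.add((min(a, b), max(a, b)))
    return sorted(edges)                      # node pairs of the leaf graph
```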
Citations: 8
VegFru: A Domain-Specific Dataset for Fine-Grained Visual Categorization
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.66
Saihui Hou, Yushan Feng, Zilei Wang
In this paper, we propose a novel domain-specific dataset named VegFru for fine-grained visual categorization (FGVC). While existing datasets for FGVC mainly focus on animal breeds or man-made objects with limited labelled data, VegFru is a larger dataset consisting of vegetables and fruits, which are closely associated with everyone's daily life. Aimed at domestic cooking and food management, VegFru categorizes vegetables and fruits according to their eating characteristics, and each image contains at least one edible part of a vegetable or fruit with the same cooking usage. In particular, all the images are labelled hierarchically. The current version covers 25 upper-level categories and 292 subordinate classes of vegetables and fruits; it contains more than 160,000 images in total, with at least 200 images for each subordinate class. Accompanying the dataset, we also propose an effective framework called HybridNet to exploit the label hierarchy for FGVC. Specifically, multiple granularity features are first extracted by dealing with the hierarchical labels separately, and then fused through an explicit operation, e.g., Compact Bilinear Pooling, to form a unified representation for the ultimate recognition. The experimental results on the novel VegFru, the public FGVC-Aircraft and CUB-200-2011 indicate that HybridNet achieves one of the top performances on these datasets. The dataset and code are available at https://github.com/ustc-vim/vegfru.
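To make the fusion step concrete, here is a hedged sketch of combining a coarse (upper-level) branch and a fine (subordinate) branch. Full bilinear pooling via an outer product stands in for the Compact Bilinear Pooling the paper actually uses; the branch networks and dimensions are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchyFusion(nn.Module):
    """Sketch: fuse multi-granularity features with bilinear pooling, then
    classify among the fine-grained (subordinate) classes."""
    def __init__(self, coarse_net, fine_net, dim=64, num_classes=292):
        super().__init__()
        self.coarse_net, self.fine_net = coarse_net, fine_net
        self.classifier = nn.Linear(dim * dim, num_classes)

    def forward(self, image):
        c = self.coarse_net(image)            # (B, dim) coarse features
        f = self.fine_net(image)              # (B, dim) fine features
        fused = torch.bmm(c.unsqueeze(2), f.unsqueeze(1)).flatten(1)
        fused = F.normalize(fused, dim=1)     # L2-normalize the bilinear vector
        return self.classifier(fused)
```

Compact Bilinear Pooling approximates this same outer product at a far lower output dimension, which is why the paper prefers it.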
Citations: 84
DeepCD: Learning Deep Complementary Descriptors for Patch Representations
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.359
Tsun-Yi Yang, Jo-Han Hsu, Yen-Yu Lin, Yung-Yu Chuang
This paper presents the DeepCD framework, which jointly learns a pair of complementary descriptors for image patch representation using deep learning techniques. It can be realized by taking any descriptor learning architecture for learning a leading descriptor and augmenting the architecture with an additional network stream for learning a complementary descriptor. To enforce the complementary property, a new network layer, called the data-dependent modulation (DDM) layer, is introduced for adaptively learning the augmented network stream, with emphasis on the training data that are not well handled by the leading stream. By optimizing the proposed joint loss function with late fusion, the obtained descriptors are complementary to each other, and their fusion improves performance. Experiments on several problems and datasets show that the proposed method is simple yet effective, outperforming state-of-the-art methods.
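One plausible reading of the DDM idea: re-weight the complementary stream's per-sample losses by how poorly the leading descriptor does on each sample. The softmax weighting below is our illustrative assumption, not the layer as published.

```python
import torch

def ddm_weights(leading_losses, temperature=1.0):
    """Weight samples by the leading stream's difficulty: a high leading
    loss yields a larger weight when training the complementary stream."""
    w = torch.softmax(leading_losses.detach() / temperature, dim=0)
    return w * leading_losses.numel()   # rescale so weights average to 1

# usage: comp_loss = (ddm_weights(lead_per_sample) * comp_per_sample).mean()
```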
Citations: 38
Region-Based Correspondence Between 3D Shapes via Spatially Smooth Biclustering
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.457
M. Denitto, S. Melzi, M. Bicego, U. Castellani, A. Farinelli, Mário A. T. Figueiredo, Yanir Kleiman, M. Ovsjanikov
Region-based correspondence (RBC) is a highly relevant and non-trivial computer vision problem. Given two 3D shapes, RBC seeks segments/regions on these shapes that can be reliably put in correspondence. The problem thus consists both in finding the regions and in determining the correspondences between them. This problem statement is similar to that of “biclustering”, implying that RBC can be cast as a biclustering problem. Here, we exploit this implication by tackling RBC via a novel biclustering approach, called S4B (spatially smooth spike-and-slab biclustering), which: (i) casts the problem in a probabilistic low-rank matrix factorization perspective; (ii) uses a spike-and-slab prior to induce sparsity; (iii) is enriched with a spatial smoothness prior, based on geodesic distances, encouraging nearby vertices to belong to the same bicluster. This type of spatial prior cannot be used in classical biclustering techniques. We test the proposed approach on the FAUST dataset, outperforming both state-of-the-art RBC techniques and classical biclustering methods.
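As a toy stand-in for the probabilistic machinery, the sketch below finds one sparse rank-1 factor pair of a shape-correspondence matrix with an L1 proximal step, in place of the paper's spike-and-slab and geodesic smoothness priors; the nonzero supports of u and v then index one candidate region pair.

```python
import numpy as np

def sparse_rank1_bicluster(C, lam=0.1, lr=1e-3, steps=500):
    """Toy sparse low-rank factorization C ~ u v^T (proximal gradient with
    soft-thresholding). A deliberate simplification of S4B, for illustration
    only."""
    m, n = C.shape
    rng = np.random.default_rng(0)
    u, v = rng.standard_normal(m), rng.standard_normal(n)
    for _ in range(steps):
        r = np.outer(u, v) - C                        # rank-1 residual
        u, v = u - lr * (r @ v), v - lr * (r.T @ u)   # gradient step
        u = np.sign(u) * np.maximum(np.abs(u) - lr * lam, 0.0)  # prox (L1)
        v = np.sign(v) * np.maximum(np.abs(v) - lr * lam, 0.0)
    return u, v   # nonzero entries mark one pair of corresponding regions
```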
Citations: 6
Monocular Video-Based Trailer Coupler Detection Using Multiplexer Convolutional Neural Network
Pub Date : 2017-10-01 DOI: 10.1109/ICCV.2017.584
Yousef Atoum, Joseph Roth, Michael Bliss, Wende Zhang, Xiaoming Liu
This paper presents an automated, monocular-camera-based computer vision system for autonomously backing up a vehicle towards a trailer by continuously estimating the 3D trailer coupler position and feeding it to the vehicle control system until the tow hitch is aligned with the trailer's coupler. This system is made possible by our proposed distance-driven Multiplexer-CNN method, which selects the most suitable CNN using the estimated coupler-to-vehicle distance. The input of the multiplexer is a group made of a CNN detector, trackers, and a 3D localizer. In the CNN detector, we propose a novel algorithm that provides a presence confidence score with each detection. The score reflects the existence of the target object in a region, as well as how accurate the 2D target detection is. We demonstrate the accuracy and efficiency of the system on a large trailer database. Our system achieves an estimation error of 1.4 cm when the ball reaches the coupler, while running at 18.9 FPS on a regular PC.
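The multiplexer itself is essentially distance-gated model selection. A minimal sketch, where the band edges and model names are hypothetical:

```python
def multiplexer_select(distance_m, stages):
    """Pick the network trained for the current coupler-to-vehicle distance.
    `stages` is a list of (max_distance_m, model) pairs sorted by distance."""
    for max_dist, model in stages:
        if distance_m <= max_dist:
            return model
    return stages[-1][1]        # fall back to the far-range model

# hypothetical usage:
# stages = [(1.0, close_range_localizer), (4.0, mid_range_tracker),
#           (float("inf"), far_range_detector)]
# model = multiplexer_select(estimated_distance, stages)
```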
Citations: 8