
Latest publications from 2017 IEEE International Conference on Computer Vision Workshops (ICCVW)

Local Depth Edge Detection in Humans and Deep Neural Networks
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.316
Krista A. Ehinger, E. Graf, W. Adams, J. Elder
Distinguishing edges caused by a change in depth from other types of edges is an important problem in early vision. We investigate the performance of humans and computer vision models on this task. We use spherical imagery with ground-truth LiDAR range data to build an objective ground-truth dataset for edge classification. We compare various computational models for classifying depth from non-depth edges in small image patches and achieve the best performance (86%) with a convolutional neural network. We investigate human performance on this task in a behavioral experiment and find that human performance is lower than the CNN's. Although human and CNN depth responses are correlated, observers' responses are better predicted by other observers than by the CNN. The responses of CNNs and human observers also show a slightly different pattern of correlation with low-level edge cues, which suggests that CNNs and human observers may weight these features differently for classifying edges.
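The abstract does not describe the network architecture; below is a minimal sketch of the kind of binary patch classifier described (depth edge vs. other edge on small image patches), assuming PyTorch, 32x32 RGB patches, and an arbitrary two-layer convolutional design.

```python
import torch
import torch.nn as nn

class DepthEdgeClassifier(nn.Module):
    """Binary classifier: does a small patch contain a depth edge?

    The architecture and patch size (32x32 RGB) are illustrative
    assumptions, not the network described in the paper.
    """
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                         # 32 -> 16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                         # 16 -> 8
        )
        self.classifier = nn.Linear(32 * 8 * 8, 2)   # depth edge vs. other edge

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Example: a batch of 4 random 32x32 patches.
logits = DepthEdgeClassifier()(torch.randn(4, 3, 32, 32))
print(logits.shape)  # torch.Size([4, 2])
```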
Citations: 14
Margin Based Semi-Supervised Elastic Embedding for Face Image Analysis
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.156
F. Dornaika, Y. E. Traboulsi
This paper introduces a graph-based semi-supervised elastic embedding method, as well as its kernelized version, for face image embedding and classification. The proposed framework combines Flexible Manifold Embedding and non-linear graph-based embedding for semi-supervised learning. In both proposed methods, the nonlinear manifold and the mapping (a linear transform for the linear method and the kernel multipliers for the kernelized method) are estimated simultaneously, which overcomes the shortcomings of a cascaded estimation. Unlike many state-of-the-art non-linear embedding approaches, which suffer from the out-of-sample problem, our proposed methods have a direct out-of-sample extension to novel samples. We conduct experiments tackling the face recognition and image-based face orientation problems on four public databases. These experiments show improvement over state-of-the-art algorithms based on label propagation or graph-based semi-supervised embedding.
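As background for the graph-based semi-supervised setting, here is a minimal sketch of a standard Laplacian-regularized label-inference baseline of the family the paper builds on; it is not the proposed elastic embedding or its kernelized version, and the Gaussian affinity graph and toy data are assumptions.

```python
import numpy as np

def semi_supervised_propagation(X, y, labeled_idx, sigma=1.0, alpha=10.0):
    """Generic graph-based semi-supervised label inference.

    Minimizes tr(F^T L F) + alpha * sum_{i in labeled} ||F_i - Y_i||^2
    over soft labels F, which has the closed form below.
    """
    n = X.shape[0]
    # Dense Gaussian affinity graph and its Laplacian.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W
    u = np.zeros(n)
    u[labeled_idx] = alpha                      # fit labeled points only
    Y = np.zeros((n, y.max() + 1))
    Y[labeled_idx, y[labeled_idx]] = 1.0
    F = np.linalg.solve(L + np.diag(u), u[:, None] * Y)
    return F.argmax(1)

# Toy usage: two Gaussian blobs with two labeled points per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
pred = semi_supervised_propagation(X, y, labeled_idx=np.array([0, 1, 20, 21]))
print((pred == y).mean())
```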
Citations: 5
Max-Boost-GAN: Max Operation to Boost Generative Ability of Generative Adversarial Networks
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.140
Xinhan Di, Pengqian Yu
Generative adversarial networks (GANs) can be used to learn a generation function from a joint probability distribution as input, after which visual samples with semantic properties can be generated from a marginal probability distribution. In this paper, we propose a novel algorithm named Max-Boost-GAN, which is demonstrated to boost the generative ability of GANs when the error of generation is upper bounded. Moreover, the Max-Boost-GAN can be used to learn the generation functions from two marginal probability distributions as input, and samples of higher visual quality and variety can be generated from the joint probability distribution. Finally, novel objective functions are proposed for obtaining convergence when training the Max-Boost-GAN. Experiments on the generation of binary digits and RGB human faces show that the Max-Boost-GAN achieves the expected boost in generative ability.
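The abstract does not specify the max operation or the new objective functions; for context only, here is a minimal standard GAN update step in PyTorch, i.e. the kind of baseline alternating update such objectives would modify. The toy dimensions and data are assumptions.

```python
import torch
import torch.nn as nn

# Minimal generator/discriminator for 2-D toy data; sizes are illustrative.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, 2) + 3.0          # stand-in for real samples
z = torch.randn(64, 8)                   # latent codes

# Discriminator step: push real toward 1, generated toward 0.
fake = G(z).detach()
loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: try to make the discriminator output 1 for generated samples.
loss_g = bce(D(G(z)), torch.ones(64, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
print(float(loss_d), float(loss_g))
```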
Citations: 3
Are They Going to Cross? A Benchmark Dataset and Baseline for Pedestrian Crosswalk Behavior
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.33
Amir Rasouli, Iuliia Kotseruba, John K. Tsotsos
Designing autonomous vehicles suitable for urban environments remains an unresolved problem. One of the major dilemmas faced by autonomous cars is how to understand the intention of other road users and communicate with them. The existing datasets do not provide the necessary means for such higher level analysis of traffic scenes. With this in mind, we introduce a novel dataset which in addition to providing the bounding box information for pedestrian detection, also includes the behavioral and contextual annotations for the scenes. This allows combining visual and semantic information for better understanding of pedestrians' intentions in various traffic scenarios. We establish baseline approaches for analyzing the data and show that combining visual and contextual information can improve prediction of pedestrian intention at the point of crossing by at least 20%.
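As an illustration of combining visual and contextual information for crossing prediction, here is a minimal late-fusion classifier sketch, assuming PyTorch; the feature dimensions and context flags are hypothetical and not the paper's baseline.

```python
import torch
import torch.nn as nn

# Hypothetical fusion of a visual embedding with contextual annotations
# (e.g., standing at curb, looking at traffic, crosswalk present).
visual = torch.randn(16, 128)                    # per-pedestrian appearance features
context = torch.randint(0, 2, (16, 5)).float()   # binary behavioural/context flags

model = nn.Sequential(
    nn.Linear(128 + 5, 64), nn.ReLU(),
    nn.Linear(64, 1),                            # logit for "will cross"
)
p_cross = torch.sigmoid(model(torch.cat([visual, context], dim=1)))
print(p_cross.shape)  # torch.Size([16, 1])
```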
Citations: 200
Locating Crop Plant Centers from UAV-Based RGB Imagery
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.238
Yuhao Chen, Javier Ribera, C. Boomsma, E. Delp
In this paper we propose a method to find the location of crop plants in Unmanned Aerial Vehicle (UAV) imagery. Finding the location of plants is a crucial step in deriving and tracking phenotypic traits for each plant. We describe some initial work in estimating field crop plant locations. We approach the problem by classifying pixels as plant centers or non-plant centers. We use Multiple Instance Learning (MIL) to handle the ambiguity of plant center labeling in the training data. The classification results are then post-processed to estimate the exact location of each crop plant. An experimental evaluation is conducted, and the method achieves an overall precision and recall of 66% and 64%, respectively.
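The abstract leaves the post-processing step unspecified; below is a minimal sketch of one common choice, thresholding the per-pixel score map and taking connected-component centroids, assuming SciPy. It is not the paper's exact procedure.

```python
import numpy as np
from scipy import ndimage

def plant_centers_from_scores(score_map, threshold=0.5):
    """Turn a per-pixel 'plant center' score map into point estimates.

    Generic connected-component centroid post-processing (an assumption,
    since the paper's exact step is not given in the abstract).
    """
    mask = score_map > threshold
    labels, n = ndimage.label(mask)
    if n == 0:
        return np.empty((0, 2))
    return np.array(ndimage.center_of_mass(score_map, labels, np.arange(1, n + 1)))

# Toy score map with two blobs of high "plant center" probability.
score = np.zeros((60, 60))
score[10:14, 10:14] = 0.9
score[40:45, 30:36] = 0.8
print(plant_centers_from_scores(score))  # approx (11.5, 11.5) and (42.0, 32.5)
```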
Citations: 12
Fusing Geometry and Appearance for Road Segmentation
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.28
Gong Cheng, Yiming Qian, J. Elder
We propose a novel method for fusing geometric and appearance cues for road surface segmentation. Modeling colour cues using Gaussian mixtures allows the fusion to be performed optimally within a Bayesian framework, avoiding ad hoc weights. Adaptation to different scene conditions is accomplished through nearest-neighbour appearance model selection over a dictionary of mixture models learned from training data, and the thorny problem of selecting the number of components in each mixture is solved through a novel cross-validation approach. Quantitative evaluation reveals that the proposed fusion method significantly improves segmentation accuracy relative to a method that uses geometric cues alone.
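Here is a minimal sketch of Bayesian fusion of a geometric prior with Gaussian-mixture colour likelihoods, in the spirit of the described method; the component counts, toy pixel data, and single-prior interface are assumptions, not the paper's learned mixture dictionary or cross-validation scheme.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit colour likelihoods to road / non-road training pixels (toy data).
rng = np.random.default_rng(1)
road_rgb = rng.normal([90, 90, 95], 10, (500, 3))    # greyish road pixels
other_rgb = rng.normal([60, 140, 60], 25, (500, 3))  # greenish background pixels

gmm_road = GaussianMixture(n_components=2, random_state=0).fit(road_rgb)
gmm_other = GaussianMixture(n_components=2, random_state=0).fit(other_rgb)

def road_posterior(rgb, prior_road):
    """p(road | colour, geometry) per pixel via Bayes' rule.

    prior_road is the per-pixel prior supplied by the geometric cue.
    """
    log_lr = gmm_road.score_samples(rgb) - gmm_other.score_samples(rgb)
    odds = (prior_road / (1.0 - prior_road)) * np.exp(log_lr)
    return odds / (1.0 + odds)

pixels = np.array([[92.0, 88.0, 96.0], [55.0, 150.0, 62.0]])
print(road_posterior(pixels, prior_road=np.array([0.7, 0.3])))
```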
Citations: 7
Spatial Attention Improves Object Localization: A Biologically Plausible Neuro-Computational Model for Use in Virtual Reality
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.320
A. Jamalian, Julia Bergelt, H. Dinkelbach
Visual attention is a smart mechanism performed by the brain to avoid unnecessary processing and to focus on the most relevant part of the visual scene. It can result in a remarkable reduction in the computational complexity of scene understanding. The two major kinds of top-down visual attention signals are spatial and feature-based attention. The former deals with the places in the scene that are worth attending to, while the latter is more concerned with the basic features of objects, e.g. color, intensity, and edges. In principle, there are two known sources of the spatial attention signal: the Frontal Eye Field (FEF) in the prefrontal cortex and the Lateral Intraparietal Cortex (LIP) in the parietal cortex. In this paper, a combined neuro-computational model of the ventral and dorsal streams is first introduced, and it is then shown in Virtual Reality (VR) that the spatial attention signal provided by LIP acts as a transsaccadic memory pointer that accelerates object localization.
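A much-simplified sketch of the core idea that spatial attention aids localization: a multiplicative gain centred on the attended location amplifies nearby feature responses before a localization step. The Gaussian gain profile and its parameters are assumptions, not the LIP/FEF model itself.

```python
import numpy as np

def apply_spatial_attention(feature_map, center, sigma=5.0, gain=1.5):
    """Multiplicative spatial-attention gain on a 2-D feature map.

    Responses near the attended location are amplified, so a subsequent
    arg-max localization step is biased toward that region.
    """
    h, w = feature_map.shape
    yy, xx = np.mgrid[0:h, 0:w]
    gauss = np.exp(-((yy - center[0]) ** 2 + (xx - center[1]) ** 2) / (2 * sigma ** 2))
    return feature_map * (1.0 + (gain - 1.0) * gauss)

rng = np.random.default_rng(2)
fmap = rng.random((64, 64))
attended = apply_spatial_attention(fmap, center=(20, 40))
print(np.unravel_index(attended.argmax(), attended.shape))  # biased toward (20, 40)
```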
Citations: 4
Double-Task Deep Q-Learning with Multiple Views
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.128
Tingzhu Bai, Jianing Yang, Jun Chen, Xian Guo, Xiangsheng Huang, Yu-Ni Yao
Deep reinforcement learning enables autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, the applications of direct deep reinforcement learning have been restricted. For complicated robotic systems, these limitations result from the high-dimensional action space, the high degree of freedom of the robotic system, and the high correlation between images. In this paper we introduce a new definition of the action space and propose a double-task deep Q-Network with multiple views (DMDQN) based on double-DQN and dueling-DQN. As an extension, we define a multi-task model for more complex jobs. Moreover, a data augmentation policy is applied, which includes auto-sampling and action-overturn. The exploration policy is formed when DMDQN and data augmentation are combined. For the robotic system's steady exploration, we design safety constraints according to the working conditions. Our experiments show that our double-task DQN with multiple views performs better than the single-task, single-view model. Combining our DMDQN and data augmentation, the robotic system can reach the object in an exploratory way.
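Below is a minimal sketch of the two ingredients named in the abstract, a dueling Q-network head and the double-DQN target (action selected by the online network, evaluated by the target network), assuming PyTorch; the multi-view, double-task structure and the safety constraints are not reproduced.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).

    State and action sizes are illustrative assumptions.
    """
    def __init__(self, state_dim=16, n_actions=6):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.value = nn.Linear(64, 1)
        self.adv = nn.Linear(64, n_actions)

    def forward(self, s):
        h = self.body(s)
        a = self.adv(h)
        return self.value(h) + a - a.mean(dim=1, keepdim=True)

online, target = DuelingQNet(), DuelingQNet()
s_next = torch.randn(8, 16)
reward, done, gamma = torch.randn(8), torch.zeros(8), 0.99

# Double-DQN target: action chosen by the online net, evaluated by the target net.
with torch.no_grad():
    best_a = online(s_next).argmax(dim=1)
    q_next = target(s_next).gather(1, best_a.unsqueeze(1)).squeeze(1)
    td_target = reward + gamma * (1.0 - done) * q_next
print(td_target.shape)  # torch.Size([8])
```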
Citations: 11
A Biophysical 3D Morphable Model of Face Appearance
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.102
S. Alotaibi, W. Smith
Skin colour forms a curved manifold in RGB space. The variations in skin colour are largely caused by variations in the concentration of the pigments melanin and hemoglobin. Hence, linear statistical models of appearance or skin albedo are insufficiently constrained (they can produce implausible skin tones) and lack compactness (they require additional dimensions to linearly approximate a curved manifold). In this paper, we propose to use a biophysical model of skin colouration in order to transform skin colour into a parameter space where linear statistical modelling can take place. Hence, we propose a hybrid of biophysical and statistical modelling. We present a two-parameter spectral model of skin colouration and methods for fitting the model to data captured in a light stage, and then build our hybrid model on a sample of such registered data. We present face editing results and compare our model against a pure statistical model built directly on textures.
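Here is a minimal sketch of the hybrid idea, linear statistics computed in pigment-parameter space rather than on RGB albedo; the random pigment maps and plain PCA are placeholders, not the paper's fitted light-stage data or its spectral model.

```python
import numpy as np

# PCA over per-pixel pigment parameters (melanin, haemoglobin fractions)
# instead of RGB textures, so sampled faces stay on the skin manifold once
# mapped back through a biophysical rendering function (not shown here).
rng = np.random.default_rng(3)
n_faces, n_pixels = 50, 1000
pigments = rng.uniform(0.0, 1.0, (n_faces, n_pixels * 2))  # [melanin | haemoglobin]

mean = pigments.mean(axis=0)
U, S, Vt = np.linalg.svd(pigments - mean, full_matrices=False)
components, k = Vt, 10

def sample_pigment_map(coeffs):
    """Generate a pigment map from k PCA coefficients, clipped to a valid range."""
    return np.clip(mean + coeffs @ components[:k], 0.0, 1.0)

new_map = sample_pigment_map(rng.normal(0.0, S[:k] / np.sqrt(n_faces), k))
print(new_map.shape)  # (2000,): would then be rendered to RGB by the biophysical model
```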
Citations: 14
ViTS: Video Tagging System from Massive Web Multimedia Collections
Pub Date : 2017-10-01 DOI: 10.1109/ICCVW.2017.48
Delia Fernandez, David Varas, Joan Espadaler, Issey Masuda, Jordi Ferreira, A. Woodward, David Rodriguez, Xavier Giró-i-Nieto, J. C. Riveiro, Elisenda Bou
The popularization of multimedia content on the Web has raised the need to automatically understand, index and retrieve it. In this paper we present ViTS, an automatic Video Tagging System which learns from videos, their web context and comments shared on social networks. ViTS analyses massive multimedia collections by Internet crawling, and maintains a knowledge base that updates in real time with no need for human supervision. As a result, each video is indexed with a rich set of labels and linked with other related content. ViTS is an industrial product under exploitation with a vocabulary of over 2.5M concepts, capable of indexing more than 150k videos per month. We compare the quality and completeness of our tags with respect to those in the YouTube-8M dataset, and we show how ViTS enhances the semantic annotation of the videos with a larger number of labels (10.04 tags/video), with an accuracy of 80.87%. Extracted tags and video summaries are publicly available.
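A minimal sketch of the kind of tag-quality comparison reported, tag precision against a reference label set plus average tags per video; the sample videos and tags below are made up and are not from the YouTube-8M evaluation.

```python
# Compare system-assigned tags against reference labels, per video.
predicted = {
    "vid_001": {"guitar", "concert", "music"},
    "vid_002": {"soccer", "stadium"},
}
reference = {
    "vid_001": {"music", "concert", "crowd"},
    "vid_002": {"soccer", "goal", "stadium"},
}

correct = sum(len(predicted[v] & reference[v]) for v in predicted)
total_pred = sum(len(tags) for tags in predicted.values())
print(f"tag precision: {correct / total_pred:.2%}")          # 80.00%
print(f"tags per video: {total_pred / len(predicted):.2f}")  # 2.50
```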
Citations: 14