Latest publications from the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW)

UCT: Learning Unified Convolutional Networks for Real-Time Visual Tracking
Pub Date : 2017-11-10 DOI: 10.1109/ICCVW.2017.231
Zheng Zhu, Guan Huang, Wei Zou, Dalong Du, Chang Huang
Convolutional neural network (CNN) based tracking approaches have shown favorable performance in recent benchmarks. Nonetheless, the chosen CNN features are usually pre-trained on a different task, and the individual components of a tracking system are learned separately, so the achieved tracking performance may be suboptimal. Besides, most of these trackers are not designed for real-time applications because of their time-consuming feature extraction and complex optimization details. In this paper, we propose an end-to-end framework that learns the convolutional features and performs the tracking process simultaneously, namely a unified convolutional tracker (UCT). Specifically, the UCT treats both the feature extractor and the tracking process (ridge regression) as convolution operations and trains them jointly, so that the learned CNN features are tightly coupled to the tracking process. In online tracking, an efficient updating method is proposed by introducing a peak-versus-noise ratio (PNR) criterion, and scale changes are handled efficiently by incorporating a scale branch into the network. The proposed approach yields superior tracking performance while maintaining real-time speed. The standard UCT and UCT-Lite can track generic objects at 41 FPS and 154 FPS, respectively, without further optimization. Experiments are performed on four challenging benchmark tracking datasets, OTB2013, OTB2015, VOT2014 and VOT2015, and our method achieves state-of-the-art results on these benchmarks compared with other real-time trackers.
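The ridge-regression-as-convolution idea the abstract refers to can be illustrated with the standard closed-form, Fourier-domain ridge regression used by correlation-filter trackers. The sketch below is only that generic formulation in NumPy, with a random patch standing in for learned CNN features; it is not the authors' UCT network.

```python
import numpy as np

def train_ridge_filter(x, y, lam=1e-2):
    """Closed-form ridge regression in the Fourier domain.

    x : 2D template patch, y : desired (e.g. Gaussian) response of the same shape.
    With circulant structure, (X^T X + lam I)^-1 X^T y becomes an
    element-wise division between spectra."""
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    return np.conj(X) * Y / (np.conj(X) * X + lam)

def track_response(w_hat, z):
    """Apply the learned filter to a new search patch z (same size as the template)."""
    Z = np.fft.fft2(z)
    return np.real(np.fft.ifft2(w_hat * Z))

# toy usage: the desired response is a Gaussian centred on the target
h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w]
y = np.exp(-((yy - h // 2) ** 2 + (xx - w // 2) ** 2) / (2 * 3.0 ** 2))
x = np.random.rand(h, w)            # stand-in for a CNN feature channel
w_hat = train_ridge_filter(x, y)
resp = track_response(w_hat, x)     # the peak should sit near the centre
print(np.unravel_index(resp.argmax(), resp.shape))
```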
Citations: 77
Ancient Roman Coin Recognition in the Wild Using Deep Learning Based Recognition of Artistically Depicted Face Profiles
Pub Date : 2017-10-29 DOI: 10.1109/ICCVW.2017.342
Imanol Schlag, Ognjen Arandjelovic
As a particularly interesting application in the realm of cultural heritage on the one hand, and a technically challenging problem on the other, computer-vision-based analysis of Roman Imperial coins has been attracting an increasing amount of research. In this paper we make several important contributions. Firstly, we address a key limitation of existing work, which is largely characterized by the application of generic object recognition techniques and the lack of use of domain knowledge. In contrast, our work approaches coin recognition in much the same way as a human expert would: by identifying the emperor universally shown on the obverse. To this end we develop a deep convolutional network, carefully crafted for what is effectively a specific instance of profile face recognition. No less importantly, we also address a major methodological flaw of previous research which is, as we explain in detail, insufficiently systematic and rigorous, and mired in confounding factors. Lastly, we introduce three carefully collected and annotated data sets, and use these to demonstrate the effectiveness of the proposed approach, which is shown to exceed the performance of the state of the art by approximately an order of magnitude.
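The recognition step the abstract describes boils down to a CNN classifier over obverse crops. Below is a hedged, from-scratch PyTorch sketch of such a classifier; the architecture and the number of emperor classes are illustrative assumptions, not the authors' carefully crafted network.

```python
import torch
import torch.nn as nn

NUM_EMPERORS = 30   # hypothetical number of emperor classes

class ProfileNet(nn.Module):
    """Toy emperor classifier: grayscale obverse crops in, one score per class out."""
    def __init__(self, num_classes=NUM_EMPERORS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = ProfileNet()(torch.randn(4, 1, 128, 128))   # a batch of 4 obverse crops
print(logits.shape)                                   # torch.Size([4, 30])
```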
Citations: 30
Particle Filter Based Probabilistic Forced Alignment for Continuous Gesture Recognition
Pub Date : 2017-10-29 DOI: 10.1109/ICCVW.2017.364
Necati Cihan Camgöz, Simon Hadfield, R. Bowden
In this paper, we propose a novel particle filter based probabilistic forced alignment approach for training spatiotemporal deep neural networks using weak border-level annotations. The proposed method jointly learns to localize and recognize isolated instances in continuous streams. This is done by drawing training volumes from a prior distribution of likely regions and training a discriminative 3D-CNN on this data. The classifier is then used to calculate the posterior distribution by scoring the training examples, and this posterior serves as the prior for the next sampling stage. We apply the proposed approach to the challenging task of large-scale user-independent continuous gesture recognition. We evaluate the performance on the popular ChaLearn 2016 Continuous Gesture Recognition (ConGD) dataset. Our method surpasses state-of-the-art results by obtaining Mean Jaccard Index Scores of 0.3646 and 0.3744 on the validation and test sets of ConGD, respectively. Furthermore, we participated in the ChaLearn 2017 Continuous Gesture Recognition Challenge and were ranked 3rd. It should be noted that our method is learner independent; it can therefore be easily combined with other approaches.
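The core sample-score-resample loop the abstract describes can be sketched as follows. The code is a deliberately simplified stand-in: the `classifier_score` function is a hypothetical placeholder for the discriminative 3D-CNN, and particles are one-dimensional window centres rather than full training volumes.

```python
import numpy as np

rng = np.random.default_rng(0)

def classifier_score(center, true_center=120, tol=15):
    """Placeholder for the 3D-CNN score of a training volume centred at `center`.
    Here it simply prefers windows near a hypothetical true gesture centre."""
    return np.exp(-0.5 * ((center - true_center) / tol) ** 2)

T = 300                                          # length of the continuous stream (frames)
n_particles = 200
particles = rng.uniform(0, T, n_particles)       # prior: uniform over the stream
weights = np.full(n_particles, 1.0 / n_particles)

for _ in range(5):                               # alternate scoring and resampling
    scores = np.array([classifier_score(c) for c in particles])
    weights = scores / scores.sum()              # posterior over window centres
    idx = rng.choice(n_particles, n_particles, p=weights)
    particles = particles[idx] + rng.normal(0, 5, n_particles)   # jittered resample = new prior

print(float(particles.mean()))   # concentrates near the true gesture centre (~120)
```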
Citations: 7
Propagation of Orientation Uncertainty of 3D Rigid Object to Its Points
Pub Date : 2017-10-29 DOI: 10.1109/ICCVW.2017.255
M. Franaszek, G. Cheok
If a CAD model of a rigid object is available, the location of any point on the object can be derived from the measured 6DOF pose of the object. However, the uncertainty of the measured pose propagates to the uncertainty of the point in an anisotropic way. We investigate this propagation for a class of systems that determine an object's pose by using point-based rigid-body registration. For such systems, the uncertainty in the location of the points used for registration propagates to the pose uncertainty. We find that the direction corresponding to the smallest propagated uncertainty remains relatively unchanged in the object's local frame, regardless of the object's pose. We show that this direction may be closely approximated by the moment-of-inertia axis computed from the configuration of the fiducials. We use existing theory of rigid-body registration to explain the experimental results, and discuss the limitations of the theory and the practical implications of our findings.
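The anisotropic propagation described here follows from standard first-order (small-angle) error propagation: for a point p in the object frame, x = R p + t, so the 3x3 point covariance is J Σ_pose Jᵀ with J = [-[Rp]_x  I]. The NumPy sketch below illustrates this generic computation, not the paper's registration-based analysis; the covariance values are arbitrary toy numbers.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x such that [v]_x @ a == np.cross(v, a)."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def propagate_point_cov(R, p, cov_pose):
    """First-order propagation of a 6x6 pose covariance (small-angle rotation part
    first, translation part second) to the 3x3 covariance of a point p given in
    the object's local frame."""
    J = np.hstack([-skew(R @ p), np.eye(3)])   # Jacobian of R @ p + t w.r.t. (omega, t)
    return J @ cov_pose @ J.T

# toy usage: isotropic orientation uncertainty plus small translation noise
R = np.eye(3)
p = np.array([0.2, 0.0, 0.5])                  # point in the object's local frame (m)
cov_pose = np.diag([1e-4] * 3 + [1e-6] * 3)    # rad^2 and m^2
cov_x = propagate_point_cov(R, p, cov_pose)
evals, evecs = np.linalg.eigh(cov_x)
print(evals)   # anisotropic: the rotation contributes nothing along the direction of R @ p
```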
Citations: 2
BEHAVE — Behavioral Analysis of Visual Events for Assisted Living Scenarios
Pub Date : 2017-10-28 DOI: 10.1109/ICCVW.2017.160
Jonas Vlasselaer, C. Crispim, F. Brémond, Anton Dries
This paper proposes BEHAVE, a person-centered pipeline for probabilistic event recognition. The proposed pipeline first detects the set of people in a video frame, then searches for correspondences between people in the current and previous frames (i.e., people tracking). Finally, event recognition is carried out for each person using probabilistic logic models (PLMs, ProbLog2 language). PLMs represent interactions among people, home appliances and semantic regions. They also make it possible to assess the probability of an event given noisy observations of the real world. BEHAVE was evaluated on the tasks of online (non-clipped videos) and open-set event recognition (i.e., target events plus a none class) on video recordings of seniors carrying out daily tasks. Results show that BEHAVE improves event recognition accuracy by handling missed and partially satisfied logic models. Future work will investigate how to extend PLMs to represent temporal relations among events.
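The idea of assessing an event's probability from noisy observations can be illustrated with a very small Bayesian sensor model in plain Python. This is only a toy stand-in; the paper's probabilistic logic models (ProbLog2) additionally encode relations among people, appliances and semantic regions.

```python
def event_posterior(detections, prior=0.3,
                    p_det_given_event=0.9, p_det_given_no_event=0.2):
    """Posterior probability that an event (e.g. 'person uses the kettle') occurred,
    given a list of noisy per-frame boolean detections of the interaction.

    A deliberately simple Bayesian sensor model with independent frames."""
    p_event, p_no_event = prior, 1.0 - prior
    for d in detections:
        like_e = p_det_given_event if d else 1.0 - p_det_given_event
        like_n = p_det_given_no_event if d else 1.0 - p_det_given_no_event
        p_event, p_no_event = p_event * like_e, p_no_event * like_n
    return p_event / (p_event + p_no_event)

print(event_posterior([True, True, False, True]))   # noisy but mostly positive frames
```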
Citations: 1
Spatially-Variant Kernel for Optical Flow Under Low Signal-to-Noise Ratios: Application to Microscopy
Pub Date : 2017-10-23 DOI: 10.1109/ICCVW.2017.12
Denis Fortun, N. Debroux, C. Kervrann
Local and global approaches can be identified as the two main classes of optical flow estimation methods. In this paper, we propose a framework that combines the advantages of these two principles, namely the robustness to noise of the local approach and the discontinuity preservation of the global approach. This is particularly crucial in biological imaging, where the noise produced by microscopes is one of the main issues for optical flow estimation. The idea is to spatially adapt the local support of the local parametric constraint in the combined local-global model [6]. To this end, we jointly estimate the motion field and the parameters of the spatial support. We apply our approach to the case of Gaussian filtering, and we derive efficient minimization schemes for the usual data terms. The estimation of a spatially varying standard-deviation map prevents the smoothing of motion discontinuities, while ensuring robustness to noise. We validate our method on a standard model and demonstrate how a baseline approach with a pixel-wise data term can be improved when integrated into our framework. The method is evaluated on the Middlebury benchmark with ground truth and on real fluorescence microscopy data.
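The local side of the combined local-global idea can be sketched as a Lucas-Kanade-style weighted least-squares step in which each pixel has its own Gaussian window width. The NumPy code below illustrates that spatially varying weighting only; in the paper the standard-deviation map is estimated jointly with the flow rather than fixed, and the global regularization term is omitted here.

```python
import numpy as np

def lucas_kanade_variant(Ix, Iy, It, sigma_map, radius=7):
    """Per-pixel weighted least squares on the brightness-constancy constraint
    Ix*u + Iy*v + It = 0, with a Gaussian window whose std dev varies per pixel."""
    H, W = Ix.shape
    flow = np.zeros((H, W, 2))
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    for y in range(radius, H - radius):
        for x in range(radius, W - radius):
            w = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma_map[y, x] ** 2)).ravel()
            ix = Ix[y - radius:y + radius + 1, x - radius:x + radius + 1].ravel()
            iy = Iy[y - radius:y + radius + 1, x - radius:x + radius + 1].ravel()
            it = It[y - radius:y + radius + 1, x - radius:x + radius + 1].ravel()
            A = np.stack([ix, iy], axis=1)
            AtWA = A.T @ (w[:, None] * A) + 1e-6 * np.eye(2)   # regularized 2x2 system
            AtWb = A.T @ (w * -it)
            flow[y, x] = np.linalg.solve(AtWA, AtWb)
    return flow

# toy usage: a constant translation (u, v) = (1, 0) on a random image
img = np.random.rand(64, 64)
Ix, Iy = np.gradient(img, axis=1), np.gradient(img, axis=0)
It = -Ix * 1.0                      # brightness change implied by u = 1, v = 0
sigma = np.full(img.shape, 3.0)     # constant here; the paper estimates it jointly
print(lucas_kanade_variant(Ix, Iy, It, sigma)[32, 32])   # approximately [1, 0]
```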
Citations: 0
Human Action Recognition: Pose-Based Attention Draws Focus to Hands
Pub Date : 2017-10-23 DOI: 10.1109/ICCVW.2017.77
Fabien Baradel, Christian Wolf, J. Mille
We propose a new spatio-temporal attention based mechanism for human action recognition, able to automatically attend to the most important human hands and detect the most discriminative moments in an action. Attention is handled in a recurrent manner employing a Recurrent Neural Network (RNN) and is fully differentiable. In contrast to standard soft-attention based mechanisms, our approach does not use the hidden RNN state as input to the attention model. Instead, attention distributions are drawn using external information: the articulated human pose. We performed an extensive ablation study to show the strengths of this approach, and we particularly studied the conditioning aspect of the attention mechanism. We evaluate the method on the largest currently available human action recognition dataset, NTU-RGB+D, and report state-of-the-art results. Another advantage of our model is its explainability: the spatial and temporal attention distributions at test time allow one to study and verify which parts of the input data the method focuses on.
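A minimal PyTorch sketch of the key mechanism, attention weights over per-hand features computed from the pose rather than from the RNN hidden state, is given below. The dimensions, the two-glimpse setup and the classifier head are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class PoseConditionedAttention(nn.Module):
    """Attend over per-hand feature vectors using weights computed from the
    articulated pose alone (no recurrent hidden state enters the attention)."""
    def __init__(self, pose_dim=30, feat_dim=256, hidden=128, num_classes=60):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(pose_dim, 64), nn.Tanh(), nn.Linear(64, 2))
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.cls = nn.Linear(hidden, num_classes)

    def forward(self, hand_feats, pose):
        # hand_feats: (B, T, 2, feat_dim) features for left/right hand crops
        # pose:       (B, T, pose_dim)    articulated pose per frame
        alpha = torch.softmax(self.score(pose), dim=-1)        # (B, T, 2) attention
        attended = (alpha.unsqueeze(-1) * hand_feats).sum(2)   # (B, T, feat_dim)
        out, _ = self.rnn(attended)
        return self.cls(out[:, -1])                            # class scores

model = PoseConditionedAttention()
logits = model(torch.randn(4, 16, 2, 256), torch.randn(4, 16, 30))
print(logits.shape)   # torch.Size([4, 60])
```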
Citations: 91
Exploiting the Complementarity of Audio and Visual Data in Multi-speaker Tracking
Pub Date : 2017-10-23 DOI: 10.1109/ICCVW.2017.60
Yutong Ban, Laurent Girin, Xavier Alameda-Pineda, R. Horaud
Multi-speaker tracking is a central problem in human-robot interaction. In this context, exploiting auditory and visual information is gratifying and challenging at the same time: gratifying, because the complementary nature of auditory and visual information allows us to be more robust against noise and outliers than unimodal approaches; challenging, because properly fusing auditory and visual information for multi-speaker tracking is far from a solved problem. In this paper we propose a probabilistic generative model that tracks multiple speakers by jointly exploiting auditory and visual features in their own representation spaces. Importantly, the method is robust to missing data and is therefore able to track even when observations from one of the modalities are absent. Quantitative and qualitative results on the AVDIAR dataset are reported.
Citations: 22
Registration of RGB and Thermal Point Clouds Generated by Structure From Motion
Pub Date : 2017-10-22 DOI: 10.1109/ICCVW.2017.57
Trong Phuc Truong, M. Yamaguchi, Shohei Mori, Vincent Nozick, H. Saito
Thermal imaging has become a valuable remote-sensing tool in various fields and can provide relevant information for object recognition or classification. In this paper, we present an automated method to obtain a 3D model that fuses data from a visible and a thermal camera. The RGB and thermal point clouds are generated independently by structure from motion. The registration process includes a normalization of the point-cloud scale, a global registration based on calibration data and the output of the structure from motion, and a fine registration employing a variant of the Iterative Closest Point optimization. Experimental results demonstrate the accuracy and robustness of the overall process.
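Two of the stages named in the abstract, scale normalization of an SfM cloud and fine registration with an ICP variant, can be sketched with NumPy and SciPy as below. The global registration from calibration data is replaced here by the assumption of a roughly aligned toy example; this is an illustration, not the authors' pipeline.

```python
import numpy as np
from scipy.spatial import cKDTree

def normalize_scale(P):
    """Scale a cloud so its RMS distance to the centroid is 1 (SfM scale is arbitrary)."""
    c = P.mean(axis=0)
    return (P - c) / np.sqrt(((P - c) ** 2).sum(axis=1).mean()), c

def best_rigid(A, B):
    """Least-squares rotation/translation mapping A onto B (Kabsch / Procrustes)."""
    ca, cb = A.mean(0), B.mean(0)
    U, _, Vt = np.linalg.svd((A - ca).T @ (B - cb))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cb - R @ ca

def icp(src, dst, iters=20):
    """Simple point-to-point ICP refining the alignment of src onto dst."""
    tree = cKDTree(dst)
    cur = src.copy()
    for _ in range(iters):
        _, idx = tree.query(cur)
        R, t = best_rigid(cur, dst[idx])
        cur = cur @ R.T + t
    return cur

# toy usage: the "thermal" cloud is a scaled, slightly rotated copy of the "RGB" cloud
rgb = np.random.rand(500, 3)
a = np.deg2rad(5)
Rz = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
thermal = 2.5 * rgb @ Rz.T + 0.1
rgb_n, _ = normalize_scale(rgb)
thermal_n, _ = normalize_scale(thermal)
aligned = icp(thermal_n, rgb_n)
print(np.abs(aligned - rgb_n).mean())   # residual should be close to zero
```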
Citations: 22
A Handcrafted Normalized-Convolution Network for Texture Classification
Pub Date : 2017-10-22 DOI: 10.1109/ICCVW.2017.149
Ngoc-Son Vu, Vu-Lam Nguyen, P. Gosselin
In this paper, we propose a Handcrafted Normalized-Convolution Network (NmzNet) for efficient texture classification. NmzNet is implemented as a three-layer normalized-convolution network, which computes successive normalized convolutions with a predefined filter bank (a Gabor filter bank) followed by modulus non-linearities. Coefficients from different layers are aggregated by Fisher Vector aggregation to form the final discriminative features. Experimental evaluation on three texture datasets, UIUC, KTH-TIPS-2a, and KTH-TIPS-2b, indicates that our proposed approach achieves a good classification rate compared with other handcrafted methods. The results additionally indicate that only a marginal difference exists between the best classification rate of recent frontier CNNs and that of the proposed method on the experimented datasets.
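One layer of the described network, convolution with a fixed Gabor filter bank followed by a modulus non-linearity, can be sketched as below (assuming scikit-image and SciPy are available); the normalization of the convolution and the Fisher Vector aggregation are omitted.

```python
import numpy as np
from scipy.signal import fftconvolve
from skimage.filters import gabor_kernel

def gabor_modulus_layer(image, frequencies=(0.1, 0.2, 0.3), n_orient=4):
    """Convolve `image` with a Gabor filter bank and return modulus responses,
    one feature map per (frequency, orientation) pair."""
    maps = []
    for f in frequencies:
        for k in range(n_orient):
            kern = gabor_kernel(f, theta=k * np.pi / n_orient)
            resp = fftconvolve(image, kern, mode='same')   # complex-valued response
            maps.append(np.abs(resp))                       # modulus non-linearity
    return np.stack(maps)                                   # (n_filters, H, W)

texture = np.random.rand(96, 96)
feats = gabor_modulus_layer(texture)
print(feats.shape)   # (12, 96, 96)
```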
Citations: 6