
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): Latest Publications

TenSR: Multi-dimensional Tensor Sparse Representation
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.637
Na Qi, Yunhui Shi, Xiaoyan Sun, Baocai Yin
The conventional sparse model relies on data representation in the form of vectors. It represents the vector-valued or vectorized one-dimensional (1D) version of a signal as a highly sparse linear combination of basis atoms from a large dictionary. The 1D modeling, though simple, ignores the inherent structure and breaks the local correlation inside multidimensional (MD) signals. It also dramatically increases the demand for memory and computational resources, especially when dealing with high-dimensional signals. In this paper, we propose TenSR, a new tensor-based sparse model for MD data representation, along with the corresponding MD sparse coding and MD dictionary learning algorithms. The proposed TenSR model approximates well the structure inherent in each mode of MD signals with a series of adaptive separable structure dictionaries obtained via dictionary learning. The proposed proximal-method-based MD sparse coding algorithm further reduces the computational cost significantly. Experimental results with real-world MD signals, i.e., 3D multi-spectral images, show the proposed TenSR greatly reduces both the computational and memory costs while achieving competitive performance in comparison with state-of-the-art sparse representation methods. We believe the proposed TenSR model is a promising way to empower sparse representation, especially for large-scale high-order signals.
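As an illustration of the mode-wise separable structure described above, the sketch below sparse-codes a 3D tensor against per-mode dictionaries with an ISTA-style proximal loop. The simplified l1 objective, the `dicts` argument, and the Lipschitz step size are assumptions made for exposition, not the paper's exact TenSR algorithm.

```python
import numpy as np

def mode_n_product(tensor, matrix, mode):
    """Multiply a tensor by a matrix along the given mode (0-indexed)."""
    t = np.moveaxis(tensor, mode, 0)
    shape = t.shape
    result = matrix @ t.reshape(shape[0], -1)
    return np.moveaxis(result.reshape(matrix.shape[0], *shape[1:]), 0, mode)

def soft_threshold(x, tau):
    """Proximal operator of the l1 norm (element-wise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def tensor_sparse_code(X, dicts, lam=0.1, n_iter=50):
    """Solve min_S 0.5*||X - S x1 D1 x2 D2 x3 D3||^2 + lam*||S||_1
    by ISTA, a simplified stand-in for the TenSR coding step."""
    S = np.zeros([D.shape[1] for D in dicts])
    # Gradient Lipschitz constant of the separable (Kronecker) operator.
    L = np.prod([np.linalg.norm(D, 2) ** 2 for D in dicts])
    for _ in range(n_iter):
        R = S
        for mode, D in enumerate(dicts):
            R = mode_n_product(R, D, mode)          # current reconstruction
        grad = R - X
        for mode, D in enumerate(dicts):
            grad = mode_n_product(grad, D.T, mode)  # adjoint of the operator
        S = soft_threshold(S - grad / L, lam / L)   # proximal gradient step
    return S
```

For an 8 x 8 x 31 multi-spectral patch with dictionaries of shapes (8, 16), (8, 16), and (31, 62), the coefficient tensor `S` is 16 x 16 x 62, and only the three small per-mode dictionaries need to be stored rather than one large dictionary over the vectorized signal.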
{"title":"TenSR: Multi-dimensional Tensor Sparse Representation","authors":"Na Qi, Yunhui Shi, Xiaoyan Sun, Baocai Yin","doi":"10.1109/CVPR.2016.637","DOIUrl":"https://doi.org/10.1109/CVPR.2016.637","url":null,"abstract":"The conventional sparse model relies on data representation in the form of vectors. It represents the vector-valued or vectorized one dimensional (1D) version of an signal as a highly sparse linear combination of basis atoms from a large dictionary. The 1D modeling, though simple, ignores the inherent structure and breaks the local correlation inside multidimensional (MD) signals. It also dramatically increases the demand of memory as well as computational resources especially when dealing with high dimensional signals. In this paper, we propose a new sparse model TenSR based on tensor for MD data representation along with the corresponding MD sparse coding and MD dictionary learning algorithms. The proposed TenSR model is able to well approximate the structure in each mode inherent in MD signals with a series of adaptive separable structure dictionaries via dictionary learning. The proposed MD sparse coding algorithm by proximal method further reduces the computational cost significantly. Experimental results with real world MD signals, i.e. 3D Multi-spectral images, show the proposed TenSR greatly reduces both the computational and memory costs with competitive performance in comparison with the state-of-the-art sparse representation methods. We believe our proposed TenSR model is a promising way to empower the sparse representation especially for large scale high order signals.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"17 1","pages":"5916-5925"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80197391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 41
Temporal Action Detection Using a Statistical Language Model
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.341
Alexander Richard, Juergen Gall
While current approaches to action recognition on pre-segmented video clips already achieve high accuracies, temporal action detection still falls far short of comparably good results. Automatically locating and classifying the relevant action segments in videos of varying lengths proves to be a challenging task. We propose a novel method for temporal action detection that includes statistical length and language modeling to represent temporal and contextual structure. Our approach aims at globally optimizing the joint probability of three components, a length model, a language model, and a discriminative action model, without making intermediate decisions. The problem of finding the most likely action sequence and the corresponding segment boundaries in an exponentially large search space is addressed by dynamic programming. We provide an extensive evaluation of each model component on Thumos 14, a large action detection dataset, and report state-of-the-art results on three datasets.
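The global optimization over segment boundaries and labels lends itself to a Viterbi-style dynamic program, sketched minimally below. The interface (per-frame class log-scores, a callable length model, and a class-to-class language model) is an assumed simplification of the paper's three components.

```python
import numpy as np

def detect_actions(frame_scores, log_length, log_lm, max_len):
    """Jointly segment and label a video by dynamic programming.

    frame_scores: (T, C) per-frame log-scores of the action model
    log_length:   callable (length, c) -> log-prob under the length model
    log_lm:       (C, C) log transition matrix of the language model
    Returns segments as (start, end, class) triples.
    """
    T, C = frame_scores.shape
    cum = np.vstack([np.zeros((1, C)), np.cumsum(frame_scores, axis=0)])
    best = np.full((T + 1, C), -np.inf)  # best[t, c]: score of the best
    best[0, :] = 0.0                     # segmentation of [0, t) ending in c
    back = {}
    for t in range(1, T + 1):
        for l in range(1, min(max_len, t) + 1):
            s = t - l
            seg = cum[t] - cum[s]        # summed frame scores per class
            for c in range(C):
                prev = best[s] + log_lm[:, c] if s > 0 else best[s]
                p = int(np.argmax(prev))
                cand = prev[p] + seg[c] + log_length(l, c)
                if cand > best[t, c]:
                    best[t, c], back[(t, c)] = cand, (s, p)
    c, t, segments = int(np.argmax(best[T])), T, []
    while t > 0:                          # trace the best path back
        s, p = back[(t, c)]
        segments.append((s, t, c))
        t, c = s, p
    return segments[::-1]
```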
{"title":"Temporal Action Detection Using a Statistical Language Model","authors":"Alexander Richard, Juergen Gall","doi":"10.1109/CVPR.2016.341","DOIUrl":"https://doi.org/10.1109/CVPR.2016.341","url":null,"abstract":"While current approaches to action recognition on presegmented video clips already achieve high accuracies, temporal action detection is still far from comparably good results. Automatically locating and classifying the relevant action segments in videos of varying lengths proves to be a challenging task. We propose a novel method for temporal action detection including statistical length and language modeling to represent temporal and contextual structure. Our approach aims at globally optimizing the joint probability of three components, a length and language model and a discriminative action model, without making intermediate decisions. The problem of finding the most likely action sequence and the corresponding segment boundaries in an exponentially large search space is addressed by dynamic programming. We provide an extensive evaluation of each model component on Thumos 14, a large action detection dataset, and report state-of-the-art results on three datasets.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"3131-3140"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84331490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 205
Object Tracking via Dual Linear Structured SVM and Explicit Feature Map
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.462
J. Ning, Jimei Yang, Shaojie Jiang, Lei Zhang, Ming-Hsuan Yang
Structured support vector machine (SSVM) based methods have demonstrated encouraging performance in recent object tracking benchmarks. However, the complex and expensive optimization limits their deployment in real-world applications. In this paper, we present a simple yet efficient dual linear SSVM (DLSSVM) algorithm to enable fast learning and execution during tracking. By analyzing the dual variables, we propose a primal classifier update formula where the learning step size is computed in closed form. This online learning method significantly improves the robustness of the proposed linear SSVM with lower computational cost. Second, we approximate the intersection kernel for feature representations with an explicit feature map to further improve tracking performance. Finally, we extend the proposed DLSSVM tracker with multi-scale estimation to address the "drift" problem. Experimental results on large benchmark datasets with 50 and 100 video sequences show that the proposed DLSSVM tracking algorithm achieves state-of-the-art performance.
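The closed-form step size can be conveyed with a single online update: find the most violated candidate box under the loss-augmented score, then take the exact minimizing step along that dual coordinate, clipped to the box constraint. This is a generic structured-SVM coordinate update written in the spirit of DLSSVM, not the paper's exact formula.

```python
import numpy as np

def ssvm_update(w, feats, y_true, loss, C=1.0):
    """One online structured-SVM step with a closed-form step size.

    feats: dict mapping candidate box y -> feature vector phi(x, y)
    loss:  dict mapping candidate box y -> task loss (e.g. 1 - IoU with y_true)
    """
    # Most violated constraint under the loss-augmented score.
    y_star = max(feats, key=lambda y: loss[y] + w @ feats[y])
    d_phi = feats[y_true] - feats[y_star]
    denom = d_phi @ d_phi
    if denom < 1e-12:                 # y_star == y_true, nothing to correct
        return w
    # Exact minimizer along this coordinate, clipped to [0, C].
    tau = np.clip((loss[y_star] - w @ d_phi) / denom, 0.0, C)
    return w + tau * d_phi
```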
{"title":"Object Tracking via Dual Linear Structured SVM and Explicit Feature Map","authors":"J. Ning, Jimei Yang, Shaojie Jiang, Lei Zhang, Ming-Hsuan Yang","doi":"10.1109/CVPR.2016.462","DOIUrl":"https://doi.org/10.1109/CVPR.2016.462","url":null,"abstract":"Structured support vector machine (SSVM) based methods have demonstrated encouraging performance in recent object tracking benchmarks. However, the complex and expensive optimization limits their deployment in real-world applications. In this paper, we present a simple yet efficient dual linear SSVM (DLSSVM) algorithm to enable fast learning and execution during tracking. By analyzing the dual variables, we propose a primal classifier update formula where the learning step size is computed in closed form. This online learning method significantly improves the robustness of the proposed linear SSVM with lower computational cost. Second, we approximate the intersection kernel for feature representations with an explicit feature map to further improve tracking performance. Finally, we extend the proposed DLSSVM tracker with multi-scale estimation to address the \"drift\" problem. Experimental results on large benchmark datasets with 50 and 100 video sequences show that the proposed DLSSVM tracking algorithm achieves state-of-the-art performance.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"21 1","pages":"4266-4274"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85006706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 225
Rolling Rotations for Recognizing Human Actions from 3D Skeletal Data
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.484
Raviteja Vemulapalli, R. Chellappa
Recently, skeleton-based human action recognition has been receiving significant attention from various research communities due to the availability of depth sensors and real-time depth-based 3D skeleton estimation algorithms. In this work, we use rolling maps for recognizing human actions from 3D skeletal data. The rolling map is a well-defined mathematical concept that has not been explored much by the vision community. First, we represent each skeleton using the relative 3D rotations between various body parts. Since 3D rotations are members of the special orthogonal group SO3, our skeletal representation becomes a point in the Lie group SO3 × ... × SO3, which is also a Riemannian manifold. Then, using this representation, we model human actions as curves in this Lie group. Since classification of curves in this non-Euclidean space is a difficult task, we unwrap the action curves onto the Lie algebra so3 × ... × so3 (which is a vector space) by combining the logarithm map with rolling maps, and perform classification in the Lie algebra. Experimental results on three action datasets show that the proposed approach performs equally well or better than state-of-the-art methods.
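The basic linearization step, mapping a rotation in the Lie group SO3 to an axis-angle vector in the Lie algebra so3 via the logarithm map, is sketched below; the rolling-map unwrapping of whole action curves is more involved and is omitted.

```python
import numpy as np

def so3_log(R):
    """Logarithm map from SO3 to so3 (axis-angle vector), ignoring the
    theta ~ pi corner case for brevity."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-8:
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * axis

def skeleton_to_vector(rel_rotations):
    """Flatten one skeleton, a point in SO3 x ... x SO3 given as a list of
    relative 3x3 rotations between body parts, into so3 x ... x so3."""
    return np.concatenate([so3_log(R) for R in rel_rotations])
```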
{"title":"Rolling Rotations for Recognizing Human Actions from 3D Skeletal Data","authors":"Raviteja Vemulapalli, R. Chellappa","doi":"10.1109/CVPR.2016.484","DOIUrl":"https://doi.org/10.1109/CVPR.2016.484","url":null,"abstract":"Recently, skeleton-based human action recognition has been receiving significant attention from various research communities due to the availability of depth sensors and real-time depth-based 3D skeleton estimation algorithms. In this work, we use rolling maps for recognizing human actions from 3D skeletal data. The rolling map is a well-defined mathematical concept that has not been explored much by the vision community. First, we represent each skeleton using the relative 3D rotations between various body parts. Since 3D rotations are members of the special orthogonal group SO3, our skeletal representation becomes a point in the Lie group SO3 × ... × SO3, which is also a Riemannian manifold. Then, using this representation, we model human actions as curves in this Lie group. Since classification of curves in this non-Euclidean space is a difficult task, we unwrap the action curves onto the Lie algebra so3 × ... × so3 (which is a vector space) by combining the logarithm map with rolling maps, and perform classification in the Lie algebra. Experimental results on three action datasets show that the proposed approach performs equally well or better when compared to state-of-the-art.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"19 1","pages":"4471-4479"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78080726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 166
Learning Reconstruction-Based Remote Gaze Estimation
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.375
Pei Yu, Jiahuan Zhou, Ying Wu
It is a challenging problem to accurately estimate gaze from low-resolution eye images that do not provide fine and detailed eye features. Existing methods attempt to establish a mapping between the visual appearance space and the gaze space. Different from the direct regression approach, the reconstruction-based approach represents appearance and gaze via local linear reconstruction in their own spaces. A common treatment is to use the same local reconstruction in the two spaces, i.e., the reconstruction weights in the appearance space are transferred to the gaze space for gaze reconstruction. However, this questionable treatment is taken for granted and has never been justified, leading to significant errors in gaze estimation. This paper is focused on the study of this fundamental issue. It shows that the distance metric in the appearance space needs to be adjusted before the same reconstruction can be used. A novel method is proposed to learn the metric, such that the affinity structure of the appearance space under this new metric is as close as possible to the affinity structure of the gaze space under the normal Euclidean metric. Furthermore, the local affinity structure invariance is utilized to further regularize the solution to the reconstruction weights, so as to obtain a more robust and accurate solution. The effectiveness of the proposed method is validated and demonstrated through extensive experiments on different subjects.
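The reconstruct-then-transfer idea reads as follows in a minimal sketch: compute LLE-style reconstruction weights among neighbors in the appearance space, after applying the learned linear metric, and reuse those weights on the neighbors' gaze labels. The metric `M` is assumed to have been learned already; the learning objective itself is the paper's contribution and is not shown.

```python
import numpy as np

def reconstruction_weights(x, neighbors, reg=1e-3):
    """LLE-style local weights reconstructing x from its k neighbors,
    constrained to sum to one."""
    Z = neighbors - x                          # center on the query point
    G = Z @ Z.T                                # local (k x k) Gram matrix
    G += reg * np.trace(G) * np.eye(len(G))    # regularize for stability
    w = np.linalg.solve(G, np.ones(len(G)))
    return w / w.sum()

def estimate_gaze(appearance, train_app, train_gaze, M, k=5):
    """Find neighbors and weights in the metric-adjusted appearance space,
    then transfer the same weights to the gaze space."""
    X = train_app @ M.T                        # apply the learned metric M
    q = M @ appearance
    idx = np.argsort(np.linalg.norm(X - q, axis=1))[:k]
    w = reconstruction_weights(q, X[idx])
    return w @ train_gaze[idx]                 # weighted gaze of neighbors
```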
{"title":"Learning Reconstruction-Based Remote Gaze Estimation","authors":"Pei Yu, Jiahuan Zhou, Ying Wu","doi":"10.1109/CVPR.2016.375","DOIUrl":"https://doi.org/10.1109/CVPR.2016.375","url":null,"abstract":"It is a challenging problem to accurately estimate gazes from low-resolution eye images that do not provide fine and detailed features for eyes. Existing methods attempt to establish the mapping between the visual appearance space to the gaze space. Different from the direct regression approach, the reconstruction-based approach represents appearance and gaze via local linear reconstruction in their own spaces. A common treatment is to use the same local reconstruction in the two spaces, i.e., the reconstruction weights in the appearance space are transferred to the gaze space for gaze reconstruction. However, this questionable treatment is taken for granted but has never been justified, leading to significant errors in gaze estimation. This paper is focused on the study of this fundamental issue. It shows that the distance metric in the appearance space needs to be adjusted, before the same reconstruction can be used. A novel method is proposed to learn the metric, such that the affinity structure of the appearance space under this new metric is as close as possible to the affinity structure of the gaze space under the normal Euclidean metric. Furthermore, the local affinity structure invariance is utilized to further regularize the solution to the reconstruction weights, so as to obtain a more robust and accurate solution. Effectiveness of the proposed method is validated and demonstrated through extensive experiments on different subjects.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"89 1","pages":"3447-3455"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72865592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
A Robust Multilinear Model Learning Framework for 3D Faces
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.531
Timo Bolkart, S. Wuhrer
Multilinear models are widely used to represent the statistical variations of 3D human faces as they decouple shape changes due to identity and expression. Existing methods to learn a multilinear face model degrade if not every person is captured in every expression, if face scans are noisy or partially occluded, if expressions are erroneously labeled, or if the vertex correspondence is inaccurate. These limitations impose requirements on the training data that disqualify large amounts of available 3D face data from being usable to learn a multilinear model. To overcome this, we introduce the first framework to robustly learn a multilinear model from 3D face databases with missing data, corrupt data, wrong semantic correspondence, and inaccurate vertex correspondence. To achieve this robustness to erroneous training data, our framework jointly learns a multilinear model and fixes the data. We evaluate our framework on two publicly available 3D face databases, and show that our framework achieves a data completion accuracy that is comparable to state-of-the-art tensor completion methods. Our method reconstructs corrupt data more accurately than state-of-the-art methods, and improves the quality of the learned model significantly for erroneously labeled expressions.
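For reference, the (non-robust) multilinear model that this framework makes learnable from imperfect data can be sketched as a truncated HOSVD of a vertices x identities x expressions tensor; the paper's joint model-plus-data optimization goes beyond this clean-data baseline.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding of a tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def multilinear_face_model(data, r_id, r_exp):
    """Truncated HOSVD of a face tensor (clean, fully observed data).

    data: (3 * n_vertices, n_identities, n_expressions) array of
    registered face scans in vertex correspondence.
    """
    U_id = np.linalg.svd(unfold(data, 1), full_matrices=False)[0][:, :r_id]
    U_exp = np.linalg.svd(unfold(data, 2), full_matrices=False)[0][:, :r_exp]
    # Core tensor: project the data onto the truncated factor bases.
    core = np.einsum('vie,ij,ek->vjk', data, U_id, U_exp)
    return core, U_id, U_exp
```

A face is then synthesized as `np.einsum('vjk,j,k->v', core, w_id, w_exp)` for identity and expression coefficient vectors `w_id` and `w_exp`.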
{"title":"A Robust Multilinear Model Learning Framework for 3D Faces","authors":"Timo Bolkart, S. Wuhrer","doi":"10.1109/CVPR.2016.531","DOIUrl":"https://doi.org/10.1109/CVPR.2016.531","url":null,"abstract":"Multilinear models are widely used to represent the statistical variations of 3D human faces as they decouple shape changes due to identity and expression. Existing methods to learn a multilinear face model degrade if not every person is captured in every expression, if face scans are noisy or partially occluded, if expressions are erroneously labeled, or if the vertex correspondence is inaccurate. These limitations impose requirements on the training data that disqualify large amounts of available 3D face data from being usable to learn a multilinear model. To overcome this, we introduce the first framework to robustly learn a multilinear model from 3D face databases with missing data, corrupt data, wrong semantic correspondence, and inaccurate vertex correspondence. To achieve this robustness to erroneous training data, our framework jointly learns a multilinear model and fixes the data. We evaluate our framework on two publicly available 3D face databases, and show that our framework achieves a data completion accuracy that is comparable to state-of-the-art tensor completion methods. Our method reconstructs corrupt data more accurately than state-of-the-art methods, and improves the quality of the learned model significantly for erroneously labeled expressions.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"29 1","pages":"4911-4919"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73127906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
Discriminative Invariant Kernel Features: A Bells-and-Whistles-Free Approach to Unsupervised Face Recognition and Pose Estimation
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.603
Dipan K. Pal, Felix Juefei-Xu, M. Savvides
We propose an explicitly discriminative and 'simple' approach to generate invariance to nuisance transformations modeled as unitary. In practice, the approach works well to handle non-unitary transformations as well. Our theoretical results extend the reach of a recent theory of invariance to discriminative and kernelized features based on unitary kernels. As a special case, a single common framework can be used to generate subject-specific pose-invariant features for face recognition and vice-versa for pose estimation. We show that our main proposed method (DIKF) can perform well under very challenging large-scale semisynthetic face matching and pose estimation protocols with unaligned faces using no landmarking whatsoever. We additionally benchmark on CMU MPIE and outperform previous work in almost all cases on off-angle face matching while we are on par with the previous state-of-the-art on the LFW unsupervised and image-restricted protocols, without any low-level image descriptors other than raw-pixels.
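A toy sketch of the invariance mechanism: pool kernel responses between the input and transformed copies of each template. Mean pooling over a finite set of unitary transformations and the linear-kernel default are illustrative assumptions; the paper's feature construction and theory are more general.

```python
import numpy as np

def invariant_feature(x, template, transforms, kernel):
    """Pool kernel responses over nuisance transformations of a template,
    so the result is (approximately) unchanged when x itself is transformed."""
    return np.mean([kernel(x, g @ template) for g in transforms])

def dikf(x, templates, transforms, kernel=lambda a, b: float(a @ b)):
    """Stack one invariant response per template into a feature vector."""
    return np.array([invariant_feature(x, t, transforms, kernel)
                     for t in templates])
```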
{"title":"Discriminative Invariant Kernel Features: A Bells-and-Whistles-Free Approach to Unsupervised Face Recognition and Pose Estimation","authors":"Dipan K. Pal, Felix Juefei-Xu, M. Savvides","doi":"10.1109/CVPR.2016.603","DOIUrl":"https://doi.org/10.1109/CVPR.2016.603","url":null,"abstract":"We propose an explicitly discriminative and 'simple' approach to generate invariance to nuisance transformations modeled as unitary. In practice, the approach works well to handle non-unitary transformations as well. Our theoretical results extend the reach of a recent theory of invariance to discriminative and kernelized features based on unitary kernels. As a special case, a single common framework can be used to generate subject-specific pose-invariant features for face recognition and vice-versa for pose estimation. We show that our main proposed method (DIKF) can perform well under very challenging large-scale semisynthetic face matching and pose estimation protocols with unaligned faces using no landmarking whatsoever. We additionally benchmark on CMU MPIE and outperform previous work in almost all cases on off-angle face matching while we are on par with the previous state-of-the-art on the LFW unsupervised and image-restricted protocols, without any low-level image descriptors other than raw-pixels.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"545 1","pages":"5590-5599"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76621960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 23
Recurrent Convolutional Network for Video-Based Person Re-identification
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.148
Niall McLaughlin, J. M. D. Rincón, P. Miller
In this paper we propose a novel recurrent neural network architecture for video-based person re-identification. Given the video sequence of a person, features are extracted from each frame using a convolutional neural network that incorporates a recurrent final layer, which allows information to flow between time-steps. The features from all time-steps are then combined using temporal pooling to give an overall appearance feature for the complete sequence. The convolutional network, recurrent layer, and temporal pooling layer are jointly trained to act as a feature extractor for video-based re-identification using a Siamese network architecture. Our approach makes use of colour and optical flow information in order to capture appearance and motion information that is useful for video re-identification. Experiments are conducted on the iLIDS-VID and PRID-2011 datasets to show that this approach outperforms existing methods of video-based re-identification.
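A minimal PyTorch sketch of the described pipeline: per-frame CNN, recurrent layer, and temporal mean pooling over time-steps. Channel counts and layer sizes are assumptions, and training would compare two such sequence features under a Siamese (contrastive) objective.

```python
import torch
import torch.nn as nn

class RecurrentReID(nn.Module):
    """Per-frame CNN -> RNN across time -> temporal pooling."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(                 # toy per-frame encoder,
            nn.Conv2d(5, 16, 5, stride=2),        # 5 = RGB + 2 flow channels
            nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten())
        self.rnn = nn.RNN(32, feat_dim, batch_first=True)

    def forward(self, clip):                      # clip: (B, T, 5, H, W)
        B, T = clip.shape[:2]
        f = self.cnn(clip.flatten(0, 1)).view(B, T, -1)
        h, _ = self.rnn(f)                        # information flows across steps
        return h.mean(dim=1)                      # pool to one sequence feature

# Siamese training would use e.g. a contrastive loss on
# (net(clip_a) - net(clip_b)).pow(2).sum(1) plus identity classification.
```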
{"title":"Recurrent Convolutional Network for Video-Based Person Re-identification","authors":"Niall McLaughlin, J. M. D. Rincón, P. Miller","doi":"10.1109/CVPR.2016.148","DOIUrl":"https://doi.org/10.1109/CVPR.2016.148","url":null,"abstract":"In this paper we propose a novel recurrent neural network architecture for video-based person re-identification. Given the video sequence of a person, features are extracted from each frame using a convolutional neural network that incorporates a recurrent final layer, which allows information to flow between time-steps. The features from all timesteps are then combined using temporal pooling to give an overall appearance feature for the complete sequence. The convolutional network, recurrent layer, and temporal pooling layer, are jointly trained to act as a feature extractor for video-based re-identification using a Siamese network architecture. Our approach makes use of colour and optical flow information in order to capture appearance and motion information which is useful for video re-identification. Experiments are conduced on the iLIDS-VID and PRID-2011 datasets to show that this approach outperforms existing methods of video-based re-identification.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"-1 1","pages":"1325-1334"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81329621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 523
From Bows to Arrows: Rolling Shutter Rectification of Urban Scenes
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.303
Vijay Rengarajan, A. Rajagopalan, R. Aravind
The rule of perspectivity that 'straight lines must remain straight' is easily inflected in CMOS cameras by distortions introduced by motion. Lines can be rendered as curves due to the row-wise exposure mechanism known as rolling shutter (RS). We solve the problem of correcting distortions arising in handheld cameras due to the RS effect from a single image free from motion blur, with special relevance to urban scenes. We develop a procedure to extract prominent curves from the RS image, since this is essential for deciphering the varying row-wise motion. We pose an optimization problem with line desirability costs based on straightness, angle, and length to resolve the geometric ambiguities while estimating the camera motion based on a rotation-only model, assuming a known camera intrinsic matrix. Finally, we rectify the RS image based on the estimated camera trajectory using inverse mapping. We show rectification results for RS images captured using mobile phone cameras. We also compare our single-image method against existing video and non-blind RS rectification methods that typically require multiple images.
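Once the row-wise camera rotations and the intrinsic matrix are in hand, the inverse-mapping step can be sketched with one homography per output row. Using the output row's rotation to look up the source pixel is a first-order simplification of the exact row-wise geometry, and the trajectory estimation itself (the paper's optimization) is assumed done.

```python
import numpy as np

def rectify_rolling_shutter(image, K, row_rotations):
    """Rotation-only RS rectification by inverse mapping (nearest neighbor).

    K:             3x3 camera intrinsic matrix (assumed known)
    row_rotations: list of 3x3 rotations, camera orientation per image row
    """
    H, W = image.shape[:2]
    K_inv = np.linalg.inv(K)
    out = np.zeros_like(image)
    for v in range(H):
        H_row = K @ row_rotations[v] @ K_inv      # homography for this row
        for u in range(W):
            p = H_row @ np.array([u, v, 1.0])     # map into the distorted image
            us, vs = int(round(p[0] / p[2])), int(round(p[1] / p[2]))
            if 0 <= us < W and 0 <= vs < H:
                out[v, u] = image[vs, us]
    return out
```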
{"title":"From Bows to Arrows: Rolling Shutter Rectification of Urban Scenes","authors":"Vijay Rengarajan, A. Rajagopalan, R. Aravind","doi":"10.1109/CVPR.2016.303","DOIUrl":"https://doi.org/10.1109/CVPR.2016.303","url":null,"abstract":"The rule of perspectivity that 'straight-lines-mustremain-straight' is easily inflected in CMOS cameras by distortions introduced by motion. Lines can be rendered as curves due to the row-wise exposure mechanism known as rolling shutter (RS). We solve the problem of correcting distortions arising from handheld cameras due to RS effect from a single image free from motion blur with special relevance to urban scenes. We develop a procedure to extract prominent curves from the RS image since this is essential for deciphering the varying row-wise motion. We pose an optimization problem with line desirability costs based on straightness, angle, and length, to resolve the geometric ambiguities while estimating the camera motion based on a rotation-only model assuming known camera intrinsic matrix. Finally, we rectify the RS image based on the estimated camera trajectory using inverse mapping. We show rectification results for RS images captured using mobile phone cameras. We also compare our single image method against existing video and nonblind RS rectification methods that typically require multiple images.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"74 1","pages":"2773-2781"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76169177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 49
Efficient Temporal Sequence Comparison and Classification Using Gram Matrix Embeddings on a Riemannian Manifold
Pub Date : 2016-06-27 DOI: 10.1109/CVPR.2016.487
Xikang Zhang, Yin Wang, Mengran Gou, M. Sznaier, O. Camps
In this paper we propose a new framework to compare and classify temporal sequences. The proposed approach captures the underlying dynamics of the data while avoiding expensive estimation procedures, making it suitable for processing large numbers of sequences. The main idea is to first embed the sequences into a Riemannian manifold by using positive definite regularized Gram matrices of their Hankelets. The advantages of this approach are: 1) it allows for using non-Euclidean similarity functions on the positive definite matrix manifold, which capture the underlying geometry better than directly comparing the sequences or their Hankel matrices, and 2) Gram matrices inherit desirable properties from the underlying Hankel matrices: their rank measures the complexity of the underlying dynamics, and the order and coefficients of the associated regressive models are invariant to affine transformations and varying initial conditions. The benefits of this approach are illustrated with extensive experiments in 3D action recognition using 3D joint sequences. In spite of its simplicity, the performance of this approach is competitive with or better than state-of-the-art approaches for this problem. Further, these results hold across a variety of metrics, supporting the idea that the improvement stems from the embedding itself rather than from using one of these metrics.
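The embedding is compact enough to sketch end to end: stack the trajectory into a block-Hankel matrix, form its regularized Gram matrix, and compare the resulting SPD matrices with a non-Euclidean metric. The log-Euclidean distance is used here as one common choice, and the normalization and regularization constants are assumptions.

```python
import numpy as np

def hankel_gram(sequence, block_rows, reg=1e-6):
    """Regularized Gram matrix of a sequence's block-Hankel matrix.

    sequence: (T, d) trajectory, e.g. stacked 3D joint coordinates.
    Returns an SPD matrix that embeds the sequence's dynamics.
    """
    T, _ = sequence.shape
    cols = T - block_rows + 1
    H = np.vstack([sequence[i:i + cols].T for i in range(block_rows)])
    G = H @ H.T
    return G / np.linalg.norm(G) + reg * np.eye(G.shape[0])

def spd_log(G):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(G)
    return (V * np.log(w)) @ V.T

def log_euclidean_distance(G1, G2):
    """Distance between two SPD embeddings under the log-Euclidean metric."""
    return np.linalg.norm(spd_log(G1) - spd_log(G2), 'fro')
```

Nearest-neighbor classification of action sequences then reduces to comparing these fixed-size SPD matrices, regardless of the original sequence lengths.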
{"title":"Efficient Temporal Sequence Comparison and Classification Using Gram Matrix Embeddings on a Riemannian Manifold","authors":"Xikang Zhang, Yin Wang, Mengran Gou, M. Sznaier, O. Camps","doi":"10.1109/CVPR.2016.487","DOIUrl":"https://doi.org/10.1109/CVPR.2016.487","url":null,"abstract":"In this paper we propose a new framework to compare and classify temporal sequences. The proposed approach captures the underlying dynamics of the data while avoiding expensive estimation procedures, making it suitable to process large numbers of sequences. The main idea is to first embed the sequences into a Riemannian manifold by using positive definite regularized Gram matrices of their Hankelets. The advantages of the this approach are: 1) it allows for using non-Euclidean similarity functions on the Positive Definite matrix manifold, which capture better the underlying geometry than directly comparing the sequences or their Hankel matrices, and 2) Gram matrices inherit desirable properties from the underlying Hankel matrices: their rank measure the complexity of the underlying dynamics, and the order and coefficients of the associated regressive models are invariant to affine transformations and varying initial conditions. The benefits of this approach are illustrated with extensive experiments in 3D action recognition using 3D joints sequences. In spite of its simplicity, the performance of this approach is competitive or better than using state-of-art approaches for this problem. Further, these results hold across a variety of metrics, supporting the idea that the improvement stems from the embedding itself, rather than from using one of these metrics.","PeriodicalId":6515,"journal":{"name":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"7 1","pages":"4498-4507"},"PeriodicalIF":0.0,"publicationDate":"2016-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87168567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 80