
2013 IEEE Conference on Computer Vision and Pattern Recognition: Latest Publications

First-Person Activity Recognition: What Are They Doing to Me?
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.352
M. Ryoo, L. Matthies
This paper discusses the problem of recognizing interaction-level human activities from a first-person viewpoint. The goal is to enable an observer (e.g., a robot or a wearable camera) to understand 'what activity others are performing to it' from continuous video inputs. These include friendly interactions such as 'a person hugging the observer' as well as hostile interactions like 'punching the observer' or 'throwing objects at the observer', whose videos involve a large amount of camera ego-motion caused by physical interactions. The paper investigates multi-channel kernels to integrate global and local motion information, and presents a new activity learning/recognition methodology that explicitly considers the temporal structures displayed in first-person activity videos. In our experiments, we not only show classification results on segmented videos, but also confirm that our new approach is able to detect activities from continuous videos reliably.
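The multi-channel kernel idea admits a minimal sketch: compute one base kernel per motion channel (e.g., a global ego-motion histogram and a local descriptor histogram) and combine them with fixed channel weights. The channel names, the histogram-intersection choice, and the weights below are illustrative assumptions, not the paper's exact kernels or features.

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Kernel value between two L1-normalized histograms."""
    return np.minimum(h1, h2).sum()

def multi_channel_kernel(channels_x, channels_y, weights):
    """Weighted sum of per-channel kernels.

    channels_x / channels_y: dict mapping a channel name to a histogram,
    e.g. {'global_motion': ..., 'local_motion': ...} (hypothetical names).
    """
    return sum(w * histogram_intersection(channels_x[c], channels_y[c])
               for c, w in weights.items())

# Toy usage: two videos, each described by a global and a local motion histogram.
x = {'global_motion': np.array([0.5, 0.3, 0.2]),
     'local_motion':  np.array([0.1, 0.6, 0.3])}
y = {'global_motion': np.array([0.4, 0.4, 0.2]),
     'local_motion':  np.array([0.2, 0.5, 0.3])}
print(multi_channel_kernel(x, y, {'global_motion': 0.5, 'local_motion': 0.5}))
```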
Pages: 2730-2737
Citations: 287
A Fast Approximate AIB Algorithm for Distributional Word Clustering
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.78
Lei Wang, Jianjia Zhang, Luping Zhou, W. Li
Distributional word clustering merges words having similar probability distributions to attain reliable parameter estimation, compact classification models, and even better classification performance. Agglomerative Information Bottleneck (AIB) is one of the typical word clustering algorithms and has been applied to both traditional text classification and recent image recognition. Although theoretically elegant, AIB has one main issue with its computational efficiency, especially when clustering a large number of words. Unlike existing solutions to this issue, we analyze the characteristics of its objective function, the loss of mutual information, and show that by merely using the ratio of word-class joint probabilities of each word, good candidate word pairs for merging can be easily identified. Based on this finding, we propose a fast approximate AIB algorithm and show that it can significantly improve the computational efficiency of AIB while maintaining, or even slightly increasing, its classification performance. Experimental studies on both text and image classification benchmark data sets show that our algorithm can achieve a more than 100-fold speedup on large real data sets over the state-of-the-art method.
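The quantity AIB greedily minimizes can be written directly from the word-class joint distribution: merging two words costs exactly the mutual information I(W; C) lost, and the cost depends only on the two affected rows of the joint table. A minimal NumPy sketch of that exact cost follows; the paper's contribution, screening candidate pairs via joint-probability ratios before evaluating it, is not reproduced here, and the array layout and names are assumptions.

```python
import numpy as np

def aib_merge_cost(p_joint, i, j, eps=1e-12):
    """Loss of mutual information I(W; C) incurred by merging words i and j.

    p_joint: (num_words, num_classes) array of joint probabilities p(w, c).
    Only rows i and j change under the merge, so the cost involves just
    those two rows plus the (unchanged) class marginal p(c).
    """
    p_c = p_joint.sum(axis=0)                    # class marginal p(c)
    def contribution(row):
        p_w = row.sum()                          # word marginal p(w)
        return np.sum(row * np.log((row + eps) / (p_w * p_c + eps)))
    merged = p_joint[i] + p_joint[j]             # joint row of the merged word
    return (contribution(p_joint[i]) + contribution(p_joint[j])
            - contribution(merged))

# Toy joint table over 3 words and 2 classes (entries sum to 1).
p = np.array([[0.20, 0.10],
              [0.18, 0.12],
              [0.05, 0.35]])
print(aib_merge_cost(p, 0, 1))   # similar class profiles -> small loss
print(aib_merge_cost(p, 0, 2))   # dissimilar profiles    -> larger loss
```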
Pages: 556-563
Citations: 1
Detection Evolution with Multi-order Contextual Co-occurrence
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.235
Guang Chen, Yuanyuan Ding, Jing Xiao, T. Han
Context plays an increasingly important role in improving object detection performance. In this paper we propose an effective representation, Multi-Order Contextual co-Occurrence (MOCO), to implicitly model high-level context using solely the detection responses from a baseline object detector. The so-called (1st-order) context feature is computed as a set of randomized binary comparisons on the response map of the baseline object detector. The statistics of the 1st-order binary context features are then used to construct a high-order co-occurrence descriptor. Combining the MOCO feature with the original image feature, we can evolve the baseline object detector into a stronger context-aware detector. With the updated detector, we can continue the evolution until the contextual improvements saturate. Using the successful deformable-part-model detector [13] as the baseline detector, we test the proposed MOCO evolution framework on the PASCAL VOC 2007 dataset [8] and the Caltech pedestrian dataset [7]: the proposed MOCO detector outperforms all known state-of-the-art approaches, contextually boosting deformable part models (ver. 5) [13] by 3.3% in mean average precision on the PASCAL 2007 dataset. For the Caltech pedestrian dataset, our method further reduces the log-average miss rate from 48% to 46% and the miss rate at 1 FPPI from 25% to 23%, compared with the best prior art [6].
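The 1st-order context feature has a simple form: fix a set of random location pairs once, then for each detection window compare the baseline detector's responses at those pairs to obtain a binary vector. A hedged sketch follows; the window size, number of comparisons, and comparison rule are illustrative, not the paper's exact parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_comparison_pairs(num_pairs, window):
    """Sample a fixed set of random offset pairs within a window x window region."""
    return rng.integers(0, window, size=(num_pairs, 2, 2))

def binary_context_feature(response_map, top_left, pairs):
    """Randomized binary comparisons on a detector response map.

    Returns a {0,1} vector: bit k is 1 if the response at the first
    offset of pair k exceeds the response at the second offset.
    """
    r0, c0 = top_left
    bits = []
    for (r1, c1), (r2, c2) in pairs:
        bits.append(response_map[r0 + r1, c0 + c1] >
                    response_map[r0 + r2, c0 + c2])
    return np.array(bits, dtype=np.uint8)

# Toy usage: a 32x32 response map, 8 comparisons inside a 16x16 window.
responses = rng.random((32, 32))
pairs = make_comparison_pairs(num_pairs=8, window=16)
print(binary_context_feature(responses, top_left=(4, 4), pairs=pairs))
```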
Pages: 1798-1805
Citations: 93
Supervised Semantic Gradient Extraction Using Linear-Time Optimization
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.364
Shulin Yang, Jue Wang, L. Shapiro
This paper proposes a new supervised semantic edge and gradient extraction approach, which allows the user to roughly scribble over the desired region to extract the semantically dominant and coherent edges in it. Our approach first extracts low-level edgelets (small edge clusters) from the input image as primitives and builds a graph upon them by jointly considering both the geometric and the appearance compatibility of edgelets. Given the characteristics of the graph, it cannot be effectively optimized by commonly used energy minimization tools such as graph cuts. We thus propose an efficient linear algorithm for precise graph optimization by taking advantage of the special structure of the graph. Optimal parameter settings of the model are learned from a dataset. Objective evaluations show that the proposed method significantly outperforms previous semantic edge detection algorithms. Finally, we demonstrate the effectiveness of the system in various image editing tasks.
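A minimal sketch of the pairwise term: the compatibility of two edgelets combines a geometric cue (proximity and orientation agreement) with an appearance cue (similar local color statistics). The edgelet fields, distance measures, and bandwidths below are hypothetical stand-ins for the paper's actual compatibility terms.

```python
import numpy as np

def edgelet_affinity(e1, e2, sigma_geom=10.0, sigma_app=0.5):
    """Compatibility of two edgelets from geometric and appearance cues.

    Each edgelet is a dict with 'center' (x, y), 'orientation' (radians),
    and 'color' (mean local color) -- hypothetical fields for illustration.
    """
    # Geometric term: nearby, similarly oriented edgelets are compatible.
    dist = np.linalg.norm(np.asarray(e1['center']) - np.asarray(e2['center']))
    angle = abs(e1['orientation'] - e2['orientation']) % np.pi
    angle = min(angle, np.pi - angle)             # orientation is defined mod pi
    geom = np.exp(-(dist / sigma_geom) ** 2) * np.cos(angle) ** 2
    # Appearance term: similar local color statistics.
    app_dist = np.linalg.norm(np.asarray(e1['color']) - np.asarray(e2['color']))
    app = np.exp(-(app_dist / sigma_app) ** 2)
    return geom * app

# Toy usage: two nearby, nearly parallel edgelets with similar color.
a = {'center': (10, 10), 'orientation': 0.10, 'color': (0.20, 0.30, 0.40)}
b = {'center': (14, 11), 'orientation': 0.15, 'color': (0.25, 0.30, 0.40)}
print(edgelet_affinity(a, b))
```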
Pages: 2826-2833
Citations: 2
City-Scale Change Detection in Cadastral 3D Models Using Images
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.22
Aparna Taneja, Luca Ballan, M. Pollefeys
In this paper, we propose a method to detect changes in the geometry of a city using panoramic images captured by a car driving around the city. We designed our approach to account for all the challenges involved in a large-scale application of change detection, such as inaccuracies in the input geometry, errors in the geo-location data of the images, and the limited amount of information due to sparse imagery. We evaluated our approach on an area of 6 square kilometers inside a city, using 3420 images downloaded from Google Street View. These images, besides being publicly available, are a good example of panoramic images captured from a driving vehicle, and hence exhibit all the possible challenges resulting from such an acquisition. We also quantitatively compared the performance of our approach against a ground truth, as well as against prior work. This evaluation shows that our approach outperforms the current state of the art.
Pages: 113-120
Citations: 78
Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.9
Nikolaos Kyriazis, Antonis A. Argyros
In several hand-object(s) interaction scenarios, the change in the objects' state is a direct consequence of the hand's motion. This has a straightforward representation in Newtonian dynamics. We present the first approach that exploits this observation to perform model-based 3D tracking of a table-top scene comprising passive objects and an active hand. Our forward modelling of 3D hand-object(s) interaction regards both the appearance and the physical state of the scene and is parameterized over the hand motion (26 DoFs) between two successive instants in time. We demonstrate that our approach manages to track the 3D pose of all objects and the 3D pose and articulation of the hand by only searching for the parameters of the hand motion. In the proposed framework, covert scene state is inferred by connecting it to the overt state, through the incorporation of physics. Thus, our tracking approach treats a variety of challenging observability issues in a principled manner, without the need to resort to heuristics.
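The single actor hypothesis turns tracking into a search over hand-motion parameters alone: each hypothesis is advanced through a physics simulator, and the resulting scene is scored against the observation, so object poses are never search variables. The following is a schematic sketch under toy assumptions; the simulator and score below are stand-ins, and the real system optimizes the 26-DoF motion rather than randomly sampling it.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_HAND_DOFS = 26

def simulate(scene, hand_motion):
    """Stand-in for a physics engine: here the scene is a toy state vector
    nudged by the first components of the hand motion."""
    return scene + hand_motion[:scene.shape[0]]

def score(simulated_scene, observation):
    """Stand-in for the appearance likelihood: negative L2 discrepancy."""
    return -np.linalg.norm(simulated_scene - observation)

def track_step(scene, observation, num_hypotheses=256, sigma=0.1):
    """One frame of tracking: search over hand motion only; object state
    follows from the (simulated) physics, so it is never a search parameter."""
    best_scene, best_score = None, -np.inf
    for _ in range(num_hypotheses):
        hand_motion = rng.normal(0.0, sigma, size=NUM_HAND_DOFS)
        candidate = simulate(scene, hand_motion)
        s = score(candidate, observation)
        if s > best_score:
            best_scene, best_score = candidate, s
    return best_scene

# Toy usage: recover a small scene change from one observation.
state = np.zeros(3)
target = np.array([0.05, -0.02, 0.03])
print(track_step(state, target))
```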
Pages: 9-16
Citations: 70
Sparse Quantization for Patch Description
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.366
X. Boix, Michael Gygli, G. Roig, L. Gool
The representation of local image patches is crucial for the good performance and efficiency of many vision tasks. Patch descriptors have been designed to generalize towards diverse variations, depending on the application, as well as on the desired compromise between accuracy and efficiency. We present a novel formulation of patch description that serves such issues well. Sparse quantization lies at its heart. This allows for efficient encodings, leading to powerful, novel binary descriptors, yet also to the generalization of existing descriptors like SIFT or BRIEF. We demonstrate the capabilities of our formulation for both keypoint matching and image classification. Our binary descriptors achieve state-of-the-art results on two keypoint matching benchmarks, namely those by Brown and Mikolajczyk. For image classification, we propose new descriptors that perform similarly to SIFT on Caltech101 and PASCAL VOC07.
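One concrete instance of sparse quantization is to keep only the k largest components of a descriptor and binarize them, giving a compact code that matches with cheap Hamming distance. This top-k rule is only an illustration; the paper's encodings generalize beyond it.

```python
import numpy as np

def sparse_binary_code(descriptor, k):
    """Binarize a descriptor by setting 1 at its k largest entries.

    The simplest form of sparse quantization: the code is a binary
    vector with exactly k ones.
    """
    code = np.zeros(descriptor.shape[0], dtype=np.uint8)
    code[np.argsort(descriptor)[-k:]] = 1
    return code

def hamming_distance(code_a, code_b):
    """Binary descriptors compare with a cheap Hamming distance."""
    return int(np.count_nonzero(code_a != code_b))

# Toy usage: two similar descriptors map to identical sparse codes.
d1 = np.array([0.10, 0.90, 0.30, 0.80, 0.05, 0.70])
d2 = np.array([0.20, 0.85, 0.10, 0.90, 0.10, 0.60])
c1, c2 = sparse_binary_code(d1, k=3), sparse_binary_code(d2, k=3)
print(c1, c2, hamming_distance(c1, c2))
```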
Pages: 2842-2849
Citations: 18
Class Generative Models Based on Feature Regression for Pose Estimation of Object Categories
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.103
Michele Fenzi, L. Leal-Taixé, B. Rosenhahn, J. Ostermann
In this paper, we propose a method for learning a class representation that can return a continuous value for the pose of an unknown class instance, using only 2D data and weak 3D labeling information. Our method is based on generative feature models, i.e., regression functions learned from local descriptors of the same patch collected under different viewpoints. The individual generative models are then clustered in order to create class generative models, which form the class representation. At run-time, the pose of the query image is estimated in a maximum a posteriori fashion by combining the regression functions belonging to the matching clusters. We evaluate our approach on the EPFL car dataset and the Pointing'04 face dataset. Experimental results show that our method outperforms the state of the art by 10% on the first dataset and by 9% on the second.
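The MAP combination step can be sketched as a weighted kernel density over a continuous pose angle: each matched cluster's regression function contributes a pose prediction, weighted by its matching score, and the estimate is the mode of the resulting density. The interface, angular grid, and Gaussian kernel below are assumptions for illustration, not the paper's exact model.

```python
import numpy as np

def map_pose_estimate(predictions, weights, sigma_deg=15.0):
    """Fuse per-cluster pose regressions into one continuous estimate.

    predictions: pose angles (degrees) predicted by the regression
    functions of the matched clusters (hypothetical interface).
    weights: matching scores used as per-prediction weights.
    The estimate maximizes a weighted circular kernel density.
    """
    grid = np.arange(0.0, 360.0)
    # Circular distance from every grid angle to every prediction.
    diff = np.abs(grid[:, None] - np.asarray(predictions)[None, :])
    diff = np.minimum(diff, 360.0 - diff)
    density = (np.asarray(weights)[None, :]
               * np.exp(-0.5 * (diff / sigma_deg) ** 2)).sum(axis=1)
    return grid[np.argmax(density)]

# Three clusters vote for similar azimuths; one low-weight outlier is ignored.
print(map_pose_estimate([32.0, 28.0, 35.0, 190.0], [0.9, 0.8, 0.7, 0.1]))
```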
Pages: 755-762
Citations: 36
Studying Relationships between Human Gaze, Description, and Computer Vision
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.101
Kiwon Yun, Yifan Peng, D. Samaras, G. Zelinsky, Tamara L. Berg
We posit that user behavior during natural viewing of images contains an abundance of information about the content of images, as well as information related to user intent and user-defined content importance. In this paper, we conduct experiments to better understand the relationship between images, the eye movements people make while viewing images, and how people construct natural language to describe images. We explore these relationships in the context of two commonly used computer vision datasets. We then further relate human cues with the outputs of current visual recognition systems and demonstrate prototype applications for gaze-enabled detection and annotation.
Pages: 739-746
Citations: 88
Efficient Detector Adaptation for Object Detection in a Video
Pub Date : 2013-06-23 DOI: 10.1109/CVPR.2013.418
Pramod Sharma, R. Nevatia
In this work, we present a novel and efficient detector adaptation method which improves the performance of an offline-trained classifier (baseline classifier) by adapting it to new test datasets. We address two critical aspects of adaptation methods: generalizability and computational efficiency. We propose an adaptation method which can be applied to various baseline classifiers and is also computationally efficient. For a given test video, we collect online samples in an unsupervised manner and train a random fern adaptive classifier. The adaptive classifier improves the precision of the baseline classifier by validating the detection responses obtained from the baseline classifier as correct detections or false alarms. Experiments demonstrate the generalizability, computational efficiency, and effectiveness of our method: we compare it with state-of-the-art approaches to the problem of human detection and show good performance with high computational efficiency on two different baseline classifiers.
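A random fern classifier is cheap to train online, which is what makes it attractive for per-video adaptation. Below is a minimal sketch of one: each fern hashes a few binary features into a leaf, per-leaf class counts give posteriors, and ferns combine as a naive-Bayes product. The interface and parameters are assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class RandomFerns:
    """Minimal random fern classifier over binary feature vectors.

    Each fern picks `depth` feature indices; a sample's bits at those
    indices form a leaf index. Class posteriors per leaf are estimated
    from counts and combined across ferns as a naive-Bayes product.
    """
    def __init__(self, num_ferns, depth, num_features, num_classes):
        self.idx = rng.integers(0, num_features, size=(num_ferns, depth))
        # Laplace-smoothed counts: (fern, leaf, class).
        self.counts = np.ones((num_ferns, 2 ** depth, num_classes))

    def _leaves(self, x):
        bits = x[self.idx]                          # (num_ferns, depth)
        return bits @ (1 << np.arange(self.idx.shape[1]))

    def update(self, x, label):
        """Online update, as used when adapting to a new test video."""
        self.counts[np.arange(len(self.idx)), self._leaves(x), label] += 1

    def predict(self, x):
        post = self.counts / self.counts.sum(axis=2, keepdims=True)
        log_prob = np.log(post[np.arange(len(self.idx)),
                               self._leaves(x)]).sum(axis=0)
        return int(np.argmax(log_prob))

# Toy usage: learn to separate two fixed binary patterns online.
fern = RandomFerns(num_ferns=8, depth=4, num_features=32, num_classes=2)
x_pos = (rng.random(32) > 0.3).astype(int)
x_neg = (rng.random(32) > 0.7).astype(int)
for _ in range(20):
    fern.update(x_pos, 1)
    fern.update(x_neg, 0)
print(fern.predict(x_pos), fern.predict(x_neg))
```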
Pages: 3254-3261
Citations: 38