
Latest publications: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)

Recursive estimation of generative models of video
Nemanja Petrović, A. Ivanovic, N. Jojic
In this paper we present a generative model and learning procedure for unsupervised clustering of video into scenes. The work addresses two important problems: realistic modeling of the sources of variability in the video, and fast transformation-invariant frame clustering. We suggest a solution to the problem of computationally intensive learning in this model by combining recursive model estimation, fast inference, and on-line learning, and thus achieve real-time frame clustering performance. Novel aspects of this method include an algorithm for the clustering of Gaussian mixtures, and the fast computation of the KL divergence between two mixtures of Gaussians. The efficiency and performance of the clustering and KL approximation methods are demonstrated. We also present a novel video browsing tool based on the visualization of the variables in the generative model.
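The fast KL computation between Gaussian mixtures is the abstract's headline trick, but it is not spelled out there. For orientation, the quantity being approximated can be estimated by plain Monte Carlo sampling; the sketch below (Python/NumPy, with illustrative names, not the authors' fast method) draws samples from f and averages the log-density ratio:

```python
import numpy as np

def gmm_logpdf(x, weights, means, stds):
    """Log-density of a 1-D Gaussian mixture, evaluated at points x."""
    x = np.asarray(x, float)[:, None]
    comp = (np.log(weights)
            - 0.5 * np.log(2 * np.pi * stds ** 2)
            - (x - means) ** 2 / (2 * stds ** 2))
    m = comp.max(axis=1, keepdims=True)          # log-sum-exp for stability
    return (m + np.log(np.exp(comp - m).sum(axis=1, keepdims=True))).ravel()

def mc_kl_gmm(wf, mf, sf, wg, mg, sg, n=20000, seed=0):
    """Monte Carlo estimate of KL(f || g) between two 1-D Gaussian mixtures:
    draw samples from f and average log f(x) - log g(x)."""
    rng = np.random.default_rng(seed)
    comp = rng.choice(len(wf), size=n, p=wf)     # pick mixture components
    x = rng.normal(mf[comp], sf[comp])           # sample within each component
    return float(np.mean(gmm_logpdf(x, wf, mf, sf) - gmm_logpdf(x, wg, mg, sg)))
```

The point of the paper's fast approximation is precisely to avoid this kind of sampling, which is far too slow for real-time clustering.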
{"title":"Recursive estimation of generative models of video","authors":"Nemanja Petrović, A. Ivanovic, N. Jojic","doi":"10.1109/CVPR.2006.248","DOIUrl":"https://doi.org/10.1109/CVPR.2006.248","url":null,"abstract":"In this paper we present a generative model and learning procedure for unsupervised video clustering into scenes. The work addresses two important problems: realistic modeling of the sources of variability in the video and fast transformation invariant frame clustering. We suggest a solution to the problem of computationally intensive learning in this model by combining the recursive model estimation, fast inference, and on-line learning. Thus, we achieve real time frame clustering performance. Novel aspects of this method include an algorithm for the clustering of Gaussian mixtures, and the fast computation of the KL divergence between two mixtures of Gaussians. The efficiency and the performance of clustering and KL approximation methods are demonstrated. We also present novel video browsing tool based on the visualization of the variables in the generative model.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124121065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 24
Structure from Motion with Known Camera Positions
R. Carceroni, Ankita Kumar, Kostas Daniilidis
The wide availability of GPS sensors is changing the landscape in the applications of structure from motion techniques for localization. In this paper, we study the problem of estimating camera orientations from multiple views, given the positions of the viewpoints in a world coordinate system and a set of point correspondences across the views. Given three or more views, the above problem has a finite number of solutions for three or more point correspondences. Given six or more views, the problem has a finite number of solutions for just two or more points. In the three-view case, we show the necessary and sufficient conditions for the three essential matrices to be consistent with a set of known baselines. We also introduce a method to recover the absolute orientations of three views in world coordinates from their essential matrices. To refine these estimates we perform a least-squares minimization on the group cross product SO(3) × SO(3) × SO(3). We report experiments on synthetic data and on data from the ICCV2005 Computer Vision Contest.
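Refining orientation estimates by least squares on SO(3) × SO(3) × SO(3) requires keeping each iterate on the rotation manifold. A standard building block for such refinements, shown here as an assumed detail rather than the authors' exact update, is projecting a perturbed 3×3 matrix back to the nearest rotation via SVD:

```python
import numpy as np

def project_to_so3(M):
    """Project a 3x3 matrix to the nearest rotation in Frobenius norm:
    R = U diag(1, 1, det(U V^T)) V^T, where M = U S V^T is the SVD.
    The det() factor guards against reflections (det = -1)."""
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
    return U @ D @ Vt
```

After each unconstrained least-squares step, an update of this form returns the three orientation estimates to valid rotations.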
{"title":"Structure from Motion with Known Camera Positions","authors":"R. Carceroni, Ankita Kumar, Kostas Daniilidis","doi":"10.1109/CVPR.2006.296","DOIUrl":"https://doi.org/10.1109/CVPR.2006.296","url":null,"abstract":"The wide availability of GPS sensors is changing the landscape in the applications of structure from motion techniques for localization. In this paper, we study the problem of estimating camera orientations from multiple views, given the positions of the viewpoints in a world coordinate system and a set of point correspondences across the views. Given three or more views, the above problem has a finite number of solutions for three or more point correspondences. Given six or more views, the problem has a finite number of solutions for just two or more points. In the three-view case, we show the necessary and sufficient conditions for the three essential matrices to be consistent with a set of known baselines. We also introduce a method to recover the absolute orientations of three views in world coordinates from their essential matrices. To refine these estimates we perform a least-squares minimization on the group cross product SO(3) × SO(3) × SO(3). We report experiments on synthetic data and on data from the ICCV2005 Computer Vision Contest.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127917884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 29
Multiscale Nonlinear Diffusion and Shock Filter for Ultrasound Image Enhancement
Fan Zhang, Y. Yoo, Yongmin Kim, Lichen Zhang, L. M. Koh
A new noise reduction and edge enhancement method, i.e., Laplacian pyramid-based nonlinear diffusion and shock filter (LPNDSF), is proposed for medical ultrasound imaging. In the proposed LPNDSF, a coupled nonlinear diffusion and shock filter process is applied in Laplacian pyramid domain of an image, to remove speckle and enhance edges simultaneously. The performance of the proposed method was evaluated on a phantom and a real ultrasound image. In the phantom study, we obtained an average gain of 0.55 and 1.11 in contrast-to-noise ratio compared to the speckle reducing anisotropic diffusion (SRAD) and nonlinear coherent diffusion (NCD), respectively. Also, the proposed LPNDSF showed clearer boundaries on the phantom and the real ultrasound image. These preliminary results indicate that the proposed LPNDSF can effectively reduce speckle noise while enhancing image edges for retaining subtle features.
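The Laplacian pyramid domain in which the coupled diffusion/shock filtering operates can be sketched as a band-pass decomposition with exact reconstruction. This minimal version uses 2×2 averaging in place of the usual Gaussian filter and omits the diffusion and shock-filter steps themselves:

```python
import numpy as np

def downsample(img):
    """Average 2x2 blocks (a crude stand-in for Gaussian blur + decimate)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def upsample(img, shape):
    """Nearest-neighbour expansion back to `shape`."""
    up = img.repeat(2, axis=0).repeat(2, axis=1)
    return up[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels=3):
    """Decompose an image into band-pass layers plus a low-pass residual."""
    layers, cur = [], img.astype(float)
    for _ in range(levels):
        small = downsample(cur)
        layers.append(cur - upsample(small, cur.shape))  # band-pass detail
        cur = small
    layers.append(cur)                                   # low-pass residual
    return layers

def reconstruct(layers):
    """Invert the decomposition: upsample and add back each detail layer."""
    cur = layers[-1]
    for band in reversed(layers[:-1]):
        cur = upsample(cur, band.shape) + band
    return cur
```

The proposed LPNDSF applies its nonlinear diffusion and shock filtering to the band-pass layers before reconstruction; here the layers are simply recombined unchanged.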
{"title":"Multiscale Nonlinear Diffusion and Shock Filter for Ultrasound Image Enhancement","authors":"Fan Zhang, Y. Yoo, Yongmin Kim, Lichen Zhang, L. M. Koh","doi":"10.1109/CVPR.2006.203","DOIUrl":"https://doi.org/10.1109/CVPR.2006.203","url":null,"abstract":"A new noise reduction and edge enhancement method, i.e., Laplacian pyramid-based nonlinear diffusion and shock filter (LPNDSF), is proposed for medical ultrasound imaging. In the proposed LPNDSF, a coupled nonlinear diffusion and shock filter process is applied in Laplacian pyramid domain of an image, to remove speckle and enhance edges simultaneously. The performance of the proposed method was evaluated on a phantom and a real ultrasound image. In the phantom study, we obtained an average gain of 0.55 and 1.11 in contrast-to-noise ratio compared to the speckle reducing anisotropic diffusion (SRAD) and nonlinear coherent diffusion (NCD), respectively. Also, the proposed LPNDSF showed clearer boundaries on the phantom and the real ultrasound image. These preliminary results indicate that the proposed LPNDSF can effectively reduce speckle noise while enhancing image edges for retaining subtle features.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128414186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
Image Denoising with Shrinkage and Redundant Representations
Michael Elad, Boaz Matalon, M. Zibulevsky
Shrinkage is a well known and appealing denoising technique. The use of shrinkage is known to be optimal for Gaussian white noise, provided that the sparsity on the signal’s representation is enforced using a unitary transform. Still, shrinkage is also practiced successfully with nonunitary, and even redundant representations. In this paper we shed some light on this behavior. We show that simple shrinkage could be interpreted as the first iteration of an algorithm that solves the basis pursuit denoising (BPDN) problem. Thus, this work leads to a novel iterative shrinkage algorithm that can be considered as an effective pursuit method. We demonstrate this algorithm, both on synthetic data, and for the image denoising problem, where we learn the image prior parameters directly from the given image. The results in both cases are superior to several popular alternatives.
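The observation that simple shrinkage is the first iteration of a BPDN solver can be made concrete with an iterative soft-thresholding loop (an ISTA-style sketch, not necessarily the paper's exact iteration): starting from x = 0, the first iterate is exactly a shrinkage of D^T y.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise shrinkage: the proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(D, y, lam, n_iter=200):
    """Iterative shrinkage for min_x 0.5 ||y - D x||^2 + lam ||x||_1.
    Each step is a gradient move on the quadratic term followed by
    shrinkage; from x = 0 the first step is soft_threshold(D^T y / L)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + D.T @ (y - D @ x) / L, lam / L)
    return x
```

For a unitary D (L = 1), the loop converges in one step, recovering the classical result that plain shrinkage is already optimal in that case.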
{"title":"Image Denoising with Shrinkage and Redundant Representations","authors":"Michael Elad, Boaz Matalon, M. Zibulevsky","doi":"10.1109/CVPR.2006.143","DOIUrl":"https://doi.org/10.1109/CVPR.2006.143","url":null,"abstract":"Shrinkage is a well known and appealing denoising technique. The use of shrinkage is known to be optimal for Gaussian white noise, provided that the sparsity on the signal’s representation is enforced using a unitary transform. Still, shrinkage is also practiced successfully with nonunitary, and even redundant representations. In this paper we shed some light on this behavior. We show that simple shrinkage could be interpreted as the first iteration of an algorithm that solves the basis pursuit denoising (BPDN) problem. Thus, this work leads to a novel iterative shrinkage algorithm that can be considered as an effective pursuit method. We demonstrate this algorithm, both on synthetic data, and for the image denoising problem, where we learn the image prior parameters directly from the given image. The results in both cases are superior to several popular alternatives.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128898530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 106
Multiple Face Model of Hybrid Fourier Feature for Large Face Image Set
Wonjun Hwang, Gyu-tae Park, Jongha Lee, S. Kee
A face recognition system based on a single classifier with restricted information cannot guarantee general and superior performance in real situations. To address such problems, we propose hybrid Fourier features extracted from different frequency bands, together with multiple face models. The hybrid Fourier feature comprises three different Fourier domains: merged real and imaginary components, the Fourier spectrum, and the phase angle. When deriving Fourier features from the three Fourier domains, we define three different frequency bandwidths, so that additional complementary features can be obtained. These features are then individually classified by Linear Discriminant Analysis. This approach makes it possible to analyze a face image from various viewpoints to recognize identities. Moreover, we propose multiple face models based on different eye positions within the same image size, which contributes to increasing the performance of the proposed system. We evaluated the proposed system using the Face Recognition Grand Challenge (FRGC) experimental protocols, known as the largest data sets available.
{"title":"Multiple Face Model of Hybrid Fourier Feature for Large Face Image Set","authors":"Wonjun Hwang, Gyu-tae Park, Jongha Lee, S. Kee","doi":"10.1109/CVPR.2006.201","DOIUrl":"https://doi.org/10.1109/CVPR.2006.201","url":null,"abstract":"The face recognition system based on the only single classifier considering the restricted information can not guarantee the generality and superiority of performances in a real situation. To challenge such problems, we propose the hybrid Fourier features extracted from different frequency bands and multiple face models. The hybrid Fourier feature comprises three different Fourier domains; merged real and imaginary components, Fourier spectrum and phase angle. When deriving Fourier features from three Fourier domains, we define three different frequency bandwidths, so that additional complementary features can be obtained. After this, they are individually classified by Linear Discriminant Analysis. This approach makes possible analyzing a face image from the various viewpoints to recognize identities. Moreover, we propose multiple face models based on different eye positions with a same image size, and it contributes to increasing the performance of the proposed system. We evaluated this proposed system using the Face Recognition Grand Challenge (FRGC) experimental protocols known as the largest data sets available. 
Experimental results on FRGC version 2.0 data sets has proven that the proposed method shows better verification rates than the baseline of FRGC on 2D frontal face images under various situations such as illumination changes, expression changes, and time elapses.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130891878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
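The three Fourier domains named in the abstract (merged real/imaginary parts, spectrum, and phase angle) can be extracted in a few lines of NumPy. This is a sketch only; the band selection, multiple face models, and LDA stage are omitted:

```python
import numpy as np

def fourier_features(img):
    """Three Fourier-domain views of an image patch:
    concatenated real/imaginary parts, magnitude spectrum, phase angle."""
    F = np.fft.fft2(img)
    real_imag = np.concatenate([F.real.ravel(), F.imag.ravel()])
    spectrum = np.abs(F).ravel()       # invariant to circular spatial shifts
    phase = np.angle(F).ravel()
    return real_imag, spectrum, phase
```

One reason the magnitude spectrum is a useful complementary feature: it is unchanged under circular spatial shifts of the input, while the real/imaginary and phase views are not.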
Citations: 57
Fast Variational Segmentation using Partial Extremal Initialization
J. E. Solem, N. C. Overgaard, Markus Persson, A. Heyden
In this paper we consider region-based variational segmentation of two- and three-dimensional images by the minimization of functionals whose fidelity term is the quotient of two integrals. Users often refrain from quotient functionals, even when they seem to be the most natural choice, probably because the corresponding gradient descent PDEs are nonlocal and hence require the computation of global properties. Here it is shown how this problem may be overcome by employing the structure of the Euler-Lagrange equation of the fidelity term to construct a good initialization for the gradient descent PDE, which will then converge rapidly to the desired (local) minimum. The initializer is found by making a one-dimensional search among the level sets of a function related to the fidelity term, picking the level set which minimizes the segmentation functional. This partial extremal initialization is tested on a medical segmentation problem with velocity- and intensity data from MR images. In this particular application, the partial extremal initialization speeds up the segmentation by two orders of magnitude compared to straight forward gradient descent.
{"title":"Fast Variational Segmentation using Partial Extremal Initialization","authors":"J. E. Solem, N. C. Overgaard, Markus Persson, A. Heyden","doi":"10.1109/CVPR.2006.120","DOIUrl":"https://doi.org/10.1109/CVPR.2006.120","url":null,"abstract":"In this paper we consider region-based variational segmentation of two- and three-dimensional images by the minimization of functionals whose fidelity term is the quotient of two integrals. Users often refrain from quotient functionals, even when they seem to be the most natural choice, probably because the corresponding gradient descent PDEs are nonlocal and hence require the computation of global properties. Here it is shown how this problem may be overcome by employing the structure of the Euler-Lagrange equation of the fidelity term to construct a good initialization for the gradient descent PDE, which will then converge rapidly to the desired (local) minimum. The initializer is found by making a one-dimensional search among the level sets of a function related to the fidelity term, picking the level set which minimizes the segmentation functional. This partial extremal initialization is tested on a medical segmentation problem with velocity- and intensity data from MR images. 
In this particular application, the partial extremal initialization speeds up the segmentation by two orders of magnitude compared to straight forward gradient descent.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131021735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
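The one-dimensional search among level sets can be sketched generically: scan candidate thresholds t, evaluate the segmentation energy on each region {u > t}, and keep the minimizer. The energy below is an illustrative mean-contrast term, not the paper's quotient fidelity functional:

```python
import numpy as np

def best_level_set(u, energy, levels=None):
    """1-D search over the level sets of u: evaluate the segmentation
    energy on each region {u > t} and return the best threshold/energy."""
    if levels is None:
        levels = np.linspace(u.min(), u.max(), 64, endpoint=False)
    best_t, best_e = None, np.inf
    for t in levels:
        mask = u > t
        if mask.any() and not mask.all():   # skip trivial segmentations
            e = energy(mask)
            if e < best_e:
                best_t, best_e = t, e
    return best_t, best_e

def contrast_energy(img):
    """Illustrative energy only: reward a large squared difference
    between the mean intensities inside and outside the region."""
    def energy(mask):
        return -(img[mask].mean() - img[~mask].mean()) ** 2
    return energy
```

The selected level set then serves as the starting contour for the gradient-descent PDE, which only has to refine an already near-extremal region.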
Citations: 2
Diffusion Distance for Histogram Comparison
Haibin Ling, K. Okada
In this paper we propose diffusion distance, a new dissimilarity measure between histogram-based descriptors. We define the difference between two histograms to be a temperature field. We then study the relationship between histogram similarity and a diffusion process, showing how diffusion handles deformation as well as quantization effects. As a result, the diffusion distance is derived as the sum of dissimilarities over scales. Being a cross-bin histogram distance, the diffusion distance is robust to deformation, lighting change and noise in histogram-based local descriptors. In addition, it enjoys linear computational complexity which significantly improves previously proposed cross-bin distances with quadratic complexity or higher. We tested the proposed approach on both shape recognition and interest point matching tasks using several multi-dimensional histogram-based descriptors including shape context, SIFT, and spin images. In all experiments, the diffusion distance performs excellently in both accuracy and efficiency in comparison with other state-of-the-art distance measures. In particular, it performs as accurately as the Earth Mover’s Distance with much greater efficiency.
{"title":"Diffusion Distance for Histogram Comparison","authors":"Haibin Ling, K. Okada","doi":"10.1109/CVPR.2006.99","DOIUrl":"https://doi.org/10.1109/CVPR.2006.99","url":null,"abstract":"In this paper we propose diffusion distance, a new dissimilarity measure between histogram-based descriptors. We define the difference between two histograms to be a temperature field. We then study the relationship between histogram similarity and a diffusion process, showing how diffusion handles deformation as well as quantization effects. As a result, the diffusion distance is derived as the sum of dissimilarities over scales. Being a cross-bin histogram distance, the diffusion distance is robust to deformation, lighting change and noise in histogram-based local descriptors. In addition, it enjoys linear computational complexity which significantly improves previously proposed cross-bin distances with quadratic complexity or higher. We tested the proposed approach on both shape recognition and interest point matching tasks using several multi-dimensional histogram-based descriptors including shape context, SIFT, and spin images. In all experiments, the diffusion distance performs excellently in both accuracy and efficiency in comparison with other state-of-the-art distance measures. 
In particular, it performs as accurately as the Earth Mover’s Distance with much greater efficiency.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129278327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
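A cross-bin distance in the spirit described above can be written as a sum of L1 norms of the histogram difference taken over diffusion scales. The 3-tap smoothing kernel and dyadic downsampling are assumed details, not necessarily the paper's exact choices:

```python
import numpy as np

def smooth(d, k=(0.25, 0.5, 0.25)):
    """1-D convolution with a small Gaussian-like kernel (edge replication)."""
    pad = np.concatenate([d[:1], d, d[-1:]])
    return k[0] * pad[:-2] + k[1] * pad[1:-1] + k[2] * pad[2:]

def diffusion_distance(h1, h2, levels=4):
    """Sum of L1 norms of the histogram difference over diffusion scales:
    each level smooths (diffuses) the difference field, then halves its
    resolution, so mass in nearby bins cancels at coarse scales."""
    d = np.asarray(h1, float) - np.asarray(h2, float)
    dist = np.abs(d).sum()
    for _ in range(levels):
        d = smooth(d)[::2]
        dist += np.abs(d).sum()
    return dist
```

Because diffusion lets nearby bins cancel, a small shift of a histogram peak costs less than a large one, which is exactly the cross-bin behaviour a bin-by-bin L1 distance lacks; the cost stays linear in the number of bins.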
Citations: 270
Recognition of Composite Human Activities through Context-Free Grammar Based Representation
M. Ryoo, J. Aggarwal
This paper describes a general methodology for automated recognition of complex human activities. The methodology uses a context-free grammar (CFG) based representation scheme to represent composite actions and interactions. The CFG-based representation enables us to formally define complex human activities based on simple actions or movements. Human activities are classified into three categories: atomic action, composite action, and interaction. Our system is not only able to represent complex human activities formally, but also able to recognize represented actions and interactions with high accuracy. Image sequences are processed to extract poses and gestures. Based on gestures, the system detects actions and interactions occurring in a sequence of image frames. Our results show that the system is able to represent composite actions and interactions naturally. The system was tested to represent and recognize eight types of interactions: approach, depart, point, shake-hands, hug, punch, kick, and push. The experiments show that the system can recognize sequences of represented composite actions and interactions with a high recognition rate.
{"title":"Recognition of Composite Human Activities through Context-Free Grammar Based Representation","authors":"M. Ryoo, J. Aggarwal","doi":"10.1109/CVPR.2006.242","DOIUrl":"https://doi.org/10.1109/CVPR.2006.242","url":null,"abstract":"This paper describes a general methodology for automated recognition of complex human activities. The methodology uses a context-free grammar (CFG) based representation scheme to represent composite actions and interactions. The CFG-based representation enables us to formally define complex human activities based on simple actions or movements. Human activities are classified into three categories: atomic action, composite action, and interaction. Our system is not only able to represent complex human activities formally, but also able to recognize represented actions and interactions with high accuracy. Image sequences are processed to extract poses and gestures. Based on gestures, the system detects actions and interactions occurring in a sequence of image frames. Our results show that the system is able to represent composite actions and interactions naturally. The system was tested to represent and recognize eight types of interactions: approach, depart, point, shake-hands, hug, punch, kick, and push. 
The experiments show that the system can recognize sequences of represented composite actions and interactions with a high recognition rate.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124417063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
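The CFG-based representation can be illustrated with a toy grammar and a naive recognizer that checks whether a sequence of atomic actions derives a composite symbol. The rules and action names below are hypothetical, not the paper's grammar:

```python
# Hypothetical grammar: each composite symbol expands to one or more
# sequences of sub-symbols; terminals are atomic actions from the tracker.
RULES = {
    "approach-and-greet": [["approach", "shake-hands"]],
    "shake-hands": [["stretch", "hold", "withdraw"]],
}

def derives(symbol, seq, rules=RULES):
    """True if `symbol` can derive exactly the terminal sequence `seq`."""
    if symbol not in rules:                 # terminal: an atomic action
        return len(seq) == 1 and seq[0] == symbol
    return any(matches(prod, seq, rules) for prod in rules[symbol])

def matches(production, seq, rules):
    """Try every way of splitting `seq` among the production's symbols."""
    if not production:
        return not seq
    head, rest = production[0], production[1:]
    for i in range(1, len(seq) - len(rest) + 1):
        if derives(head, seq[:i], rules) and matches(rest, seq[i:], rules):
            return True
    return False
```

A production system of this shape directly mirrors the paper's three-tier hierarchy: atomic actions are terminals, while composite actions and interactions are nonterminals defined over them.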
Citations: 288
Tunable Kernels for Tracking
Vasu Parameswaran, Visvanathan Ramesh, Imad Zoghlami
We present a tunable representation for tracking that simultaneously encodes appearance and geometry in a manner that enables the use of mean-shift iterations for tracking. The classic formulation of the tracking problem using mean-shift iterations encodes spatial information very loosely (i.e. using radially symmetric kernels). A problem with such a formulation is that it becomes easy for the tracker to get confused with other objects having the same feature distribution but different spatial configurations of features. Subsequent approaches have addressed this issue but not to the degree of generality required for tracking specific classes of objects and motions (e.g. humans walking). In this paper, we formulate the tracking problem in a manner that encodes the spatial configuration of features along with their density and yet retains robustness to spatial deformations and feature density variations. The encoding of spatial configuration is done using a set of kernels whose parameters can be optimized for a given class of objects and motions, off-line. The formulation enables the use of meanshift iterations and runs in real-time. We demonstrate better tracking results on synthetic and real image sequences as compared to the original mean-shift tracker.
{"title":"Tunable Kernels for Tracking","authors":"Vasu Parameswaran, Visvanathan Ramesh, Imad Zoghlami","doi":"10.1109/CVPR.2006.317","DOIUrl":"https://doi.org/10.1109/CVPR.2006.317","url":null,"abstract":"We present a tunable representation for tracking that simultaneously encodes appearance and geometry in a manner that enables the use of mean-shift iterations for tracking. The classic formulation of the tracking problem using mean-shift iterations encodes spatial information very loosely (i.e. using radially symmetric kernels). A problem with such a formulation is that it becomes easy for the tracker to get confused with other objects having the same feature distribution but different spatial configurations of features. Subsequent approaches have addressed this issue but not to the degree of generality required for tracking specific classes of objects and motions (e.g. humans walking). In this paper, we formulate the tracking problem in a manner that encodes the spatial configuration of features along with their density and yet retains robustness to spatial deformations and feature density variations. The encoding of spatial configuration is done using a set of kernels whose parameters can be optimized for a given class of objects and motions, off-line. The formulation enables the use of meanshift iterations and runs in real-time. 
We demonstrate better tracking results on synthetic and real image sequences as compared to the original mean-shift tracker.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128866321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
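Stripped of the tunable appearance/geometry kernels, the underlying mean-shift machinery iterates the window centre toward the weighted centroid of its neighbourhood. The sketch below runs directly on a 2-D weight image; in a real tracker the weights would come from the kernel-encoded feature model:

```python
import numpy as np

def track_mode(img, start, radius=3, n_iter=20):
    """Mean-shift iterations on a 2-D weight image: repeatedly move the
    window centre to the weighted centroid of its neighbourhood."""
    y, x = float(start[0]), float(start[1])
    for _ in range(n_iter):
        y0 = max(0, int(y) - radius)
        y1 = min(img.shape[0], int(y) + radius + 1)
        x0 = max(0, int(x) - radius)
        x1 = min(img.shape[1], int(x) + radius + 1)
        win = img[y0:y1, x0:x1]
        total = win.sum()
        if total == 0:
            break                          # no mass in the window: stay put
        ys, xs = np.mgrid[y0:y1, x0:x1]
        ny = (ys * win).sum() / total
        nx = (xs * win).sum() / total
        if abs(ny - y) < 1e-3 and abs(nx - x) < 1e-3:
            break                          # converged to a mode
        y, x = ny, nx
    return y, x
```

The paper's contribution sits in how the weight field is produced: its tunable kernels encode spatial configuration as well as feature density, so objects with the same colour distribution but different layouts yield different weight fields.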
Citations: 49
Supervised Learning of Edges and Object Boundaries
Piotr Dollár, Z. Tu, Serge J. Belongie
Edge detection is one of the most studied problems in computer vision, yet it remains a very challenging task. It is difficult because the decision for an edge often cannot be made purely from low-level cues such as gradient; instead, we need to engage all levels of information (low, middle, and high) in order to decide where to put edges. In this paper we propose a novel supervised learning algorithm for edge and object boundary detection, which we refer to as Boosted Edge Learning, or BEL for short. The decision for an edge point is made independently at each location in the image; a very large aperture is used, providing significant context for each decision. In the learning stage, the algorithm selects and combines a large number of features across different scales in order to learn a discriminative model, using an extended version of the Probabilistic Boosting Tree classification algorithm. The learning-based framework is highly adaptive and there are no parameters to tune. We show applications for edge detection in a number of specific image domains as well as on natural images.
{"title":"Supervised Learning of Edges and Object Boundaries","authors":"Piotr Dollár, Z. Tu, Serge J. Belongie","doi":"10.1109/CVPR.2006.298","DOIUrl":"https://doi.org/10.1109/CVPR.2006.298","url":null,"abstract":"Edge detection is one of the most studied problems in computer vision, yet it remains a very challenging task. It is difficult since often the decision for an edge cannot be made purely based on low level cues such as gradient, instead we need to engage all levels of information, low, middle, and high, in order to decide where to put edges. In this paper we propose a novel supervised learning algorithm for edge and object boundary detection which we refer to as Boosted Edge Learning or BEL for short. A decision of an edge point is made independently at each location in the image; a very large aperture is used providing significant context for each decision. In the learning stage, the algorithm selects and combines a large number of features across different scales in order to learn a discriminative model using an extended version of the Probabilistic Boosting Tree classification algorithm. The learning based framework is highly adaptive and there are no parameters to tune. We show applications for edge detection in a number of specific image domains as well as on natural images. We test on various datasets including the Berkeley dataset and the results obtained are very good.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127558247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 491
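BEL learns a discriminative edge classifier with a Probabilistic Boosting Tree over a very large pool of multi-scale aperture features. As a hedged, self-contained illustration of the core recipe only (per-pixel supervised classification from multi-scale context features, with boosting doing the feature selection), the sketch below substitutes plain AdaBoost over decision stumps and a toy synthetic step-edge image; every name, feature, and parameter here is illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training image: a noisy vertical step edge between columns 15 and 16.
img = np.zeros((32, 32))
img[:, 16:] = 1.0
img += rng.normal(0.0, 0.1, img.shape)

def patch_features(img, y, x, scales=(1, 2, 4)):
    # A tiny multi-scale feature pool: gradient magnitudes at several scales,
    # standing in for the thousands of aperture features BEL draws from.
    feats = []
    for s in scales:
        feats.append(abs(img[y, x + s] - img[y, x - s]))
        feats.append(abs(img[y + s, x] - img[y - s, x]))
    return feats

pad = 4
X, Y = [], []
for y in range(pad, 32 - pad):
    for x in range(pad, 32 - pad):
        X.append(patch_features(img, y, x))
        Y.append(1 if x in (15, 16) else -1)   # edge pixels straddle the step
X, Y = np.array(X), np.array(Y)

def stump_pred(X, j, t, sign):
    # Decision stump: threshold feature j at t, with polarity `sign`.
    return np.where(sign * (X[:, j] - t) > 0, 1, -1)

def train_adaboost(X, Y, rounds=10):
    # Plain AdaBoost over stumps (the paper uses a PBT instead).
    n = len(Y)
    w = np.full(n, 1.0 / n)
    stumps = []
    for _ in range(rounds):
        best = None
        for j in range(X.shape[1]):
            for t in np.quantile(X[:, j], np.linspace(0.0, 1.0, 32)):
                for sign in (1, -1):
                    err = w[stump_pred(X, j, t, sign) != Y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, sign)
        err, j, t, sign = best
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * Y * stump_pred(X, j, t, sign))
        w /= w.sum()
        stumps.append((alpha, j, t, sign))
    return stumps

def predict(stumps, X):
    # Weighted vote of all stumps.
    score = sum(a * stump_pred(X, j, t, s) for a, j, t, s in stumps)
    return np.where(score > 0, 1, -1)

stumps = train_adaboost(X, Y)
acc = (predict(stumps, X) == Y).mean()
print(f"training accuracy: {acc:.3f}")
```

Because each pixel is classified independently from its surrounding aperture, the same loop carries over to real images: replace the toy feature pool with richer multi-scale filter responses and the stump learner with a stronger boosted tree.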
Journal
2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)