
IEEE Winter Conference on Applications of Computer Vision: Latest Publications

Play type recognition in real-world football video
Sheng Chen, Zhongyuan Feng, Qingkai Lu, Behrooz Mahasseni, Trevor Fiez, Alan Fern, S. Todorovic
This paper presents a vision system for recognizing the sequence of plays in amateur videos of American football games (e.g., offense, defense, kickoff, punt). The system is aimed at reducing user effort in annotating football videos, which are posted on a web service used by over 13,000 high school, college, and professional football teams. Recognizing football plays is particularly challenging in the context of such a web service because of the huge variation across videos in camera viewpoint, motion, distance from the field, amateur camerawork quality, and lighting conditions, among other factors. Given a sequence of videos, where each shows a particular play of a football game, we first run noisy play-level detectors on every video. Then, we integrate the responses of the play-level detectors with global game-level reasoning that accounts for statistical knowledge about football games. Our empirical results on more than 1450 videos from 10 diverse football games show that our approach is quite effective and close to being usable in a real-world setting.
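The two-stage pipeline described above (noisy per-play detectors fused with game-level reasoning) can be sketched as Viterbi decoding over a play-transition model. This is a minimal illustration, not the paper's actual model; the state names, prior, and transition probabilities below are hypothetical.

```python
import math

def decode_plays(scores, prior, trans):
    """Viterbi decoding: fuse noisy per-play detector scores (treated as
    emission probabilities) with game-level play-transition statistics."""
    states = list(prior)
    # dp maps each state to (log-prob of best path ending there, the path)
    dp = {s: (math.log(prior[s] * scores[0][s]), [s]) for s in states}
    for obs in scores[1:]:
        nxt = {}
        for s in states:
            # best predecessor under the transition model
            lp, path = max(
                ((lp + math.log(trans[p][s]), path) for p, (lp, path) in dp.items()),
                key=lambda t: t[0],
            )
            nxt[s] = (lp + math.log(obs[s]), path + [s])
        dp = nxt
    return max(dp.values(), key=lambda t: t[0])[1]
```

With per-play detector scores as dictionaries over play types, `decode_plays` returns the jointly most probable play sequence, which is how game-level statistics can override an occasional noisy detection.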
DOI: 10.1109/WACV.2014.6836040 | Published: 2014-06-23
Citations: 27
Model-based anthropometry: Predicting measurements from 3D human scans in multiple poses
Aggeliki Tsoli, M. Loper, Michael J. Black
Extracting anthropometric or tailoring measurements from 3D human body scans is important for applications such as virtual try-on, custom clothing, and online sizing. Existing commercial solutions identify anatomical landmarks on high-resolution 3D scans and then compute distances or circumferences on the scan. Landmark detection is sensitive to acquisition noise (e.g. holes), and these methods require subjects to adopt a specific pose. In contrast, we propose a solution we call model-based anthropometry. We fit a deformable 3D body model to scan data in one or more poses; this model-based fitting is robust to scan noise. This brings the scan into registration with a database of registered body scans. Then, we extract features from the registered model (rather than from the scan); these include limb lengths, circumferences, and statistical features of global shape. Finally, we learn a mapping from these features to measurements using regularized linear regression. We perform an extensive evaluation using the CAESAR dataset and demonstrate that the accuracy of our method outperforms state-of-the-art methods.
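The final step, mapping model-derived shape features to measurements with regularized linear regression, has a closed form. The sketch below uses synthetic features rather than the paper's CAESAR data; the regularization weight is illustrative.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression: w = (X'X + lam*I)^(-1) X'y,
    mapping an (n, d) feature matrix X to n target measurements y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

One such regressor would be fit per tailoring measurement (e.g. inseam, chest circumference), with predictions given by `X @ w`.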
DOI: 10.1109/WACV.2014.6836115 | Published: 2014-03-24
Citations: 44
Repeated constrained sparse coding with partial dictionaries for hyperspectral unmixing
Naveed Akhtar, F. Shafait, A. Mian
Hyperspectral images obtained from remote sensing platforms have limited spatial resolution. Thus, each spectrum measured at a pixel is usually a mixture of many pure spectral signatures (endmembers) corresponding to different materials on the ground. Hyperspectral unmixing aims at separating these mixed spectra into their constituent endmembers. We formulate hyperspectral unmixing as a constrained sparse coding (CSC) problem, where unmixing is performed with the help of a library of pure spectral signatures under positivity and summation constraints. We propose two different methods that perform CSC repeatedly over the hyperspectral data. The first method, Repeated-CSC (RCSC), systematically neglects a few spectral bands of the data each time it performs the sparse coding, whereas the second method, Repeated Spectral Derivative (RSD), takes the spectral derivative of the data before the sparse coding stage. The spectral derivative is taken such that it is not operated on a few selected bands. Experiments on simulated and real hyperspectral data and comparison with the existing state of the art show that the proposed methods achieve significantly higher accuracy. Our results demonstrate the overall robustness of RCSC to noise and the better performance of RSD at high signal-to-noise ratio.
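The positivity and sum-to-one (abundance) constraints in the CSC formulation can be enforced by projected gradient descent with a Euclidean projection onto the probability simplex. This is a generic sketch of constrained unmixing against a spectral library `D`, not the paper's RCSC/RSD procedures, and omits the band-dropping step.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(v.size) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def unmix(D, y, steps=2000):
    """Projected gradient for min_a ||D a - y||^2 under the positivity
    and sum-to-one abundance constraints. D: (bands, endmembers)."""
    a = np.full(D.shape[1], 1.0 / D.shape[1])   # uniform start
    lr = 1.0 / np.linalg.norm(D.T @ D, 2)       # 1/L step size
    for _ in range(steps):
        a = project_simplex(a - lr * (D.T @ (D @ a - y)))
    return a
```

The returned abundance vector is non-negative and sums to one by construction, which is what makes the coefficients physically interpretable as material fractions.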
DOI: 10.1109/WACV.2014.6836001 | Published: 2014-03-24
Citations: 15
Segmentation and matching: Towards a robust object detection system
Jing Huang, Suya You
This paper focuses on detecting parts in laser-scanned data of a cluttered industrial scene. To achieve this goal, we propose a robust object detection system based on segmentation and matching, together with an adaptive segmentation algorithm and an efficient pose extraction algorithm based on correspondence filtering. We also propose an overlapping-based criterion that exploits more information from the original point cloud than the number-of-matching criterion, which considers only key-points. Experiments show how each component works, and the results demonstrate the performance of our system compared to the state of the art.
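The overlapping-based criterion can be illustrated as the fraction of (pose-transformed) model points that land within a distance threshold of the scene cloud, so the whole point cloud contributes to the score rather than only matched key-points. A brute-force sketch with hypothetical names and threshold; a real system would use a spatial index instead of the O(N*M) distance table.

```python
import numpy as np

def overlap_score(model_pts, scene_pts, tau=0.05):
    """Fraction of transformed model points lying within tau of some
    scene point -- an overlap criterion over the full cloud."""
    d2 = ((model_pts[:, None, :] - scene_pts[None, :, :]) ** 2).sum(axis=-1)
    return float((d2.min(axis=1) <= tau ** 2).mean())
```

A candidate pose with many matched key-points but poor cloud-level overlap would score low here, which is the kind of false positive the overlap criterion is meant to filter.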
DOI: 10.1109/WACV.2014.6836082 | Published: 2014-03-24
Citations: 4
Introspective semantic segmentation
Gautam Singh, J. Kosecka
Traditional approaches for semantic segmentation work in a supervised setting, assume a fixed number of semantic categories, and require sufficiently large training sets. The performance of various approaches is often reported in terms of average per-pixel class accuracy and global accuracy of the final labeling. When applying the learned models in practical settings on large amounts of unlabeled data, possibly containing previously unseen categories, it is important to properly quantify their performance by measuring a classifier's introspective capability. We quantify the confidence of the region classifiers in the context of a non-parametric k-nearest neighbor (k-NN) framework for semantic segmentation by using the so-called strangeness measure. The proposed measure is evaluated by introducing confidence-based image ranking and showing its feasibility on a dataset containing a large number of previously unseen categories.
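A common form of the strangeness measure in k-NN settings is the ratio of distances to the k nearest same-label examples over the k nearest different-label examples. The sketch below is a minimal version of that idea with illustrative k, distance, and data; it is not the paper's region-feature pipeline.

```python
import math

def strangeness(x, same_class, other_class, k=3):
    """Summed distance from x to its k nearest same-label examples over
    its k nearest other-label examples. Low values indicate a confident
    classification; values near or above 1 flag unreliable regions."""
    d_same = sorted(math.dist(x, s) for s in same_class)[:k]
    d_other = sorted(math.dist(x, o) for o in other_class)[:k]
    return sum(d_same) / sum(d_other)
```

Ranking images by the strangeness of their region classifications is then a direct way to surface the ones most likely to contain unseen categories.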
DOI: 10.1109/WACV.2014.6836032 | Published: 2014-03-24
Citations: 3
GPU-accelerated and efficient multi-view triangulation for scene reconstruction
J. Mak, Mauricio Hess-Flores, S. Recker, John Douglas Owens, K. Joy
This paper presents a framework for GPU-accelerated N-view triangulation in multi-view reconstruction that improves processing time and final reprojection error with respect to methods in the literature. The framework uses an algorithm based on optimizing an angular error-based L1 cost function and it is shown how adaptive gradient descent can be applied for convergence. The triangulation algorithm is mapped onto the GPU and two approaches for parallelization are compared: one thread per track and one thread block per track. The better performing approach depends on the number of tracks and the lengths of the tracks in the dataset. Furthermore, the algorithm uses statistical sampling based on confidence levels to successfully reduce the quantity of feature track positions needed to triangulate an entire track. Sampling aids in load balancing for the GPU's SIMD architecture and for exploiting the GPU's memory hierarchy. When compared to a serial implementation, a typical performance increase of 3-4× can be achieved on a 4-core CPU. On a GPU, large track numbers are favorable and an increase of up to 40× can be achieved. Results on real and synthetic data prove that reprojection errors are similar to the best performing current triangulation methods but costing only a fraction of the computation time, allowing for efficient and accurate triangulation of large scenes.
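For reference, the standard linear least-squares ("midpoint") N-view triangulation that angular-error refinements of this kind are typically compared against or initialized from fits in a few lines. This is a baseline sketch, not the paper's GPU kernel or its L1 angular cost.

```python
import numpy as np

def midpoint_triangulate(origins, dirs):
    """N-view midpoint triangulation: the 3D point minimizing the summed
    squared distance to all viewing rays (origin c_i, direction d_i).
    Solves  sum_i (I - d_i d_i^T) x = sum_i (I - d_i d_i^T) c_i."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(origins, dirs):
        d = d / np.linalg.norm(d)          # unit ray direction
        M = np.eye(3) - np.outer(d, d)     # projector orthogonal to the ray
        A += M
        b += M @ c
    return np.linalg.solve(A, b)
```

Since each feature track triangulates independently, this per-track solve is also what makes the one-thread-per-track GPU mapping natural.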
DOI: 10.1109/WACV.2014.6836117 | Published: 2014-03-24
Citations: 3
Understanding and analyzing a large collection of archived swimming videos
Long Sha, P. Lucey, S. Sridharan, S. Morgan, D. Pease
In elite sports, nearly all performances are captured on video. Despite the massive amount of video that has been captured in this domain over the last 10-15 years, most of it remains in an “unstructured” or “raw” form, meaning it can only be viewed or manually annotated/tagged with higher-level event labels, which is time-consuming and subjective. As such, depending on the detail or depth of annotation, the value of the collected repositories of archived data is minimal, as it does not lend itself to large-scale analysis and retrieval. One such example is swimming, where each race of a swimmer is captured on a camcorder and, in addition to the split times (i.e., the time it takes for each lap), stroke rates and stroke lengths are manually annotated. In this paper, we propose a vision-based system which effectively “digitizes” a large collection of archived swimming races by estimating the location of the swimmer in each frame, as well as detecting the stroke rate. As the videos are captured from moving hand-held cameras located at different positions and angles, we show that our hierarchical approach to tracking the swimmer and their different parts is robust to these issues and allows us to accurately estimate swimmer locations and stroke rates.
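In the simplest view, stroke-rate detection reduces to finding the dominant frequency of a periodic 1-D signal derived from the tracked swimmer (e.g. a limb coordinate over time). The FFT sketch below assumes such a signal has already been extracted and is not how the paper's detector is necessarily implemented.

```python
import numpy as np

def stroke_rate(signal, fps):
    """Strokes per minute from a periodic 1-D signal sampled at fps Hz,
    taken as the dominant frequency of its mean-removed spectrum."""
    spectrum = np.abs(np.fft.rfft(signal - np.mean(signal)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    return freqs[np.argmax(spectrum)] * 60.0
```

For a swimmer stroking at 0.75 Hz in 30 fps video, this returns 45 strokes per minute.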
DOI: 10.1109/WACV.2014.6836037 | Published: 2014-03-24
Citations: 17
Real time action recognition using histograms of depth gradients and random decision forests
H. Rahmani, A. Mahmood, D. Huynh, A. Mian
We propose an algorithm which combines the discriminative information from depth images as well as from 3D joint positions to achieve high action recognition accuracy. To avoid the suppression of subtle discriminative information and also to handle local occlusions, we compute a vector of many independent local features. Each feature encodes spatiotemporal variations of depth and depth gradients at a specific space-time location in the action volume. Moreover, we encode the dominant skeleton movements by computing a local 3D joint position difference histogram. For each joint, we compute a 3D space-time motion volume which we use as an importance indicator and incorporate in the feature vector for improved action discrimination. To retain only the discriminant features, we train a random decision forest (RDF). The proposed algorithm is evaluated on three standard datasets and compared with nine state-of-the-art algorithms. Experimental results show that, on average, the proposed algorithm outperforms all other algorithms in accuracy and has a processing speed of over 112 frames/second.
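One local feature of the kind described, a magnitude-weighted histogram of depth-gradient orientations over a single cell, can be sketched as follows. The bin count and normalization are illustrative choices, not the paper's exact parameters.

```python
import numpy as np

def depth_gradient_histogram(depth, bins=8):
    """Magnitude-weighted histogram of depth-gradient orientations for one
    local cell; histograms from many space-time cells would be concatenated
    into the full feature vector."""
    gy, gx = np.gradient(depth.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2.0 * np.pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, 2.0 * np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Because each cell is histogrammed independently, a local occlusion corrupts only the features of the cells it touches, which is the robustness property the abstract relies on.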
DOI: 10.1109/WACV.2014.6836044 | Published: 2014-03-24
Citations: 114
Efficient dense subspace clustering
Pan Ji, M. Salzmann, Hongdong Li
In this paper, we tackle the problem of clustering data points drawn from a union of linear (or affine) subspaces. To this end, we introduce an efficient subspace clustering algorithm that estimates dense connections between the points lying in the same subspace. In particular, instead of following the standard compressive sensing approach, we formulate subspace clustering as a Frobenius norm minimization problem, which inherently yields denser connections between the data points. While in the noise-free case we rely on the self-expressiveness of the observations, in the presence of noise we simultaneously learn a clean dictionary to represent the data. Our formulation lets us address the subspace clustering problem efficiently. More specifically, the solution can be obtained in closed-form for outlier-free observations, and by performing a series of linear operations in the presence of outliers. Interestingly, we show that our Frobenius norm formulation shares the same solution as the popular nuclear norm minimization approach when the data is free of any noise, or, in the case of corrupted data, when a clean dictionary is learned. Our experimental evaluation on motion segmentation and face clustering demonstrates the benefits of our algorithm in terms of clustering accuracy and efficiency.
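The appeal of a Frobenius-norm objective is that a relaxed form such as min_C ||X - XC||_F^2 + lam ||C||_F^2 has the closed-form solution C = (X'X + lam I)^{-1} X'X. This relaxation is an illustrative stand-in for the paper's exact formulation and omits its dictionary learning.

```python
import numpy as np

def dense_coefficients(X, lam=0.1):
    """Closed form of min_C ||X - X C||_F^2 + lam ||C||_F^2 for data
    matrix X with one sample per column: C = (X'X + lam I)^-1 X'X."""
    n = X.shape[1]
    G = X.T @ X
    return np.linalg.solve(G + lam * np.eye(n), G)
```

The affinity |C| + |C'| would then feed a standard spectral clustering step; points in the same subspace get dense mutual coefficients, while cross-subspace coefficients stay small.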
DOI: 10.1109/WACV.2014.6836065 | Published: 2014-03-24
Citations: 129
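The closed-form solution mentioned in the abstract above can be illustrated with a small numerical sketch. This is an assumption-laden toy version, not the paper's exact algorithm: it uses a standard ridge-regularized relaxation of the self-expressive model, min_C ||X − XC||²_F + λ||C||²_F, whose minimizer is C = (XᵀX + λI)⁻¹XᵀX; the regularizer `lam` and the toy data are illustrative.

```python
import numpy as np

def dense_self_expression(X, lam=1e-3):
    """Frobenius-norm self-expressive coefficients for data matrix X (d, n),
    columns are points. Solves min_C ||X - XC||_F^2 + lam * ||C||_F^2,
    which has the closed form C = (X^T X + lam I)^{-1} X^T X."""
    n = X.shape[1]
    G = X.T @ X                                  # Gram matrix, (n, n)
    C = np.linalg.solve(G + lam * np.eye(n), G)  # closed-form solution
    return C

# Toy data: two 1-D subspaces (lines through the origin) in R^3, 5 points each.
rng = np.random.default_rng(0)
b1, b2 = rng.normal(size=(3, 1)), rng.normal(size=(3, 1))
X = np.hstack([b1 @ rng.normal(size=(1, 5)), b2 @ rng.normal(size=(1, 5))])

C = dense_self_expression(X)
err = np.linalg.norm(X - X @ C)  # self-expression residual, small for small lam
```

In a full pipeline, spectral clustering on the affinity |C| + |C|ᵀ would then recover the subspace labels; the point of the sketch is only that, unlike L1-based compressive-sensing formulations, the Frobenius-norm problem is solved by one linear system and yields a dense coefficient matrix.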
Generalized feature learning and indexing for object localization and recognition
Ning Zhou, A. Angelova, Jianping Fan
This paper addresses a general feature indexing and retrieval scenario in which a set of features detected in the image can retrieve a relevant class of objects, or classes of objects. The main idea behind those features for general object retrieval is that they are capable of identifying and localizing some small regions or parts of the potential object. We propose a set of criteria which take advantage of the learned features to find regions in the image which likely belong to an object. We further use the features' localization capability to localize the full object of interest and its extents. The proposed approach improves the recognition performance and is very efficient. Moreover, it has the potential to be used in automatic image understanding or annotation since it can uncover regions where the objects can be found in an image.
{"title":"Generalized feature learning and indexing for object localization and recognition","authors":"Ning Zhou, A. Angelova, Jianping Fan","doi":"10.1109/WACV.2014.6836100","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836100","url":null,"abstract":"This paper addresses a general feature indexing and retrieval scenario in which a set of features detected in the image can retrieve a relevant class of objects, or classes of objects. The main idea behind those features for general object retrieval is that they are capable of identifying and localizing some small regions or parts of the potential object. We propose a set of criteria which take advantage of the learned features to find regions in the image which likely belong to an object. We further use the features' localization capability to localize the full object of interest and its extents. The proposed approach improves the recognition performance and is very efficient. Moreover, it has the potential to be used in automatic image understanding or annotation since it can uncover regions where the objects can be found in an image.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86694802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
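The abstract above describes using the learned features' localization capability to recover the full object and its extent from part-level responses. A minimal sketch of that idea, with entirely hypothetical inputs (the boxes, scores, and the union-of-top-parts rule are illustrative placeholders, not the paper's learned features or criteria):

```python
import numpy as np

def localize_from_parts(boxes, scores, top_k=3):
    """Estimate a full-object box as the union of the top_k highest-scoring
    part regions. boxes: (n, 4) array of (x1, y1, x2, y2); scores: (n,)."""
    order = np.argsort(scores)[::-1][:top_k]  # indices of strongest parts
    sel = boxes[order]
    x1, y1 = sel[:, 0].min(), sel[:, 1].min()
    x2, y2 = sel[:, 2].max(), sel[:, 3].max()
    return np.array([x1, y1, x2, y2])

# Two confident part detections plus one low-scoring background hit.
boxes = np.array([[10, 10, 30, 30],
                  [40, 15, 60, 35],
                  [200, 200, 210, 210]])
scores = np.array([0.9, 0.8, 0.1])
obj = localize_from_parts(boxes, scores, top_k=2)  # → [10, 10, 60, 35]
```

The sketch captures only the high-level mechanism: small discriminative regions vote, and their combined extent approximates the object, while low-scoring spurious regions are ignored.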