
Proceedings IEEE Workshop on Detection and Recognition of Events in Video: Latest Publications

Segmentation and recognition of continuous human activity
Pub Date : 2001-07-08 DOI: 10.1109/EVENT.2001.938863
Anjum Ali, J. Aggarwal
This paper presents a methodology for automatic segmentation and recognition of continuous human activity. We segment a continuous human activity into separate actions and correctly identify each action. The camera views the subject from the lateral view; there are no distinct breaks or pauses between the execution of different actions. We have no prior knowledge about the commencement or termination of each action. We compute the angles subtended by three major components of the body with the vertical axis, namely the torso, the upper component of the leg and the lower component of the leg. Using these three angles as a feature vector, we classify frames into breakpoint and non-breakpoint frames. Breakpoints indicate an action's commencement or termination. We use single-action sequences for the training data set. The test sequences, on the other hand, are continuous sequences of human activity that consist of three or more actions in succession. The system has been tested on continuous activity sequences containing actions such as walking, sitting down, standing up, bending, getting up, squatting and rising. It detects the breakpoints and classifies the actions between them.
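A minimal sketch of the frame-labelling and segmentation step described above, assuming the three angles (torso, upper leg, lower leg) have already been extracted per frame. The nearest-mean breakpoint classifier and its trained mean vectors are illustrative stand-ins, not the paper's actual classifier.

```python
import numpy as np

def segment_activity(angles, bp_mean, nbp_mean):
    """Label frames as breakpoint / non-breakpoint and split the sequence into actions.

    angles   : (T, 3) array of torso, upper-leg, lower-leg angles against the vertical axis.
    bp_mean  : (3,) mean feature vector of breakpoint frames (from training sequences).
    nbp_mean : (3,) mean feature vector of non-breakpoint frames.
    """
    # Nearest-mean classification of each 3-angle feature vector.
    d_bp = np.linalg.norm(angles - bp_mean, axis=1)
    d_nbp = np.linalg.norm(angles - nbp_mean, axis=1)
    is_breakpoint = d_bp < d_nbp                      # (T,) boolean mask

    # Breakpoints mark an action's commencement or termination, so the frames
    # between two breakpoint runs form one candidate action segment.
    segments, start = [], None
    for t, bp in enumerate(is_breakpoint):
        if not bp and start is None:
            start = t
        elif bp and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(is_breakpoint)))
    return is_breakpoint, segments
```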
Citations: 204
Recognizing action events from multiple viewpoints
Pub Date : 2001-07-08 DOI: 10.1109/EVENT.2001.938868
T. Syeda-Mahmood, M. Alex O. Vasilescu, Saratendu Sethi
A first step towards an understanding of the semantic content in a video is the reliable detection and recognition of actions performed by objects. This is a difficult problem due to the enormous variability in an action's appearance when seen from different viewpoints and/or at different times. In this paper we address the recognition of actions by taking a novel approach that models actions as special types of 3D objects. Specifically, we observe that any action can be represented as a generalized cylinder, called the action cylinder. Reliable recognition is achieved by recovering the viewpoint transformation between the reference (model) and given action cylinders. A set of 8 corresponding points from time-wise corresponding cross-sections is shown to be sufficient to align the two cylinders under perspective projection. A surprising conclusion from visualizing actions as objects is that rigid, articulated, and nonrigid actions can all be modeled in a uniform framework.
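The abstract states that eight time-wise corresponding points are sufficient to recover the viewpoint transformation between two action cylinders under perspective projection. The paper's cylinder parameterization is not reproduced here; as a loose illustration of why eight correspondences supply enough linear constraints, the sketch below runs the classical normalized 8-point DLT, which estimates a 3x3 view relation (the fundamental matrix) from point correspondences.

```python
import numpy as np

def eight_point(x_ref, x_obs):
    """Normalized 8-point algorithm: estimate the 3x3 matrix F with
    x_obs^T F x_ref = 0 from N >= 8 point correspondences.

    x_ref, x_obs : (N, 2) arrays of corresponding image points.
    """
    def normalize(p):
        # Translate to the centroid and scale to mean distance sqrt(2).
        c = p.mean(axis=0)
        s = np.sqrt(2) / np.mean(np.linalg.norm(p - c, axis=1))
        T = np.array([[s, 0, -s * c[0]],
                      [0, s, -s * c[1]],
                      [0, 0, 1.0]])
        ph = np.hstack([p, np.ones((len(p), 1))]) @ T.T
        return ph, T

    p1, T1 = normalize(np.asarray(x_ref, float))
    p2, T2 = normalize(np.asarray(x_obs, float))

    # Each correspondence gives one linear constraint on the 9 entries of F.
    A = np.column_stack([p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
                         p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
                         p1[:, 0], p1[:, 1], np.ones(len(p1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)

    # Enforce the rank-2 constraint and undo the normalization.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    return T2.T @ F @ T1
```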
Citations: 115
Temporal events in all dimensions and scales
Pub Date : 2001-07-08 DOI: 10.1109/EVENT.2001.938870
M. Slaney, D. Ponceleón, James Kaufman
This paper describes a new representation for the audio and visual information in a video signal. We reduce the dimensionality of the signals with singular-value decomposition (SVD) or mel-frequency cepstral coefficients (MFCC). We apply these transforms to word (word transcript, semantic space or latent semantic indexing), image (color histogram) and audio (timbre) data. Using scale-space techniques we find large jumps in a video's path, which are evidence for events. We use these techniques to analyze the temporal properties of the audio and image data in a video. This analysis creates a hierarchical segmentation of the video, or a table of contents, from both the audio and the image data.
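A minimal sketch of the two stages the abstract describes, assuming a frame-by-feature matrix is already available: reduce dimensionality with a truncated SVD, then look for large jumps in the reduced path after smoothing at several temporal scales. The number of components, the smoothing scales, and the jump measure are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def event_evidence(features, k=10, scales=(2, 4, 8, 16)):
    """features : (T, D) matrix, one row of image/audio/word features per frame.
    Returns the reduced (T, k) path and, per scale, a per-frame jump-magnitude signal."""
    # Truncated SVD: keep the k strongest components as the video's low-dimensional "path".
    X = features - features.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    path = U[:, :k] * S[:k]

    jumps = {}
    for sigma in scales:
        # Smooth each coordinate of the path at this temporal scale, then measure
        # how far the path moves between consecutive frames.
        smooth = gaussian_filter1d(path, sigma=sigma, axis=0)
        step = np.linalg.norm(np.diff(smooth, axis=0), axis=1)
        jumps[sigma] = step            # large values are evidence for an event
    return path, jumps
```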
Citations: 2
Towards a unified framework for tracking and analysis of human motion
Pub Date : 2001-07-08 DOI: 10.1109/EVENT.2001.938865
N. Krahnstoever, M. Yeasin, Rajeev Sharma
We propose a framework for detecting, tracking and analyzing non-rigid motion based on learned motion patterns. The framework features an appearance-based approach to represent the spatial information and hidden Markov models (HMM) to encode the temporal dynamics of the time-varying visual patterns. The low-level spatial feature extraction is fused with the temporal analysis, providing a unified spatio-temporal approach to common detection, tracking and classification problems. This is a promising approach for many classes of human motion patterns. Visual tracking is achieved by extracting the most probable sequence of target locations from a video stream using a combination of random sampling and the forward procedure from HMM theory. The method allows us to perform a set of important tasks such as activity recognition, gait analysis and keyframe extraction. The efficacy of the method is shown on both natural and synthetic test sequences.
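A minimal sketch of the forward procedure the abstract refers to, evaluated over a randomly sampled set of candidate target locations per frame. The Gaussian motion model, the appearance-likelihood interface, and the per-frame mode readout are assumptions for illustration, not the paper's learned models.

```python
import numpy as np

def forward_track(candidates, appearance_lik, motion_sigma=15.0):
    """HMM forward recursion over sampled candidate locations.

    candidates     : list of (N_t, 2) arrays, sampled (x, y) locations per frame.
    appearance_lik : list of (N_t,) arrays, appearance likelihood of each candidate.
    Returns one location per frame (the mode of the filtered belief, a simplification)."""
    track = []
    alpha = appearance_lik[0] / appearance_lik[0].sum()        # initial belief
    track.append(candidates[0][np.argmax(alpha)])
    for t in range(1, len(candidates)):
        # Transition model: Gaussian in the distance between candidate locations.
        d = np.linalg.norm(candidates[t][:, None, :] - candidates[t - 1][None, :, :], axis=2)
        trans = np.exp(-0.5 * (d / motion_sigma) ** 2)
        # Forward step: predict with the transition model, update with appearance.
        alpha = appearance_lik[t] * (trans @ alpha)
        alpha /= alpha.sum()
        track.append(candidates[t][np.argmax(alpha)])
    return np.array(track)
```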
Citations: 34
Hierarchical unsupervised learning of facial expression categories
Pub Date : 2001-07-08 DOI: 10.1109/EVENT.2001.938872
J. Hoey
We consider the problem of unsupervised classification of temporal sequences of facial expressions in video. This problem arises in the design of an adaptive visual agent, which must be capable of identifying appropriate classes of visual events without supervision to effectively complete its tasks. We present a multilevel dynamic Bayesian network that learns the high-level dynamics of facial expressions simultaneously with models of the expressions themselves. We show how the parameters of the model can be learned in a scalable and efficient way. We present preliminary results using real video data and a class of simulated dynamic event models. The results show that our model correctly classifies the input data comparably to a standard event classification approach, while also learning the high-level model parameters.
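The paper learns the expression models and their high-level dynamics jointly in a multilevel dynamic Bayesian network; that joint learning is not reproduced here. As a much-simplified two-stage stand-in, the sketch below discovers frame-level expression states without supervision and then estimates a transition matrix over those states, conveying only the flavour of separating appearance models from high-level dynamics.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def simple_two_level_model(frames, n_states=4):
    """frames : (T, D) array of per-frame facial features for one video.
    Returns the fitted frame-level mixture, the state labels, and the estimated
    state transition matrix (a crude stand-in for the high-level dynamics)."""
    # Low level: unsupervised appearance states via a Gaussian mixture.
    gmm = GaussianMixture(n_components=n_states, covariance_type='diag',
                          random_state=0).fit(frames)
    states = gmm.predict(frames)

    # High level: first-order dynamics over the discovered states.
    trans = np.ones((n_states, n_states))        # add-one smoothing
    for a, b in zip(states[:-1], states[1:]):
        trans[a, b] += 1
    trans /= trans.sum(axis=1, keepdims=True)
    return gmm, states, trans
```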
Citations: 54
Detecting independently moving objects and their interactions in georeferenced airborne video
Pub Date : 2001-07-08 DOI: 10.1109/EVENT.2001.938861
J. Burns
In airborne video, objects are tracked from a moving camera and often imaged at very low resolution. The camera movement makes it difficult to determine whether or not an object is in motion; the low-resolution imagery makes it difficult to classify the objects and their activities. When comparable, the object's georeferenced trajectory contains useful information for the solution of both of these problems. We describe a novel technique for detecting independent movement by analyzing georeferenced object motion relative to the trajectory of the camera. The method is demonstrated on over a hundred objects and parallax artifacts, and its performance is analyzed relative to difficult object behavior and camera model errors. We also describe a new method for classifying objects and events using features of georeferenced trajectories, such as duration of acceleration, measured at key phases of the events. These features, combined with the periodicity of the image motion, are successfully used to classify events in the domain of person-vehicle interactions.
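A minimal sketch of one trajectory feature the abstract mentions, duration of acceleration, computed from a georeferenced track. The acceleration threshold is an illustrative assumption, and the paper's camera-relative independent-motion test is not reproduced here.

```python
import numpy as np

def duration_of_acceleration(times, positions, accel_thresh=0.5):
    """times     : (T,) timestamps in seconds.
    positions : (T, 2) georeferenced (east, north) coordinates in metres.
    Returns the total time (s) during which the object's acceleration magnitude
    exceeds accel_thresh (m/s^2), a feature usable for event classification."""
    dt = np.diff(times)
    vel = np.diff(positions, axis=0) / dt[:, None]      # (T-1, 2) velocities
    acc = np.diff(vel, axis=0) / dt[1:, None]           # (T-2, 2) accelerations
    accelerating = np.linalg.norm(acc, axis=1) > accel_thresh
    return float(np.sum(dt[1:][accelerating]))
```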
Citations: 6
Hierarchical motion history images for recognizing human motion
Pub Date : 2001-07-08 DOI: 10.1109/EVENT.2001.938864
James W. Davis
There has been increasing interest in computer analysis and recognition of human motion. Previously we presented an efficient real-time approach for representing human motion using a compact "motion history image" (MHI). Recognition was achieved by statistically matching moment-based features. To address previous problems related to global analysis and limited recognition, we present a hierarchical extension to the original MHI framework to compute dense (local) motion flow directly from the MHI. A hierarchical partitioning of motions by speed in an MHI pyramid enables efficient calculation of image motions using fixed-size gradient operators. To characterize the resulting motion field, a polar histogram of motion orientations is described. The hierarchical MHI approach remains a computationally inexpensive method for analysis of human motions.
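A minimal sketch of the basic MHI update and the polar orientation histogram mentioned above, assuming a binary motion mask is available per frame; the pyramid construction and fixed-size gradient operators of the hierarchical extension are omitted, and the duration tau and bin count are illustrative.

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau=30):
    """Standard MHI update: recently moving pixels are set to tau,
    older motion decays by one per frame."""
    return np.where(motion_mask, float(tau), np.maximum(mhi - 1.0, 0.0))

def orientation_histogram(mhi, bins=18):
    """Polar histogram of motion orientations taken from the MHI gradient."""
    gy, gx = np.gradient(mhi)
    valid = (gx ** 2 + gy ** 2) > 0              # ignore flat (no-motion) regions
    angles = np.arctan2(gy[valid], gx[valid])    # orientation of the local motion flow
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)
```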
Citations: 206
Content-based video retrieval by integrating spatio-temporal and stochastic recognition of events
Pub Date : 2001-07-08 DOI: 10.1109/EVENT.2001.938869
M. Petkovic, W. Jonker
As amounts of publicly available video data grow, the need to query this data efficiently becomes significant. Consequently, content-based retrieval of video data turns out to be a challenging and important problem. We address the specific aspect of inferring semantics automatically from raw video data. In particular, we introduce a new video data model that supports the integrated use of two different approaches for mapping low-level features to high-level concepts. Firstly, the model is extended with a rule-based approach that supports spatio-temporal formalization of high-level concepts, and then with a stochastic approach. Furthermore, results on real tennis video data are presented, demonstrating the validity of both approaches, as well as the advantages of their integrated use.
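A minimal sketch of what a rule-based spatio-temporal formalization of a high-level concept could look like. The event name ("net approach"), the court region, the frame count, and the track format are hypothetical illustrations, not the paper's actual rules or tennis annotations.

```python
import numpy as np

def in_region(track, region):
    """track  : (T, 2) per-frame (x, y) positions of one tracked object.
    region : (xmin, ymin, xmax, ymax) axis-aligned area of the court."""
    xmin, ymin, xmax, ymax = region
    return (track[:, 0] >= xmin) & (track[:, 0] <= xmax) & \
           (track[:, 1] >= ymin) & (track[:, 1] <= ymax)

def detect_net_approach(track, net_region, min_frames=25):
    """Fire the hypothetical 'net approach' event when the player remains inside
    net_region for at least min_frames consecutive frames."""
    inside = in_region(track, net_region)
    run = 0
    for t, flag in enumerate(inside):
        run = run + 1 if flag else 0
        if run == min_frames:
            return t - min_frames + 1      # frame at which the rule first holds
    return None
```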
Citations: 89
View-invariant representation and learning of human action
Pub Date : 2001-07-08 DOI: 10.1109/EVENT.2001.938867
C. Rao, M. Shah
Automatically understanding human actions from video sequences is a very challenging problem. This involves the extraction of relevant visual information from a video sequence, representation of that information in a suitable form, and interpretation of visual information for the purpose of recognition and learning. We first present a view-invariant representation of action consisting of dynamic instants and intervals, which is computed using the spatiotemporal curvature of a trajectory. This representation is then used by our system to learn human actions without any training. The system automatically segments video into individual actions, and computes a view-invariant representation for each action. The system is able to incrementally learn different actions, starting with no model. It is able to discover different instances of the same action performed by different people and from different viewpoints. In order to validate our approach, we present results on video clips in which roughly 50 actions were performed by five different people from different viewpoints. Our system performed impressively by correctly interpreting most actions.
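A minimal sketch of the spatiotemporal-curvature computation implied above: the tracked point's trajectory is treated as a 3D curve (x(t), y(t), t), its curvature is computed, and local curvature maxima are reported as candidate dynamic instants. The trajectory smoothing and the simple peak test are illustrative choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def dynamic_instants(x, y, sigma=2.0):
    """x, y : (T,) image coordinates of a tracked point over time.
    Returns the per-frame spatiotemporal curvature and the indices of its local maxima."""
    t = np.arange(len(x), dtype=float)
    xs = gaussian_filter1d(np.asarray(x, float), sigma)
    ys = gaussian_filter1d(np.asarray(y, float), sigma)

    # Treat the trajectory as a 3D curve r(t) = (x, y, t) and use the
    # standard curvature formula k = |r' x r''| / |r'|^3.
    r1 = np.stack([np.gradient(xs), np.gradient(ys), np.gradient(t)], axis=1)
    r2 = np.stack([np.gradient(r1[:, 0]), np.gradient(r1[:, 1]), np.gradient(r1[:, 2])], axis=1)
    k = np.linalg.norm(np.cross(r1, r2), axis=1) / np.linalg.norm(r1, axis=1) ** 3

    # Dynamic instants: frames where curvature is a local maximum.
    peaks = [i for i in range(1, len(k) - 1) if k[i] > k[i - 1] and k[i] > k[i + 1]]
    return k, peaks
```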
Citations: 24
Foreground segmentation using adaptive mixture models in color and depth
Pub Date : 2001-07-08 DOI: 10.1109/EVENT.2001.938860
M. Harville, G. Gordon, J. Woodfill
Segmentation of novel or dynamic objects in a scene, often referred to as "background subtraction" or "foreground segmentation", is a critical early step in most computer vision applications in domains such as surveillance and human-computer interaction. All previously described real-time methods fail to properly handle one or more common phenomena, such as global illumination changes, shadows, inter-reflections, similarity of foreground color to background, and non-static backgrounds (e.g. active video displays or trees waving in the wind). The advent of hardware and software for real-time computation of depth imagery makes better approaches possible. We propose a method for modeling the background that uses per-pixel, time-adaptive, Gaussian mixtures in the combined input space of depth and luminance-invariant color. This combination in itself is novel, but we further improve it by introducing the ideas of (1) modulating the background model learning rate based on scene activity, and (2) making color-based segmentation criteria dependent on depth observations. Our experiments show that the method possesses much greater robustness to problematic phenomena than the prior state of the art, without sacrificing real-time performance, making it well-suited for a wide range of practical applications in video event detection and recognition.
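A minimal per-pixel sketch of a time-adaptive Gaussian-mixture background update in a combined (luminance-normalized color, depth) space, with the learning rate lowered when the scene is active, as the abstract describes. The parameter values, the single-pixel scope, the background/foreground heuristic, and the feature choice are illustrative assumptions rather than the paper's exact model.

```python
import numpy as np

ALPHA_BASE = 0.02       # base learning rate
MATCH_SIGMAS = 2.5      # match threshold in standard deviations
INIT_VAR = 50.0         # variance assigned to a newly created mode

def init_pixel(obs, k=3):
    """Initialize a k-component mixture for one pixel from its first observation."""
    means = np.tile(np.asarray(obs, float), (k, 1))
    variances = np.full(k, INIT_VAR)
    weights = np.full(k, 1.0 / k)
    return means, variances, weights

def update_pixel(means, variances, weights, obs, activity):
    """One time-adaptive mixture update for a single pixel.

    obs      : (3,) feature vector, e.g. luminance-normalized (r, g) plus depth.
    activity : fraction of the frame currently labelled foreground; high activity
               lowers the learning rate so a busy scene is not absorbed into the
               background model.
    Returns True if this pixel is classified as foreground."""
    alpha = ALPHA_BASE * (1.0 - activity)
    obs = np.asarray(obs, float)

    d2 = np.sum((means - obs) ** 2, axis=1)                 # distance to each mode
    matched = d2 < (MATCH_SIGMAS ** 2) * variances
    if matched.any():
        k = int(np.argmin(np.where(matched, d2, np.inf)))   # closest matching mode
        means[k] += alpha * (obs - means[k])
        variances[k] += alpha * (d2[k] - variances[k])
        weights *= (1.0 - alpha)
        weights[k] += alpha
        # Heuristic: treat the two heaviest modes as background.
        foreground = k not in np.argsort(weights)[-2:]
    else:
        k = int(np.argmin(weights))                         # recycle the weakest mode
        means[k], variances[k], weights[k] = obs, INIT_VAR, alpha
        foreground = True
    weights /= weights.sum()
    return foreground
```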
Citations: 264