{"title":"人类行为的视图不变表示和学习","authors":"C. Rao, M. Shah","doi":"10.1109/EVENT.2001.938867","DOIUrl":null,"url":null,"abstract":"Automatically understanding human actions from video sequences is a very challenging problem. This involves the extraction of relevant visual information from a video sequence, representation of that information in a suitable form, and interpretation of visual information for the purpose of recognition and learning. We first present a view-invariant representation of action consisting of dynamic instants and intervals, which is computed using spatiotemporal curvature of a trajectory. This representation is then used by our system to learn human actions without any training. The system automatically segments video into individual actions, and computes a view-invariant representation for each action. The system is able to incrementally, learn different actions starting with no model. It is able to discover different instances of the same action performed by different people, and in different viewpoints. In order to validate our approach, we present results on video clips in which roughly 50 actions were performed by five different people in different viewpoints. Our system performed impressively by correctly interpreting most actions.","PeriodicalId":375539,"journal":{"name":"Proceedings IEEE Workshop on Detection and Recognition of Events in Video","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":"{\"title\":\"View-invariant representation and learning of human action\",\"authors\":\"C. Rao, M. Shah\",\"doi\":\"10.1109/EVENT.2001.938867\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatically understanding human actions from video sequences is a very challenging problem. This involves the extraction of relevant visual information from a video sequence, representation of that information in a suitable form, and interpretation of visual information for the purpose of recognition and learning. We first present a view-invariant representation of action consisting of dynamic instants and intervals, which is computed using spatiotemporal curvature of a trajectory. This representation is then used by our system to learn human actions without any training. The system automatically segments video into individual actions, and computes a view-invariant representation for each action. The system is able to incrementally, learn different actions starting with no model. It is able to discover different instances of the same action performed by different people, and in different viewpoints. In order to validate our approach, we present results on video clips in which roughly 50 actions were performed by five different people in different viewpoints. 
Our system performed impressively by correctly interpreting most actions.\",\"PeriodicalId\":375539,\"journal\":{\"name\":\"Proceedings IEEE Workshop on Detection and Recognition of Events in Video\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2001-07-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings IEEE Workshop on Detection and Recognition of Events in Video\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/EVENT.2001.938867\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings IEEE Workshop on Detection and Recognition of Events in Video","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/EVENT.2001.938867","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
View-invariant representation and learning of human action
Automatically understanding human actions from video sequences is a challenging problem. It involves extracting the relevant visual information from a video sequence, representing that information in a suitable form, and interpreting it for the purposes of recognition and learning. We first present a view-invariant representation of action, consisting of dynamic instants and intervals, which is computed from the spatiotemporal curvature of a trajectory. Our system then uses this representation to learn human actions without any training. The system automatically segments video into individual actions and computes a view-invariant representation for each action. It is able to learn different actions incrementally, starting with no model, and to discover different instances of the same action performed by different people and seen from different viewpoints. To validate our approach, we present results on video clips in which roughly 50 actions were performed by five different people, viewed from different viewpoints. Our system performed impressively, correctly interpreting most actions.
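As a rough illustration of the representation the abstract describes (a minimal sketch, not the authors' implementation): a tracked point such as a hand centroid yields image coordinates (x(t), y(t)); treating the trajectory as the 3-D space curve (x(t), y(t), t), peaks of its curvature mark candidate dynamic instants, and the segments between consecutive instants are the intervals. The function names, finite-difference derivatives, and peak-detection parameters below are illustrative assumptions; a real pipeline would also smooth the trajectory before differentiation.

```python
import numpy as np
from scipy.signal import argrelextrema

def spatiotemporal_curvature(x, y):
    """Curvature of the space curve r(t) = (x(t), y(t), t), assuming
    unit time steps, so that t' = 1 and t'' = 0."""
    xp, yp = np.gradient(x), np.gradient(y)      # first derivatives
    xpp, ypp = np.gradient(xp), np.gradient(yp)  # second derivatives
    # kappa = |r' x r''| / |r'|^3 with r' = (x', y', 1), r'' = (x'', y'', 0)
    numerator = np.sqrt(xpp**2 + ypp**2 + (xp * ypp - yp * xpp)**2)
    denominator = (xp**2 + yp**2 + 1.0) ** 1.5
    return numerator / denominator

def dynamic_instants(x, y, order=3):
    """Indices of local curvature maxima: candidate dynamic instants.
    Consecutive instants bound the intervals of the representation."""
    kappa = spatiotemporal_curvature(x, y)
    return argrelextrema(kappa, np.greater, order=order)[0]

# Hypothetical usage: x, y are arrays of a tracked point over N frames.
# instants = dynamic_instants(x, y)
```

The intuition behind the view invariance of such a representation is that dynamic instants correspond to abrupt changes in motion, which survive projection to different viewpoints even though the intervening trajectory shape does not.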