{"title":"人类行为的视图不变表示和学习","authors":"C. Rao, M. Shah","doi":"10.1109/EVENT.2001.938867","DOIUrl":null,"url":null,"abstract":"Automatically understanding human actions from video sequences is a very challenging problem. This involves the extraction of relevant visual information from a video sequence, representation of that information in a suitable form, and interpretation of visual information for the purpose of recognition and learning. We first present a view-invariant representation of action consisting of dynamic instants and intervals, which is computed using spatiotemporal curvature of a trajectory. This representation is then used by our system to learn human actions without any training. The system automatically segments video into individual actions, and computes a view-invariant representation for each action. The system is able to incrementally, learn different actions starting with no model. It is able to discover different instances of the same action performed by different people, and in different viewpoints. In order to validate our approach, we present results on video clips in which roughly 50 actions were performed by five different people in different viewpoints. Our system performed impressively by correctly interpreting most actions.","PeriodicalId":375539,"journal":{"name":"Proceedings IEEE Workshop on Detection and Recognition of Events in Video","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":"{\"title\":\"View-invariant representation and learning of human action\",\"authors\":\"C. Rao, M. Shah\",\"doi\":\"10.1109/EVENT.2001.938867\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatically understanding human actions from video sequences is a very challenging problem. This involves the extraction of relevant visual information from a video sequence, representation of that information in a suitable form, and interpretation of visual information for the purpose of recognition and learning. We first present a view-invariant representation of action consisting of dynamic instants and intervals, which is computed using spatiotemporal curvature of a trajectory. This representation is then used by our system to learn human actions without any training. The system automatically segments video into individual actions, and computes a view-invariant representation for each action. The system is able to incrementally, learn different actions starting with no model. It is able to discover different instances of the same action performed by different people, and in different viewpoints. In order to validate our approach, we present results on video clips in which roughly 50 actions were performed by five different people in different viewpoints. 
Our system performed impressively by correctly interpreting most actions.\",\"PeriodicalId\":375539,\"journal\":{\"name\":\"Proceedings IEEE Workshop on Detection and Recognition of Events in Video\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2001-07-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings IEEE Workshop on Detection and Recognition of Events in Video\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/EVENT.2001.938867\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings IEEE Workshop on Detection and Recognition of Events in Video","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/EVENT.2001.938867","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
View-invariant representation and learning of human action
Automatically understanding human actions from video sequences is a challenging problem. It involves extracting the relevant visual information from a video sequence, representing that information in a suitable form, and interpreting it for the purposes of recognition and learning. We first present a view-invariant representation of action, consisting of dynamic instants and intervals, which is computed from the spatiotemporal curvature of a trajectory. Our system then uses this representation to learn human actions without any training. The system automatically segments video into individual actions and computes a view-invariant representation for each action. It is able to learn different actions incrementally, starting with no model, and to discover different instances of the same action performed by different people and seen from different viewpoints. To validate our approach, we present results on video clips in which roughly 50 actions were performed by five different people, viewed from different viewpoints. Our system performed impressively, correctly interpreting most actions.
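As a rough illustration of the representation the abstract describes (a minimal sketch, not the authors' implementation): a tracked point such as a hand centroid yields image coordinates (x(t), y(t)); treating the trajectory as the 3-D space curve (x(t), y(t), t), peaks of its curvature mark candidate dynamic instants, and the segments between consecutive instants are the intervals. The function names, finite-difference derivatives, and peak-detection parameters below are illustrative assumptions; a real pipeline would also smooth the trajectory before differentiation.

```python
import numpy as np
from scipy.signal import argrelextrema

def spatiotemporal_curvature(x, y):
    """Curvature of the space curve r(t) = (x(t), y(t), t), assuming
    unit time steps, so that t' = 1 and t'' = 0."""
    xp, yp = np.gradient(x), np.gradient(y)      # first derivatives
    xpp, ypp = np.gradient(xp), np.gradient(yp)  # second derivatives
    # kappa = |r' x r''| / |r'|^3 with r' = (x', y', 1), r'' = (x'', y'', 0)
    numerator = np.sqrt(xpp**2 + ypp**2 + (xp * ypp - yp * xpp)**2)
    denominator = (xp**2 + yp**2 + 1.0) ** 1.5
    return numerator / denominator

def dynamic_instants(x, y, order=3):
    """Indices of local curvature maxima: candidate dynamic instants.
    Consecutive instants bound the intervals of the representation."""
    kappa = spatiotemporal_curvature(x, y)
    return argrelextrema(kappa, np.greater, order=order)[0]

# Hypothetical usage: x, y are arrays of a tracked point over N frames.
# instants = dynamic_instants(x, y)
```

The intuition behind the view invariance of such a representation is that dynamic instants correspond to abrupt changes in motion, which survive projection to different viewpoints even though the intervening trajectory shape does not.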