{"title":"Textual description of human activities by tracking head and hand motions","authors":"A. Kojima, Takeshi Tamura, K. Fukunaga","doi":"10.1109/ICPR.2002.1048491","DOIUrl":null,"url":null,"abstract":"We propose a method for describing human activities from video images by tracking human skin regions: facial and hand regions. To detect skin regions robustly, three kinds of probabilistic information are extracted and integrated using Dempster-Shafer theory. The main difficulty in transforming video images into textual descriptions is bridging the semantic gap between them. By associating visual features of head and hand motion with natural language concepts, appropriate syntactic components such as verbs, objects, etc. are determined and translated into natural language.","PeriodicalId":159502,"journal":{"name":"Object recognition supported by user interaction for service robots","volume":"651 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2002-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Object recognition supported by user interaction for service robots","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPR.2002.1048491","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 20
Abstract
We propose a method for describing human activities from video images by tracking human skin regions: facial and hand regions. To detect skin regions robustly, three kinds of probabilistic information are extracted and integrated using Dempster-Shafer theory. The main difficulty in transforming video images into textual descriptions is bridging the semantic gap between them. By associating visual features of head and hand motion with natural language concepts, appropriate syntactic components such as verbs, objects, etc. are determined and translated into natural language.