{"title":"Real-Time 3D Human Pose Estimation from Monocular View with Applications to Event Detection and Video Gaming","authors":"Shian-Ru Ke, Liang-Jia Zhu, Jenq-Neng Hwang, Hung-I Pai, Kung-Ming Lan, C. Liao","doi":"10.1109/AVSS.2010.80","DOIUrl":null,"url":null,"abstract":"We present an effective real-time approach forautomatically estimating 3D human body poses frommonocular video sequences. In this approach, human bodyis automatically detected from video sequence, then imagefeatures such as silhouette, edge and color are extractedand integrated to infer 3D human poses by iterativelyminimizing the cost function defined between 2D featuresderived from the projected 3D model and those extractedfrom video sequence. In addition, 2D locations of head,hands, and feet are tracked to facilitate 3D tracking. Whentracking failure happens, the approach can detect andrecover from failures quickly. Finally, the efficiency androbustness of the proposed approach is shown in two realapplications: human event detection and video gaming.","PeriodicalId":415758,"journal":{"name":"2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance","volume":"16 2","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"31","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 7th IEEE International Conference on Advanced Video and Signal Based Surveillance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AVSS.2010.80","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 31
Abstract
We present an effective real-time approach for automatically estimating 3D human body poses from monocular video sequences. In this approach, the human body is automatically detected from the video sequence; then image features such as silhouette, edge, and color are extracted and integrated to infer 3D human poses by iteratively minimizing a cost function defined between 2D features derived from the projected 3D model and those extracted from the video sequence. In addition, the 2D locations of the head, hands, and feet are tracked to facilitate 3D tracking. When a tracking failure occurs, the approach can detect it and recover quickly. Finally, the efficiency and robustness of the proposed approach are demonstrated in two real applications: human event detection and video gaming.
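To make the core idea of the abstract concrete, the sketch below illustrates (it is not the authors' implementation) how a 3D pose can be fitted by iteratively minimizing a 2D reprojection cost between a projected 3D model and features extracted from the image. It simplifies the paper's silhouette/edge/color cost down to five tracked 2D keypoints (head, hands, feet), uses a toy stick-figure skeleton, an assumed pinhole camera, and scipy's Powell optimizer; all names, camera parameters, and the pose parameterization are hypothetical placeholders.

```python
# Minimal sketch of iterative 3D pose fitting by 2D reprojection-cost minimization.
# NOT the paper's method: the cost here uses only five hypothetical 2D keypoints
# instead of silhouette/edge/color features, and the skeleton/camera are toy models.

import numpy as np
from scipy.optimize import minimize

# Toy rest-pose 3D skeleton: head, left hand, right hand, left foot, right foot (meters).
REST_POSE = np.array([
    [0.0, 1.7, 0.0],   # head
    [-0.6, 1.0, 0.0],  # left hand
    [0.6, 1.0, 0.0],   # right hand
    [-0.2, 0.0, 0.0],  # left foot
    [0.2, 0.0, 0.0],   # right foot
])

FOCAL = 800.0                       # assumed focal length in pixels
CENTER = np.array([320.0, 240.0])   # assumed principal point

def project(points_3d):
    """Pinhole projection of 3D points (subject placed ~3 m in front of the camera)."""
    cam = points_3d + np.array([0.0, -1.0, 3.0])
    return FOCAL * cam[:, :2] / cam[:, 2:3] + CENTER

def pose_to_joints(pose_params):
    """Apply per-joint 3D offsets (the toy 'pose' parameters) to the rest skeleton."""
    return REST_POSE + pose_params.reshape(-1, 3)

def cost(pose_params, observed_2d):
    """Sum of squared distances between projected model joints and observed 2D keypoints."""
    projected = project(pose_to_joints(pose_params))
    return np.sum((projected - observed_2d) ** 2)

if __name__ == "__main__":
    # Pretend these 2D keypoints were tracked in the current frame.
    true_offsets = 0.1 * np.random.default_rng(0).standard_normal(REST_POSE.shape)
    observed = project(REST_POSE + true_offsets)

    # Iteratively minimize the 2D reprojection cost, starting from the rest pose.
    result = minimize(cost, x0=np.zeros(REST_POSE.size),
                      args=(observed,), method="Powell")
    print("final cost:", result.fun)
    print("recovered joint offsets:\n", result.x.reshape(-1, 3).round(3))
```

In the paper's setting, the cost would additionally compare silhouette, edge, and color features of the projected 3D body model against those extracted from the video frame, with the tracked head/hand/foot locations serving to constrain the search and to detect and recover from tracking failures.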