{"title":"A multi-modal gesture recognition system in a Human-Robot Interaction scenario","authors":"Zhi Li, R. Jarvis","doi":"10.1109/ROSE.2009.5355984","DOIUrl":null,"url":null,"abstract":"Recognition of non-verbal gestures is essential for robots to understand a user's state and intention in a Human-Robot Interaction (HRI) scenario. In this paper a multi-modal system is proposed to recognize a user's hand gestures and estimate body poses from the robot's viewpoint only. A range camera is employed to derive the depth data at a high frame rate. Depth data is useful for image segmentation, objects detection and localization in 3D spaces. A pair of stereo cameras is used to sense the user's head gestures and eye gaze direction, which provide useful information about the user's attention direction. Both hand shapes and hand trajectories are recognized. Full configurations of body poses are estimated using a model-based algorithm. Poses are tracked by a Particle Filter method, and refined by a gradient-based searching method in the neighborhood of the particles which have top largest weights.","PeriodicalId":107220,"journal":{"name":"2009 IEEE International Workshop on Robotic and Sensors Environments","volume":"81 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE International Workshop on Robotic and Sensors Environments","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ROSE.2009.5355984","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15
Abstract
Recognition of non-verbal gestures is essential for robots to understand a user's state and intention in a Human-Robot Interaction (HRI) scenario. In this paper a multi-modal system is proposed to recognize a user's hand gestures and estimate body poses from the robot's viewpoint only. A range camera is employed to derive the depth data at a high frame rate. Depth data is useful for image segmentation, objects detection and localization in 3D spaces. A pair of stereo cameras is used to sense the user's head gestures and eye gaze direction, which provide useful information about the user's attention direction. Both hand shapes and hand trajectories are recognized. Full configurations of body poses are estimated using a model-based algorithm. Poses are tracked by a Particle Filter method, and refined by a gradient-based searching method in the neighborhood of the particles which have top largest weights.