A multi-modal gesture recognition system in a Human-Robot Interaction scenario
Authors: Zhi Li, R. Jarvis
Venue: 2009 IEEE International Workshop on Robotic and Sensors Environments
Publication date: 2009-12-18
DOI: https://doi.org/10.1109/ROSE.2009.5355984
Recognizing non-verbal gestures is essential for a robot to understand a user's state and intention in a Human-Robot Interaction (HRI) scenario. This paper proposes a multi-modal system that recognizes a user's hand gestures and estimates body poses from the robot's viewpoint only. A range camera provides depth data at a high frame rate; depth data is useful for image segmentation, object detection, and localization in 3D space. A stereo camera pair senses the user's head gestures and eye gaze direction, which indicate where the user's attention is directed. Both hand shapes and hand trajectories are recognized. Full body pose configurations are estimated using a model-based algorithm. Poses are tracked with a particle filter and refined by a gradient-based search in the neighborhood of the particles with the largest weights.
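The tracking scheme described in the last sentence can be illustrated with a small sketch. The abstract does not specify the body model, the observation likelihood, or the details of the gradient search, so everything below is an assumption made for illustration: the pose is a toy 3-element vector, the likelihood is Gaussian in the distance to a directly observed pose, the gradient is taken by central finite differences, and the names `log_likelihood`, `refine`, and `track_step` are hypothetical. It is not the authors' implementation.

```python
import numpy as np

SIGMA = 0.5  # assumed observation noise scale (not from the paper)


def log_likelihood(pose, observation):
    """Toy observation model: Gaussian log-likelihood of a pose hypothesis."""
    return -np.sum((pose - observation) ** 2) / (2.0 * SIGMA ** 2)


def refine(pose, observation, steps=20, lr=0.1, eps=1e-4):
    """Gradient ascent on the log-likelihood, starting from one particle.

    The gradient is approximated with central finite differences so the
    same routine would work with any black-box likelihood.
    """
    pose = pose.copy()
    for _ in range(steps):
        grad = np.zeros_like(pose)
        for i in range(pose.size):
            delta = np.zeros_like(pose)
            delta[i] = eps
            grad[i] = (log_likelihood(pose + delta, observation)
                       - log_likelihood(pose - delta, observation)) / (2 * eps)
        pose += lr * grad
    return pose


def track_step(particles, observation, rng, motion_noise=0.2, top_k=3):
    """One particle-filter step followed by local refinement of the best particles."""
    # Predict: diffuse particles with a random-walk motion model (assumed).
    particles = particles + rng.normal(0.0, motion_noise, particles.shape)

    # Update: weight each particle by its likelihood, normalised to sum to 1.
    log_w = np.array([log_likelihood(p, observation) for p in particles])
    weights = np.exp(log_w - log_w.max())
    weights /= weights.sum()

    # Refine: run the gradient search from the top_k highest-weight particles
    # and keep the refined pose with the best likelihood as the estimate.
    top = np.argsort(weights)[-top_k:]
    refined = np.array([refine(particles[i], observation) for i in top])
    scores = np.array([log_likelihood(p, observation) for p in refined])
    estimate = refined[np.argmax(scores)]

    # Resample: draw the next particle set in proportion to the weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], estimate


rng = np.random.default_rng(0)
true_pose = np.array([1.0, -0.5, 0.3])           # made-up ground truth
particles = rng.normal(0.0, 1.0, size=(200, 3))  # initial pose hypotheses
particles, estimate = track_step(particles, true_pose, rng)
```

Refining only the few highest-weight particles keeps the cost of the local search small while still letting the particle set cover multiple pose hypotheses; in a real system the toy likelihood would be replaced by a comparison between a rendered body model and the depth image.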