Gesture-world environment technology for mobile manipulation
K. Hoshino, Takuya Kasahara, Naoki Igo, Motomasa Tomida, T. Mukai, Kinji Nishi, Hajime Kotani
2010 IEEE/SICE International Symposium on System Integration, December 2010
DOI: 10.1109/SII.2010.5708323
Citations: 0
Abstract
The aim of this paper is to propose technology that allows people to control robots with everyday gestures, without wearing sensors or holding controllers. The hand pose estimation method we propose reduces the number of image features per data set to 64, which makes the construction of a large-scale database feasible. This also makes it possible to estimate the 3D hand poses of unspecified users, despite individual differences, without sacrificing estimation accuracy. Specifically, the proposed system constructs in advance a large database comprising three elements: hand joint information including the wrist, low-order proportional information on the hand images indicating the rough hand shape, and hand pose data consisting of 64 image features per data set. To estimate a hand pose, the system first performs coarse screening to select similar data sets from the database based on three hand proportions of the input image, and then performs a detailed search over those candidates to find the data set most similar to the input image based on the 64 image features. Using subjects with varying hand poses, we performed joint angle estimation with a system comprising 750,000 hand pose data sets, achieving roughly the same average estimation error as our previous system, about 2 degrees. However, the standard deviation of the estimation error was smaller than in our previous system of roughly 30,000 data sets: down from 26.91 to 14.57 degrees for the index finger PIP joint and from 15.77 to 10.28 degrees for the thumb. We thus confirmed an improvement in estimation accuracy, even for unspecified users. Further, on a notebook PC of ordinary specifications with a compact high-speed camera, the processing speed was about 80 fps or more, including image capture, hand pose estimation, CG rendering of the estimation result, and robot control.
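The coarse-to-fine lookup described in the abstract (screening by three hand proportions, then a detailed search over 64 image features) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data layout, the squared-Euclidean distances, and the candidate count `k` are assumptions introduced here for clarity.

```python
import math

# Each database entry is assumed to be a tuple of:
#   proportions  - 3 low-order hand-shape ratios (coarse key)
#   features     - 64 image features (fine key)
#   joint_angles - the stored hand joint information to return
def estimate_pose(database, query_props, query_feats, k=10):
    # Stage 1: coarse screening - keep the k entries whose three
    # hand proportions are closest to those of the input image.
    def prop_dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry[0], query_props))
    candidates = sorted(database, key=prop_dist)[:k]

    # Stage 2: detailed search - among the candidates, find the
    # entry whose 64 image features best match the input image.
    def feat_dist(entry):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(entry[1], query_feats)))
    best = min(candidates, key=feat_dist)
    return best[2]  # joint angles of the most similar data set
```

The screening stage is what makes a 750,000-entry database tractable: the expensive 64-dimensional comparison runs only on the small candidate set, not on every stored pose.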