Grasp Position Estimation from Depth Image Using Stacked Hourglass Network Structure

Keisuke Hamamoto, Huimin Lu, Yujie Li, Tohru Kamiya, Y. Nakatoh, S. Serikawa
{"title":"Grasp Position Estimation from Depth Image Using Stacked Hourglass Network Structure","authors":"Keisuke Hamamoto, Huimin Lu, Yujie Li, Tohru Kamiya, Y. Nakatoh, S. Serikawa","doi":"10.1109/COMPSAC54236.2022.00187","DOIUrl":null,"url":null,"abstract":"In recent years, robots have been used not only in factories. However, most robots currently used in such places can only perform the actions programmed to perform in a predefined space. For robots to become widespread in the future, not only in factories, distribution warehouses, and other places but also in homes and other environments where robots receive complex commands and their surroundings are constantly being updated, it is necessary to make robots intelligent. Therefore, this study proposed a deep learning grasp position estimation model using depth images to achieve intelligence in pick-and-place. This study used only depth images as the training data to build the deep learning model. Some previous studies have used RGB images and depth images. However, in this study, we used only depth images as training data because we expect the inference to be based on the object's shape, independent of the color information of the object. By performing inference based on the target object's shape, the deep learning model is expected to minimize the need for re-training when the target object package changes in the production line since it is not dependent on the RGB image. In this study, we propose a deep learning model that focuses on the stacked encoder-decoder structure of the Stacked Hourglass Network. We compared the proposed method with the baseline method in the same evaluation metrics and a real robot, which shows higher accuracy than other methods in previous studies.","PeriodicalId":330838,"journal":{"name":"2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMPSAC54236.2022.00187","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In recent years, robots have come to be used beyond factories. However, most robots currently deployed in such settings can only perform pre-programmed actions within a predefined space. For robots to spread in the future, not only to factories and distribution warehouses but also to homes and other environments where they receive complex commands and their surroundings change constantly, robots must be made intelligent. Therefore, this study proposes a deep learning model for grasp position estimation from depth images to achieve intelligent pick-and-place. Only depth images are used as training data to build the model. Some previous studies have used both RGB images and depth images; however, we use only depth images because we expect inference to be based on the object's shape, independent of its color information. By inferring from the target object's shape alone, the model is expected to minimize the need for re-training when the target object's packaging changes on a production line, since it does not depend on RGB images. We propose a deep learning model built around the stacked encoder-decoder structure of the Stacked Hourglass Network. We compared the proposed method with a baseline method using the same evaluation metrics and on a real robot, and it achieves higher accuracy than methods reported in previous studies.
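As a rough illustration of the stacked encoder-decoder idea the abstract refers to, the sketch below shows a minimal stacked-hourglass-style model in PyTorch that maps a single-channel depth image to grasp-position heatmaps, with one output per stage for intermediate supervision. The layer widths, stack count, hourglass depth, and heatmap head here are illustrative assumptions, not the architecture reported in the paper.

```python
# Minimal sketch of a stacked hourglass-style encoder-decoder for grasp
# position estimation from a single-channel depth image. Hyperparameters
# and module names are assumptions for illustration only.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # 3x3 convolution + batch norm + ReLU: the basic unit of each hourglass
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class Hourglass(nn.Module):
    """One encoder-decoder ("hourglass") with a skip connection at each scale."""

    def __init__(self, depth, channels):
        super().__init__()
        self.skip = conv_block(channels, channels)   # branch kept at full resolution
        self.down = conv_block(channels, channels)   # branch processed after pooling
        self.pool = nn.MaxPool2d(2)
        self.inner = (Hourglass(depth - 1, channels)
                      if depth > 1 else conv_block(channels, channels))
        self.up_conv = conv_block(channels, channels)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, x):
        skip = self.skip(x)                          # preserve spatial detail
        y = self.inner(self.down(self.pool(x)))      # recurse at half resolution
        y = self.up(self.up_conv(y))                 # restore resolution
        return skip + y                              # merge coarse and fine features


class StackedHourglassGrasp(nn.Module):
    """Stack several hourglasses; each stage refines a grasp-position heatmap."""

    def __init__(self, num_stacks=2, channels=64, hg_depth=4):
        super().__init__()
        self.stem = conv_block(1, channels)          # depth image has one channel
        self.stages = nn.ModuleList(
            Hourglass(hg_depth, channels) for _ in range(num_stacks))
        self.heads = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_stacks))
        self.remaps = nn.ModuleList(
            nn.Conv2d(1, channels, kernel_size=1) for _ in range(num_stacks))

    def forward(self, depth):
        x = self.stem(depth)
        heatmaps = []
        for hg, head, remap in zip(self.stages, self.heads, self.remaps):
            feat = hg(x)
            hm = head(feat)                          # per-stage heatmap (intermediate supervision)
            heatmaps.append(hm)
            x = x + feat + remap(hm)                 # feed the prediction back into the next stage
        return heatmaps                              # train on every stage; use the last at test time


if __name__ == "__main__":
    model = StackedHourglassGrasp(num_stacks=2)
    depth_img = torch.randn(1, 1, 128, 128)          # dummy depth image
    outs = model(depth_img)
    print([o.shape for o in outs])                   # each: torch.Size([1, 1, 128, 128])
```

Losses on the intermediate heatmaps (e.g., per-stage MSE against a ground-truth grasp heatmap) are what give the stacked structure its refinement behavior; only the final stage's output is typically used at inference time.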