Grasp Position Estimation from Depth Image Using Stacked Hourglass Network Structure

Keisuke Hamamoto, Huimin Lu, Yujie Li, Tohru Kamiya, Y. Nakatoh, S. Serikawa
{"title":"Grasp Position Estimation from Depth Image Using Stacked Hourglass Network Structure","authors":"Keisuke Hamamoto, Huimin Lu, Yujie Li, Tohru Kamiya, Y. Nakatoh, S. Serikawa","doi":"10.1109/COMPSAC54236.2022.00187","DOIUrl":null,"url":null,"abstract":"In recent years, robots have been used not only in factories. However, most robots currently used in such places can only perform the actions programmed to perform in a predefined space. For robots to become widespread in the future, not only in factories, distribution warehouses, and other places but also in homes and other environments where robots receive complex commands and their surroundings are constantly being updated, it is necessary to make robots intelligent. Therefore, this study proposed a deep learning grasp position estimation model using depth images to achieve intelligence in pick-and-place. This study used only depth images as the training data to build the deep learning model. Some previous studies have used RGB images and depth images. However, in this study, we used only depth images as training data because we expect the inference to be based on the object's shape, independent of the color information of the object. By performing inference based on the target object's shape, the deep learning model is expected to minimize the need for re-training when the target object package changes in the production line since it is not dependent on the RGB image. In this study, we propose a deep learning model that focuses on the stacked encoder-decoder structure of the Stacked Hourglass Network. We compared the proposed method with the baseline method in the same evaluation metrics and a real robot, which shows higher accuracy than other methods in previous studies.","PeriodicalId":330838,"journal":{"name":"2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMPSAC54236.2022.00187","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In recent years, robots have come to be used beyond factories. However, most robots currently deployed in such settings can only perform pre-programmed actions within a predefined space. For robots to spread in the future, not only to factories and distribution warehouses but also to homes and other environments where they receive complex commands and their surroundings change constantly, robots must be made intelligent. Therefore, this study proposes a deep learning model for grasp position estimation from depth images to achieve intelligent pick-and-place. Only depth images are used as training data to build the model. Some previous studies have used both RGB images and depth images; however, we use only depth images because we expect inference to be based on the object's shape, independent of its color information. By inferring from the target object's shape alone, the model is expected to minimize the need for re-training when the target object's packaging changes on a production line, since it does not depend on RGB images. We propose a deep learning model built around the stacked encoder-decoder structure of the Stacked Hourglass Network. We compared the proposed method with a baseline method using the same evaluation metrics and on a real robot, and it achieves higher accuracy than methods reported in previous studies.
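As a rough illustration of the stacked encoder-decoder idea the abstract refers to, the sketch below shows a minimal stacked-hourglass-style model in PyTorch that maps a single-channel depth image to grasp-position heatmaps, with one output per stage for intermediate supervision. The layer widths, stack count, hourglass depth, and heatmap head here are illustrative assumptions, not the architecture reported in the paper.

```python
# Minimal sketch of a stacked hourglass-style encoder-decoder for grasp
# position estimation from a single-channel depth image. Hyperparameters
# and module names are assumptions for illustration only.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # 3x3 convolution + batch norm + ReLU: the basic unit of each hourglass
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class Hourglass(nn.Module):
    """One encoder-decoder ("hourglass") with a skip connection at each scale."""

    def __init__(self, depth, channels):
        super().__init__()
        self.skip = conv_block(channels, channels)   # branch kept at full resolution
        self.down = conv_block(channels, channels)   # branch processed after pooling
        self.pool = nn.MaxPool2d(2)
        self.inner = (Hourglass(depth - 1, channels)
                      if depth > 1 else conv_block(channels, channels))
        self.up_conv = conv_block(channels, channels)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, x):
        skip = self.skip(x)                          # preserve spatial detail
        y = self.inner(self.down(self.pool(x)))      # recurse at half resolution
        y = self.up(self.up_conv(y))                 # restore resolution
        return skip + y                              # merge coarse and fine features


class StackedHourglassGrasp(nn.Module):
    """Stack several hourglasses; each stage refines a grasp-position heatmap."""

    def __init__(self, num_stacks=2, channels=64, hg_depth=4):
        super().__init__()
        self.stem = conv_block(1, channels)          # depth image has one channel
        self.stages = nn.ModuleList(
            Hourglass(hg_depth, channels) for _ in range(num_stacks))
        self.heads = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_stacks))
        self.remaps = nn.ModuleList(
            nn.Conv2d(1, channels, kernel_size=1) for _ in range(num_stacks))

    def forward(self, depth):
        x = self.stem(depth)
        heatmaps = []
        for hg, head, remap in zip(self.stages, self.heads, self.remaps):
            feat = hg(x)
            hm = head(feat)                          # per-stage heatmap (intermediate supervision)
            heatmaps.append(hm)
            x = x + feat + remap(hm)                 # feed the prediction back into the next stage
        return heatmaps                              # train on every stage; use the last at test time


if __name__ == "__main__":
    model = StackedHourglassGrasp(num_stacks=2)
    depth_img = torch.randn(1, 1, 128, 128)          # dummy depth image
    outs = model(depth_img)
    print([o.shape for o in outs])                   # each: torch.Size([1, 1, 128, 128])
```

Losses on the intermediate heatmaps (e.g., per-stage MSE against a ground-truth grasp heatmap) are what give the stacked structure its refinement behavior; only the final stage's output is typically used at inference time.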