POPDNet: Primitive Object Pose Detection Network Based on Voxel Data with Three Cartesian Channels
Alireza Makki, Alireza Hadi, Bahram Tarvirdizadeh, M. Teimouri
2021 7th International Conference on Signal Processing and Intelligent Systems (ICSPIS), published 2021-12-29
DOI: 10.1109/ICSPIS54653.2021.9729364
Citations: 1
Abstract
This paper addresses the vision problem in robotic grasping with a new method. The first step of the vision pipeline, decomposing an object into primitive objects, is assumed to be already done. The second step, which is the main contribution of this paper, classifies a primitive object and determines its position, orientation, and dimensions. To do so, voxel data of a primitive object with three Cartesian channels is fed to a convolutional neural network that extracts the required parameters. A virtual camera in the Gazebo simulation tool is used to generate the dataset for training the network. Although voxel data with Cartesian channels increases the volume of input data and slows processing, this study shows that it improves the accuracy of the network in estimating the parameters of primitive objects. On the provided virtual dataset, using Cartesian channels decreases the mean errors by 81%, −33%, and 53% for position, orientation, and dimensions, respectively, compared to binary voxel data (a negative value indicates an increase in error). In the same comparison against RGB data, the errors are −7%, 80%, and 55% lower.
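
The difference between binary voxel data and voxel data with three Cartesian channels can be illustrated with a short NumPy sketch. The abstract does not say how the channels are filled, so this is only one plausible reading: each occupied voxel stores the mean x, y, and z coordinates of the points that fall inside it, and empty voxels stay at zero. The function name `voxelize`, the grid size, and the cubic bounds are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def voxelize(points, grid_size=32, bounds=(-0.5, 0.5)):
    """Build a binary grid and a 3-channel Cartesian grid from an (N, 3) point cloud.

    ASSUMPTION (not from the paper): each Cartesian channel holds the mean
    x, y, or z coordinate of the points inside an occupied voxel.

    Returns:
        binary:    (G, G, G) float32 array, 1.0 where a voxel contains a point.
        cartesian: (3, G, G, G) float32 array, zeros in empty voxels.
    """
    points = np.asarray(points, dtype=np.float64).reshape(-1, 3)
    lo, hi = bounds

    # Discard points outside the cubic region covered by the grid.
    inside = np.all((points >= lo) & (points < hi), axis=1)
    points = points[inside]

    # Map each coordinate to an integer voxel index along its axis.
    idx = np.floor((points - lo) / (hi - lo) * grid_size).astype(np.int64)
    idx = np.clip(idx, 0, grid_size - 1)

    # Flatten the 3D indices so per-voxel sums can be taken with bincount.
    shape = (grid_size, grid_size, grid_size)
    flat = np.ravel_multi_index((idx[:, 0], idx[:, 1], idx[:, 2]), shape)
    n_voxels = grid_size ** 3

    counts = np.bincount(flat, minlength=n_voxels)
    occupied = counts > 0
    binary = occupied.astype(np.float32).reshape(shape)

    cartesian = np.zeros((3, n_voxels), dtype=np.float32)
    for c in range(3):
        sums = np.bincount(flat, weights=points[:, c], minlength=n_voxels)
        cartesian[c, occupied] = sums[occupied] / counts[occupied]

    return binary, cartesian.reshape((3,) + shape)


if __name__ == "__main__":
    cloud = np.random.default_rng(0).uniform(-0.5, 0.5, size=(1000, 3))
    binary, cartesian = voxelize(cloud, grid_size=16)
    print(binary.shape, cartesian.shape)
```

Under this reading, the Cartesian input is three times the size of the binary grid, which is consistent with the abstract's remark that the representation increases the volume of input data.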