Backbone Neural Network Design of Single Shot Detector from RGB-D Images for Object Detection
Authors: P. Sharma, Damian Valles
Published in: 2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2020-10-28
DOI: 10.1109/UEMCON51285.2020.9298175 (https://doi.org/10.1109/UEMCON51285.2020.9298175)
Citation count: 4
Abstract
Recognition technology has reached state-of-the-art performance with the advent of deep convolutional neural networks, and with these advances in computer vision, machine learning, and 3D sensing, industry is close to a new era of automation. However, object detection for robotic grasping still suffers from poor accuracy and speed under varying environments, low illumination, occlusion, and partial views of the object. In this research, a multimodal architecture is designed to serve as the base (backbone) network of the Single Shot Detector (SSD). The architecture takes RGB and depth images as input and produces a single output. Most researchers have used VGG16/19, ResNet, or MobileNet for detection; in this paper, a new architecture is designed specifically for the grasping task. For classification, the RGB-D architecture achieved an average accuracy of 95% with a learning rate of 0.0001 and outperformed the other architectures in accuracy on a limited set of objects.
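The abstract does not specify the layers or the fusion scheme, so the following is only a minimal sketch of what a two-input, single-output backbone might look like. It assumes one small convolutional branch per modality, fusion by channel concatenation followed by a 1x1 convolution, and a softmax classifier on globally pooled features. All names (`conv2d`, `rgbd_backbone`, `classify`), channel counts, the 32x32 input size, and the five-class output are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def conv2d(x, w, stride=1):
    """Unpadded 2D cross-correlation. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, c_in, k, _ = w.shape
    assert x.shape[0] == c_in, "input channels must match the kernel"
    h_out = (x.shape[1] - k) // stride + 1
    w_out = (x.shape[2] - k) // stride + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            # Contract every kernel with the patch over (C_in, k, k).
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def relu(x):
    return np.maximum(x, 0.0)

def init_params(rng, num_classes=5):
    # Hypothetical sizes: 8 feature channels per modality, 16 after fusion.
    return {
        "rgb": rng.normal(0.0, 0.1, (8, 3, 3, 3)),     # RGB branch, 3 input channels
        "depth": rng.normal(0.0, 0.1, (8, 1, 3, 3)),   # depth branch, 1 input channel
        "fuse": rng.normal(0.0, 0.1, (16, 16, 1, 1)),  # 1x1 conv mixing both streams
        "fc": rng.normal(0.0, 0.1, (num_classes, 16)), # linear classifier
    }

def rgbd_backbone(rgb, depth, p):
    """Two inputs, one fused feature map (assumed fusion: channel concatenation)."""
    f_rgb = relu(conv2d(rgb, p["rgb"], stride=2))
    f_depth = relu(conv2d(depth, p["depth"], stride=2))
    fused = np.concatenate([f_rgb, f_depth], axis=0)
    return relu(conv2d(fused, p["fuse"]))

def classify(feature_map, p):
    """Global average pooling, linear layer, softmax."""
    pooled = feature_map.mean(axis=(1, 2))
    logits = p["fc"] @ pooled
    e = np.exp(logits - logits.max())  # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
params = init_params(rng)
rgb = rng.random((3, 32, 32))    # stand-in RGB image
depth = rng.random((1, 32, 32))  # stand-in aligned depth map
features = rgbd_backbone(rgb, depth, params)
probs = classify(features, params)
print(features.shape, probs.shape)
```

In an SSD-style detector, a fused feature map such as `features` would be passed to the detection heads in place of the output of a single-modality backbone like VGG16; the classification head here only mirrors the classification experiment mentioned in the abstract. The weights are random, so the probabilities carry no meaning beyond demonstrating the data flow.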