{"title":"Research on object classification based on visual-tactile fusion","authors":"Peng Zhang, Lu Bai, Dongri Shan","doi":"10.1117/12.2682381","DOIUrl":null,"url":null,"abstract":"As two modes of direct contact between robots and external environment, visual and tactile play a critical role in improving robot perception ability. In the real environment, it is difficult for the robot to achieve high accuracy when classifying objects only by a single mode (visual or tactile). In order to improve the classification accuracy of robots, a novel visual-tactile fusion method is proposed in this paper. Firstly, the ResNet18 is selected as the backbone network to extract visual features. To improve the accuracy of object localization and recognition in the visual network, the Position-Channel Attention Mechanism (PCAM) block is added after conv3 and conv4 of ResNet18. Then, the four-layer one-dimensional convolutional neural network is used to extract tactile features, and the extracted tactile features are fused with visual features at the feature layer. Finally, the experimental results demonstrate that compared with the existing methods, on the self-made dataset VHAC-52, the proposed method has improved the AUC and ACC by 1.60% and 1.47%, respectively.","PeriodicalId":440430,"journal":{"name":"International Conference on Electronic Technology and Information Science","volume":"128 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Electronic Technology and Information Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.2682381","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Vision and touch are two key modalities through which robots perceive the external environment, and they play a critical role in improving robot perception. In real environments, it is difficult for a robot to classify objects with high accuracy using a single modality (visual or tactile) alone. To improve classification accuracy, this paper proposes a novel visual-tactile fusion method. First, ResNet18 is selected as the backbone network for extracting visual features. To improve the accuracy of object localization and recognition in the visual network, a Position-Channel Attention Mechanism (PCAM) block is inserted after the conv3 and conv4 stages of ResNet18. Then, a four-layer one-dimensional convolutional neural network extracts tactile features, which are fused with the visual features at the feature layer. Finally, experimental results on the self-built VHAC-52 dataset demonstrate that, compared with existing methods, the proposed method improves AUC and ACC by 1.60% and 1.47%, respectively.
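The abstract names the components but not their internals, so the following is a minimal PyTorch sketch of how such an architecture could be wired together. The PCAM internals (modeled here as SE-style channel attention followed by a convolutional spatial-attention map), the tactile input shape, the 1D CNN channel widths, and the classifier head are all assumptions; only the insertion points after the conv3_x and conv4_x stages of ResNet18, the four-layer 1D tactile CNN, and the feature-level fusion follow the text.

```python
# Hypothetical sketch of the described visual-tactile fusion network.
# PCAM internals, tactile input shape, and channel widths are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class PCAM(nn.Module):
    """Assumed position-channel attention: SE-style channel reweighting
    followed by a single-map spatial (position) reweighting."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.position = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel(x)      # reweight channels
        return x * self.position(x)  # reweight spatial positions

class VisualTactileNet(nn.Module):
    def __init__(self, num_classes=52, tactile_channels=64):
        super().__init__()
        backbone = resnet18(weights=None)
        # In ResNet naming, conv3_x is layer2 (128 ch) and conv4_x is layer3 (256 ch),
        # so PCAM blocks are inserted after those two stages.
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.layer1, self.layer2 = backbone.layer1, backbone.layer2
        self.pcam3 = PCAM(128)   # after conv3_x
        self.layer3 = backbone.layer3
        self.pcam4 = PCAM(256)   # after conv4_x
        self.layer4 = backbone.layer4
        self.pool = nn.AdaptiveAvgPool2d(1)

        # Four-layer 1D CNN over tactile sequences (channel sizes assumed).
        self.tactile = nn.Sequential(
            nn.Conv1d(tactile_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv1d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv1d(128, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv1d(256, 512, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool1d(1),
        )
        # Feature-level fusion: concatenate visual and tactile vectors.
        self.classifier = nn.Linear(512 + 512, num_classes)

    def forward(self, image, tactile):
        v = self.stem(image)
        v = self.layer2(self.layer1(v))
        v = self.pcam3(v)
        v = self.layer3(v)
        v = self.pcam4(v)
        v = self.layer4(v)
        v = self.pool(v).flatten(1)           # (B, 512) visual features
        t = self.tactile(tactile).flatten(1)  # (B, 512) tactile features
        return self.classifier(torch.cat([v, t], dim=1))

# Example: a 224x224 RGB image paired with a 64-channel tactile sequence of length 100.
model = VisualTactileNet()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 64, 100))
```

The number of classes is set to 52 here purely on the assumption that VHAC-52 contains 52 categories; the abstract does not state this.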