Salient Knowledge-Based Object Detection

2022 4th International Conference on Control and Robotics (ICCR). Pub Date: 2022-12-02. DOI: 10.1109/ICCR55715.2022.10053899

Xueyuan Zhang, Chunzhe Wang, Han Du, Li Quan, Jin Shi, Yirong Ma



Abstract




Humans use their visual systems to perceive objects of interest in images and videos, drawing on past experience that includes shapes, textures, spatial knowledge, and other subconscious information. In this paper, we develop an end-to-end object detection framework that incorporates salient knowledge of objects. First, we use convolutional neural networks (CNNs) to extract multi-scale feature maps that represent the normal knowledge of objects in images and videos. Second, a candidate feature map is selected from the extracted feature maps to encode the salient knowledge of objects using a mathematical strategy, and a new feature map is generated from the candidate feature map and this salient knowledge. Third, we use the saliency-enhanced feature map, together with the other feature maps at different scales, to identify and localize objects in images and videos. The results show that the proposed approach achieves better performance than other attention-based object detectors on PASCAL VOC 2007 and PASCAL VOC 2012, which indicates that its predictions are consistent with how the human brain perceives objects. The approach also processes 43 frames per second on an NVIDIA GTX 1080, which makes it more practical in terms of running time.
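The abstract does not say how the candidate map is chosen or what the "mathematical strategy" is, so the following is only a minimal sketch of the second and third steps under stated assumptions. It assumes the multi-scale feature maps are NumPy arrays of shape (C, H, W), that saliency is the channel-wise mean of absolute activations rescaled to [0, 1], and that the new feature map is the candidate re-weighted by that mask. The names `saliency_mask` and `enhance_with_saliency` are hypothetical and do not come from the paper.

```python
import numpy as np


def saliency_mask(feature_map: np.ndarray) -> np.ndarray:
    """Collapse a (C, H, W) feature map into an (H, W) mask in [0, 1].

    Assumption: saliency = mean absolute activation over channels,
    min-max normalised. The paper's actual strategy is not specified
    in the abstract.
    """
    energy = np.abs(feature_map).mean(axis=0)
    span = energy.max() - energy.min()
    if span == 0:
        # A constant map carries no spatial preference.
        return np.zeros_like(energy)
    return (energy - energy.min()) / span


def enhance_with_saliency(feature_maps, candidate_index):
    """Return a new list in which only the candidate map is re-weighted.

    Assumption: the new map is candidate * (1 + mask), so salient
    locations are amplified and no location is suppressed.
    """
    candidate = feature_maps[candidate_index]
    mask = saliency_mask(candidate)
    enhanced = candidate * (1.0 + mask)  # (H, W) broadcasts over (C, H, W)
    out = list(feature_maps)
    out[candidate_index] = enhanced
    return out


# Random stand-ins for the multi-scale CNN outputs (hypothetical sizes).
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((8, s, s)) for s in (32, 16, 8)]
result = enhance_with_saliency(pyramid, candidate_index=1)
print([fm.shape for fm in result])
```

In a full detector, the returned list would be passed to the classification and localization heads in place of the original feature maps. The multiplicative `1 + mask` form is one common attention-style choice, used here only for illustration.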

