Salient Knowledge-Based Object Detection

Xueyuan Zhang, Chunzhe Wang, Han Du, Li Quan, Jin Shi, Yirong Ma
Published in: 2022 4th International Conference on Control and Robotics (ICCR), 2022-12-02
DOI: 10.1109/ICCR55715.2022.10053899 (https://doi.org/10.1109/ICCR55715.2022.10053899)
Citations: 0

Abstract

Humans use their visual systems to perceive objects of interest in images and videos, drawing on past experience that includes shapes, textures, spatial knowledge, and other subconscious information. In this paper, we develop an end-to-end object detection framework that incorporates salient knowledge of objects. First, we use convolutional neural networks (CNNs) to extract multi-scale feature maps that represent the normal knowledge of objects in images and videos. Second, we select a candidate feature map from the extracted feature maps, encode the salient knowledge of objects from it using a mathematical strategy, and generate a new feature map from the candidate feature map and the salient knowledge. Third, we use the salient-knowledge feature map together with the other feature maps at different scales to identify and localize objects in images and videos. The results show that the proposed approach achieves better performance than other attention-based object detectors on PASCAL VOC 2007 and PASCAL VOC 2012, which indicates that its predictions are consistent with human perception of objects. The approach also processes 43 frames per second on an NVIDIA GTX 1080, which makes it practical in terms of running time.
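The abstract does not say which mathematical strategy encodes the salient knowledge, so the following is only an illustrative sketch of the three-step flow under stated assumptions, not the authors' method. It assumes that saliency is the sigmoid of the standardized channel-wise mean activation and that the new feature map is the candidate map reweighted by (1 + saliency). The function names `saliency_map`, `apply_saliency`, and `fuse_with_salient_knowledge` are hypothetical, and plain nested lists stand in for CNN feature maps.

```python
import math


def saliency_map(feature_map):
    """Collapse a C x H x W feature map into an H x W saliency map in (0, 1).

    Assumption: saliency is the sigmoid of the standardized channel-wise mean
    activation. The abstract does not specify the actual encoding.
    """
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    # Average over channels at every spatial location.
    mean = [[sum(feature_map[c][y][x] for c in range(channels)) / channels
             for x in range(width)] for y in range(height)]
    flat = [v for row in mean for v in row]
    mu = sum(flat) / len(flat)
    # Fall back to 1.0 when the map is constant to avoid division by zero.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in flat) / len(flat)) or 1.0
    return [[1.0 / (1.0 + math.exp(-(v - mu) / sigma)) for v in row]
            for row in mean]


def apply_saliency(feature_map, saliency):
    """Reweight every channel by (1 + saliency) so no location is zeroed out."""
    return [[[value * (1.0 + saliency[y][x]) for x, value in enumerate(row)]
             for y, row in enumerate(channel)] for channel in feature_map]


def fuse_with_salient_knowledge(feature_maps, candidate_index):
    """Swap the candidate map for its saliency-weighted version.

    Feature maps at the other scales are passed through unchanged, as a
    detection head would receive them alongside the new map.
    """
    fused = list(feature_maps)
    candidate = feature_maps[candidate_index]
    fused[candidate_index] = apply_saliency(candidate, saliency_map(candidate))
    return fused


# Two toy "scales": a 2-channel 2x2 map and a 1-channel 1x1 map.
maps = [
    [[[0.0, 1.0], [2.0, 9.0]], [[0.0, 1.0], [2.0, 9.0]]],
    [[[5.0]]],
]
fused = fuse_with_salient_knowledge(maps, candidate_index=0)
```

The residual-style weighting (1 + saliency) is a design choice made for this sketch: it amplifies strongly activated locations while leaving weakly activated ones intact, so the downstream detector still sees the original signal.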