MSYOLOF: Multi-input-single-output encoder network with tripartite feature enhancement for object detection

Gong Cheng, Xi Yong, Xin Lyu, Tao Zeng, Xinyu Wang, Jiale Chen, Xin Li
Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems. Published 2023-07-28. DOI: 10.1145/3609703.3609710 (https://doi.org/10.1145/3609703.3609710)

Abstract

Object detection from a one-level feature is challenging because object representations at different scales must all be extracted from a single feature map. Existing detectors that use a one-level feature represent objects at different scales inadequately, which results in low accuracy for multi-scale object detection, especially for smaller objects. To address this problem, a novel object detector named MSYOLOF is proposed to construct an effective single feature map for detecting objects of different scales. Three modules are proposed that bring considerable improvements: Feature Pyramid Pooling (FPP), Feature Perception Enhancement (FPE), and Dual Branch Receptive Field (DBRF). First, the FPP module aggregates contextual information from various regions to improve the network's ability to capture global information, which strengthens the model's understanding of the overall scene. Second, the FPE module uses coordinate attention to construct a residual block that obtains orientation-aware and position-sensitive information, helping the network locate and identify objects of interest accurately. Third, by rethinking the Dilated Encoder of YOLOF, the DBRF module reduces information loss and mitigates the problem of being sensitive only to large objects when dilated convolution uses large dilation rates. Extensive experiments on the COCO benchmark validate the effectiveness of the network, which exhibits superior performance compared to other state-of-the-art networks.
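The remark about large dilation rates can be made concrete with standard dilated-convolution arithmetic. The sketch below is not the DBRF module and is not taken from the paper. It only computes, for a 3x3 kernel, how the span covered along one axis grows with the dilation rate while the number of sampled positions stays fixed. The function names and the dilation rates 2, 4, 6, 8 are assumptions chosen for illustration, and the receptive-field formula assumes stride-1 convolutions.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Span in pixels covered by one dilated kernel along one axis."""
    return kernel + (kernel - 1) * (dilation - 1)


def receptive_field(layers) -> int:
    """Receptive field of a stack of stride-1 convolutions.

    `layers` is a sequence of (kernel, dilation) pairs.
    """
    rf = 1
    for kernel, dilation in layers:
        # Each stride-1 layer widens the field by (span - 1) pixels.
        rf += effective_kernel(kernel, dilation) - 1
    return rf


def sampling_density(kernel: int, dilation: int) -> float:
    """Fraction of positions inside the kernel span that are actually read."""
    return kernel / effective_kernel(kernel, dilation)


# Hypothetical dilation rates, used only for illustration.
rates = [2, 4, 6, 8]
for d in rates:
    span = effective_kernel(3, d)
    density = sampling_density(3, d)
    print(f"dilation={d}: span={span}, density={density:.3f}")

print("stacked receptive field:", receptive_field([(3, d) for d in rates]))
```

As the dilation rate rises, each kernel reads the same three positions per axis over a wider span, so fine detail between the sampled positions is skipped. This is one way to read the abstract's point that large rates favor large objects and lose information relevant to small ones.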