BiF-DETR:Remote sensing object detection based on Bidirectional information fusion

IF 3.7 2区 工程技术 Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Displays Pub Date : 2024-07-19 DOI:10.1016/j.displa.2024.102802
Zhijing Xu, Chao Wang, Kan Huang
{"title":"BiF-DETR:Remote sensing object detection based on Bidirectional information fusion","authors":"Zhijing Xu,&nbsp;Chao Wang,&nbsp;Kan Huang","doi":"10.1016/j.displa.2024.102802","DOIUrl":null,"url":null,"abstract":"<div><p>Remote Sensing Object Detection(RSOD) is a fundamental task in the field of remote sensing image processing. The complexity of the background, the diversity of object scales and the locality limitation of Convolutional Neural Network (CNN) present specific challenges for RSOD. In this paper, an innovative hybrid detector, Bidirectional Information Fusion DEtection TRansformer (BiF-DETR), is proposed to mitigate the above issues. Specifically, BiF-DETR takes anchor-free detection network, CenterNet, as the baseline, designs the feature extraction backbone in parallel, extracts the local feature details using CNNs, and obtains the global information and long-term dependencies using Transformer branch. A Bidirectional Information Fusion (BIF) module is elaborately designed to reduce the semantic differences between different styles of feature maps through multi-level iterative information interactions, fully utilizing the complementary advantages of different detectors. Additionally, Coordination Attention(CA), is introduced to enables the detection network to focus on the saliency information of small objects. To address diversity insufficiency of remote sensing images in the training stage, Cascade Mixture Data Augmentation (CMDA), is designed to improve the robustness and generalization ability of the model. Comparative experiments with other cutting-edge methods are conducted on the publicly available DOTA and NWPU VHR-10 datasets. The experimental results reveal that the performance of proposed method is state-of-the-art, with <em>m</em>AP reaching 77.43% and 94.75%, respectively, far exceeding the other 25 competitive methods.</p></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"84 ","pages":"Article 102802"},"PeriodicalIF":3.7000,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141938224001665/pdfft?md5=e3ed1b94823f012220f1a30a72ed7985&pid=1-s2.0-S0141938224001665-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Displays","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0141938224001665","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

Abstract

Remote Sensing Object Detection(RSOD) is a fundamental task in the field of remote sensing image processing. The complexity of the background, the diversity of object scales and the locality limitation of Convolutional Neural Network (CNN) present specific challenges for RSOD. In this paper, an innovative hybrid detector, Bidirectional Information Fusion DEtection TRansformer (BiF-DETR), is proposed to mitigate the above issues. Specifically, BiF-DETR takes anchor-free detection network, CenterNet, as the baseline, designs the feature extraction backbone in parallel, extracts the local feature details using CNNs, and obtains the global information and long-term dependencies using Transformer branch. A Bidirectional Information Fusion (BIF) module is elaborately designed to reduce the semantic differences between different styles of feature maps through multi-level iterative information interactions, fully utilizing the complementary advantages of different detectors. Additionally, Coordination Attention(CA), is introduced to enables the detection network to focus on the saliency information of small objects. To address diversity insufficiency of remote sensing images in the training stage, Cascade Mixture Data Augmentation (CMDA), is designed to improve the robustness and generalization ability of the model. Comparative experiments with other cutting-edge methods are conducted on the publicly available DOTA and NWPU VHR-10 datasets. The experimental results reveal that the performance of proposed method is state-of-the-art, with mAP reaching 77.43% and 94.75%, respectively, far exceeding the other 25 competitive methods.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
BiF-DETR:基于双向信息融合的遥感物体探测
遥感物体检测(RSOD)是遥感图像处理领域的一项基本任务。背景的复杂性、物体尺度的多样性以及卷积神经网络(CNN)的定位限制,都给遥感物体检测带来了特殊的挑战。本文提出了一种创新的混合检测器--双向信息融合检测转换器(BiF-DETR),以缓解上述问题。具体来说,BiF-DETR 以无锚检测网络 CenterNet 为基线,并行设计特征提取骨干网,使用 CNN 提取局部特征细节,并使用 Transformer 分支获取全局信息和长期依赖关系。精心设计的双向信息融合(Bidirectional Information Fusion,BIF)模块通过多层次的迭代信息交互,充分利用不同检测器的互补优势,减少不同风格特征图之间的语义差异。此外,还引入了协调注意力(CA),使检测网络能够关注小物体的显著性信息。为解决训练阶段遥感图像多样性不足的问题,设计了级联混合数据增强(CMDA),以提高模型的鲁棒性和泛化能力。在公开的 DOTA 和 NWPU VHR-10 数据集上进行了与其他前沿方法的对比实验。实验结果表明,所提方法的性能达到了最先进水平,mAP 分别达到了 77.43% 和 94.75%,远远超过了其他 25 种竞争方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Displays
Displays 工程技术-工程:电子与电气
CiteScore
4.60
自引率
25.60%
发文量
138
审稿时长
92 days
期刊介绍: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface. Technical papers on practical developments in Displays technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals intended for display technologies and human factor engineers new to the field will also occasionally featured.
期刊最新文献
Mambav3d: A mamba-based virtual 3D module stringing semantic information between layers of medical image slices Luminance decomposition and Transformer based no-reference tone-mapped image quality assessment GLDBF: Global and local dual-branch fusion network for no-reference point cloud quality assessment Virtual reality in medical education: Effectiveness of Immersive Virtual Anatomy Laboratory (IVAL) compared to traditional learning approaches Weighted ensemble deep learning approach for classification of gastrointestinal diseases in colonoscopy images aided by explainable AI
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1