BiF-DETR: Remote sensing object detection based on bidirectional information fusion

IF 3.7 | Zone 2 (Engineering Technology) | JCR Q1 (Computer Science, Hardware & Architecture) | Displays | Pub Date: 2024-07-19 | DOI: 10.1016/j.displa.2024.102802

Zhijing Xu, Chao Wang, Kan Huang

Displays, vol. 84, Article 102802. Journal article, published 2024-07-19. Source: https://www.sciencedirect.com/science/article/pii/S0141938224001665

Citations: 0

Abstract

Remote Sensing Object Detection (RSOD) is a fundamental task in remote sensing image processing. Complex backgrounds, diverse object scales, and the locality limitation of Convolutional Neural Networks (CNNs) pose specific challenges for RSOD. This paper proposes a hybrid detector, Bidirectional Information Fusion DEtection TRansformer (BiF-DETR), to mitigate these issues. Specifically, BiF-DETR takes the anchor-free detection network CenterNet as its baseline and uses a parallel feature extraction backbone: a CNN branch extracts local feature details, while a Transformer branch captures global information and long-range dependencies. A Bidirectional Information Fusion (BIF) module reduces the semantic differences between the different styles of feature maps through multi-level iterative information interactions, fully utilizing the complementary advantages of the different detectors. Additionally, Coordination Attention (CA) is introduced to enable the detection network to focus on the salient information of small objects. To address the insufficient diversity of remote sensing images in the training stage, Cascade Mixture Data Augmentation (CMDA) is designed to improve the robustness and generalization ability of the model. Comparative experiments against other cutting-edge methods are conducted on the publicly available DOTA and NWPU VHR-10 datasets. The results show that the proposed method achieves state-of-the-art performance, with mAP reaching 77.43% and 94.75%, respectively, outperforming the 25 other methods compared.
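
The abstract names the attention module but does not describe it. As a rough illustration only, the sketch below assumes that "Coordination Attention" refers to the published coordinate attention mechanism, in which features are pooled separately along height and width, passed through a shared channel reduction, and turned into two direction-aware gates. The function name, weight shapes, and the use of ReLU with no normalization layer are simplifying assumptions for this sketch, not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def coordinate_attention(x, w_reduce, w_h, w_w):
    """Hypothetical NumPy sketch of coordinate-style attention.

    x        : feature map of shape (C, H, W)
    w_reduce : shared 1x1-conv weights, shape (M, C), M = reduced channels
    w_h, w_w : 1x1-conv weights for each direction, shape (C, M)
    Normalization is omitted and ReLU is used for brevity (assumptions).
    """
    _, h, _ = x.shape
    pooled_h = x.mean(axis=2)                 # (C, H): average over width
    pooled_w = x.mean(axis=1)                 # (C, W): average over height
    # Concatenate both directions so they share one channel reduction.
    y = np.concatenate([pooled_h, pooled_w], axis=1)   # (C, H + W)
    y = np.maximum(w_reduce @ y, 0.0)                  # (M, H + W)
    y_h, y_w = y[:, :h], y[:, h:]             # split back per direction
    a_h = sigmoid(w_h @ y_h)                  # (C, H) gate along height
    a_w = sigmoid(w_w @ y_w)                  # (C, W) gate along width
    # Broadcast the two 1-D gates over the full feature map.
    return x * a_h[:, :, None] * a_w[None, :, :].transpose(1, 0, 2)


# Usage with random (untrained) weights; only the shape is meaningful here.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 6))
out = coordinate_attention(
    x,
    rng.standard_normal((4, 8)),
    rng.standard_normal((8, 4)),
    rng.standard_normal((8, 4)),
)
print(out.shape)  # (8, 5, 6)
```

Because each gate keeps one spatial axis intact, the output can emphasize a specific row and column position, which is the property usually cited as helpful for locating small objects.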


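Source journal

Displays (Engineering Technology; Engineering: Electrical & Electronic)

CiteScore: 4.60

Self-citation rate: 25.60%

Articles published: 138

Review time: 92 days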

About the journal: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface. Technical papers on practical developments in Displays technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technology and human factors engineers new to the field, will also occasionally be featured.