BiF-DETR: Remote sensing object detection based on Bidirectional information fusion
Authors: Zhijing Xu, Chao Wang, Kan Huang
Journal: Displays, volume 84, Article 102802 (impact factor 3.7; JCR Q1, Computer Science, Hardware & Architecture)
Publication date: 2024-07-19
DOI: 10.1016/j.displa.2024.102802
URL: https://www.sciencedirect.com/science/article/pii/S0141938224001665
Citations: 0
Abstract
Remote Sensing Object Detection (RSOD) is a fundamental task in remote sensing image processing. Complex backgrounds, diverse object scales, and the locality limitation of Convolutional Neural Networks (CNNs) pose specific challenges for RSOD. This paper proposes a hybrid detector, the Bidirectional Information Fusion DEtection TRansformer (BiF-DETR), to mitigate these issues. Specifically, BiF-DETR takes the anchor-free detection network CenterNet as its baseline and uses a parallel feature extraction backbone: a CNN branch extracts local feature details, while a Transformer branch captures global information and long-range dependencies. A Bidirectional Information Fusion (BIF) module reduces the semantic differences between the two styles of feature maps through multi-level iterative information interactions, exploiting the complementary advantages of the two branches. Additionally, Coordination Attention (CA) is introduced to enable the detection network to focus on the saliency information of small objects. To address the insufficient diversity of remote sensing images in the training stage, Cascade Mixture Data Augmentation (CMDA) is designed to improve the robustness and generalization ability of the model. Comparative experiments against other cutting-edge methods are conducted on the publicly available DOTA and NWPU VHR-10 datasets. The results show that the proposed method achieves state-of-the-art performance, with mAP reaching 77.43% on DOTA and 94.75% on NWPU VHR-10, far exceeding the 25 other competing methods.
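The mAP figures quoted in the abstract are means of per-class average precision (AP). The abstract does not say which AP variant or IoU threshold was used, so the following is only a minimal sketch of how such a score is typically computed, assuming all-point interpolated AP in the PASCAL VOC style and detections that have already been matched to ground truth. The function names and the toy detections are hypothetical and are not taken from the paper.

```python
def average_precision(scored_hits, num_gt):
    """All-point interpolated AP for one class (assumed AP variant).

    scored_hits: list of (confidence, is_true_positive) pairs, one per detection.
                 Matching detections to ground truth (e.g. by IoU) is assumed
                 to have been done beforehand.
    num_gt:      number of ground-truth objects of this class.
    """
    if num_gt == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    true_positives = 0
    for rank, (_, hit) in enumerate(ranked, start=1):
        if hit:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_gt)
    # Precision envelope: each value becomes the max precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the precision-recall envelope.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class maps a class name to (scored_hits, num_gt); returns the mean AP."""
    aps = [average_precision(hits, n) for hits, n in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical toy detections for two classes (not data from the paper).
toy = {
    "plane": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "ship": ([(0.95, True), (0.6, True)], 2),
}
print(f"mAP = {mean_average_precision(toy):.2%}")
```

A benchmark score is obtained the same way, with one AP per object category in the dataset averaged into a single percentage.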
About the journal:
Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface.
Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.