A Lightweight Fusion Strategy With Enhanced Interlayer Feature Correlation for Small Object Detection
Authors: Yao Xiao; Tingfa Xu; Xin Yu; Yuqiang Fang; Jianan Li
Journal: IEEE Transactions on Geoscience and Remote Sensing (Journal Article)
Publication date: 2024-09-10
DOI: 10.1109/TGRS.2024.3457155
URL: https://ieeexplore.ieee.org/document/10671587/
Impact factor: 7.5; JCR: Q1 (Engineering, Electrical & Electronic); Category: Earth Sciences (region 1)
Citation count: 0
Abstract
Detecting small objects in drone imagery is challenging because low resolution and blending with the background leave little feature information to work with. Multiscale feature fusion can improve detection by capturing information at different scales, but traditional strategies fall short. Simple concatenation or addition operations do not fully exploit the advantages of multiscale fusion, resulting in insufficient correlation between features. This inadequacy hinders the detection of small objects, especially in complex backgrounds and densely populated areas. To address this issue and make efficient use of limited computational resources, we propose a lightweight fusion strategy based on enhanced interlayer feature correlation (EFC) to replace the traditional feature fusion strategy in the feature pyramid network (FPN). The semantic expressions of different layers in the feature pyramid are inconsistent. In EFC, the grouped feature focus unit (GFF) enhances the feature correlation of each layer by focusing on the contextual information of different features. The multilevel feature reconstruction module (MFR) reconstructs and transforms the strong and weak information of each layer in the pyramid to reduce redundant feature fusion and retain more information about small targets in deep networks. The proposed method is plug-and-play and can be applied to a wide range of base networks. Extensive experiments and comprehensive evaluations on VisDrone, Unmanned Aerial Vehicle Benchmark Object Detection and Tracking (UAVDT), and Microsoft Common Objects in Context (COCO) demonstrate its effectiveness. With generalized focal loss (GFL) as the baseline on the VisDrone dataset, which contains a large number of small targets, the proposed method improves the detection mean average precision (mAP) by 1.7%, surpassing many lightweight state-of-the-art methods while significantly reducing the Params and GFLOPs at the neck end. The code will be available at https://github.com/nuliweixiao/EFC.git.
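For readers unfamiliar with the "traditional" fusion operations the abstract contrasts EFC against, the following is a minimal sketch of addition-based and concatenation-based fusion of two pyramid levels. It is not the authors' EFC, GFF, or MFR code. The function names (`upsample2x`, `fuse_add`, `fuse_concat`), the nested-list `[channels][height][width]` layout, and the toy values are all assumptions made for illustration, and the lateral 1x1 convolutions of a real FPN are omitted.

```python
# Illustrative sketch only: the two baseline fusion operations named in the
# abstract (addition and concatenation), not the EFC method itself.
# Feature maps are nested lists shaped [channels][height][width].

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a [C][H][W] feature map."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row
        out.append(rows)
    return out

def fuse_add(shallow, deep):
    """Element-wise addition; both inputs need the same channel count."""
    up = upsample2x(deep)
    return [
        [[a + b for a, b in zip(row_s, row_d)]
         for row_s, row_d in zip(ch_s, ch_d)]
        for ch_s, ch_d in zip(shallow, up)
    ]

def fuse_concat(shallow, deep):
    """Channel-wise concatenation; output has C_shallow + C_deep channels."""
    return shallow + upsample2x(deep)

# Toy inputs: a 1-channel 2x2 shallow map and a 1-channel 1x1 deep map.
shallow = [[[1.0, 2.0], [3.0, 4.0]]]
deep = [[[10.0]]]

added = fuse_add(shallow, deep)        # same shape as `shallow`
stacked = fuse_concat(shallow, deep)   # two channels, each 2x2
```

Both operations combine the levels position by position without modelling how the two layers relate to each other, which is the limitation the abstract attributes to them.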
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.