Qunpo Liu, Yi Zhao, Ruxin Gao, Xuhui Bu, Naohiko Hanajima
Neural Processing Letters (IF 2.6, JCR Q3, Computer Science, Artificial Intelligence). Published 2024-06-02. DOI: https://doi.org/10.1007/s11063-024-11653-6
SpanEffiDet: Span-Scale and Span-Path Feature Fusion for Object Detection
Lower versions of EfficientDet (such as D0 and D1) have smaller network structures and fewer parameters, but lower detection accuracy. Higher versions are more accurate, but their increased network complexity poses challenges for real-time processing and raises hardware requirements. To achieve higher accuracy under limited computational resources, this paper introduces SpanEffiDet, which is based on the channel adaptive frequency filter (CAFF) and the Span-Path Bidirectional Feature Pyramid structure. First, the proposed CAFF module transforms channel information into the frequency domain through a Fourier transform and extracts the key features through semantic adaptive frequency filtering, thereby eliminating redundant channel information from EfficientNet. At the same time, the module can compute weights across channels at fine granularity and capture the detailed information of element-level features. Second, a bidirectional feature pyramid network, the multi-level cross-BiFPN, which connects multiple layers and multiple nodes, is proposed to build cross-level information transmission that incorporates both the semantic and the positional information of the target. This design enables the network to detect objects with significant size differences in complex environments more effectively. Finally, by introducing Generalized Focal Loss V2, reliable localization quality estimation scores are predicted from the distribution statistics of bounding boxes, thereby improving localization accuracy. The experimental results indicate that on the MS COCO dataset, SpanEffiDet-D0 achieved an AP improvement of 3.3% compared to the original EfficientDet series algorithms. Similarly, on the PASCAL VOC2007 and VOC2012 datasets, the mAP of SpanEffiDet-D0 is 1.66% and 2.65% higher, respectively, than that of EfficientDet-D0.
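To make the "distribution statistics of bounding boxes" idea more concrete, here is a minimal NumPy sketch of the statistics step as it is commonly described for Generalized Focal Loss V2: each of the four box sides is predicted as a discrete distribution, and the top-k probabilities plus their mean are collected as features for a small quality-estimation branch. The function names, the choice of k = 4, and the 17 bins per side are assumptions made for illustration; the abstract does not state the settings SpanEffiDet uses.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def distribution_statistics(side_logits, k=4):
    """Summarize per-side box distributions (illustrative helper, not from the paper).

    side_logits: array of shape (num_boxes, 4, num_bins) holding raw logits
                 for the discretized offset of each of the four box sides.
    Returns an array of shape (num_boxes, 4 * (k + 1)) containing, for each
    side, the k largest probabilities followed by the mean of those k values.
    """
    probs = softmax(side_logits, axis=-1)
    # Sort ascending, reverse to descending, keep the k largest probabilities.
    topk = np.sort(probs, axis=-1)[..., ::-1][..., :k]
    mean = topk.mean(axis=-1, keepdims=True)
    stats = np.concatenate([topk, mean], axis=-1)
    return stats.reshape(stats.shape[0], -1)


# A flat (uncertain) distribution versus a sharply peaked (confident) one.
flat = np.zeros((1, 4, 17))
peaked = np.zeros((1, 4, 17))
peaked[..., 8] = 10.0
features = distribution_statistics(np.concatenate([flat, peaked]), k=4)
print(features.shape)
```

A sharply peaked distribution yields a large leading statistic, while a flat one yields uniformly small values, which is the signal a quality branch can learn to map to a localization quality score.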
About the journal:
Neural Processing Letters is an international journal publishing research results and innovative ideas on all aspects of artificial neural networks. Coverage includes theoretical developments, biological models, new formal modes, learning, applications, software and hardware developments, and prospective research.
The journal promotes fast exchange of information in the community of neural network researchers and users. The resurgence of interest in the field of artificial neural networks since the beginning of the 1980s is coupled to tremendous research activity in specialized or multidisciplinary groups. Research, however, is not possible without good communication between people and the exchange of information, especially in a field covering such different areas; fast communication is also a key aspect, and this is the reason for Neural Processing Letters.