FPDIoU Loss: A loss function for efficient bounding box regression of rotated object detection

Image and Vision Computing, vol. 154, Article 105381 · IF 4.2 · CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2025-02-01 (Epub 2024-12-10) · DOI: 10.1016/j.imavis.2024.105381
Siliang Ma, Yong Xu

Abstract

Bounding box regression is one of the important steps in object detection. However, rotated object detectors often involve a more complicated loss based on SkewIoU, which is unfriendly to gradient-based training. Most existing loss functions for rotated object detection compute the difference between two bounding boxes by focusing only on the deviation of areas or point-wise distances (e.g., L_SmoothL1, L_RotatedIoU, and L_PIoU), and the computation of some loss functions is extremely complex (e.g., L_KFIoU). To improve the efficiency and accuracy of bounding box regression for rotated object detection, we propose a novel metric for comparing arbitrary shapes based on minimum points distance, which takes into account most of the factors considered by existing loss functions for rotated object detection, i.e., the overlapping or non-overlapping area, the distance between central points, and the rotation angle. We also propose a loss function called L_FPDIoU, based on four points distance, for accurate bounding box regression focusing on faster convergence and higher-quality anchor boxes. In the experiments, FPDIoU loss has been applied to training state-of-the-art rotated object detection models (e.g., RTMDet, H2RBox) on three popular rotated object detection benchmarks (DOTA, DIOR, HRSC2016) and two arbitrary-orientation scene text detection benchmarks (ICDAR 2017 RRC-MLT and ICDAR 2019 RRC-MLT), achieving better performance than existing loss functions. The code is available at https://github.com/JacksonMa618/FPDIoU
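The abstract does not give the exact formula of the loss, so the following is an illustration only: a minimal sketch of a four-corner-points distance term that such a metric could penalize, assuming rotated boxes are parameterized as (cx, cy, w, h, angle) and the summed squared corner distances are normalized by the squared image diagonal. The function names `corners` and `fpd_term`, and the normalization choice, are hypothetical and not taken from the paper.

```python
import math


def corners(cx, cy, w, h, angle):
    """Four corner points of a rotated box; angle in radians, counter-clockwise."""
    c, s = math.cos(angle), math.sin(angle)
    offsets = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    # Rotate each half-extent offset and translate by the box center.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def fpd_term(box_pred, box_gt, img_w, img_h):
    """Sum of squared distances between corresponding corner points,
    normalized by the squared image diagonal (a hypothetical normalization)."""
    p, g = corners(*box_pred), corners(*box_gt)
    d2 = sum((px - gx) ** 2 + (py - gy) ** 2
             for (px, py), (gx, gy) in zip(p, g))
    return d2 / (img_w ** 2 + img_h ** 2)


# Identical boxes incur zero penalty; any corner deviation increases the term.
print(fpd_term((10, 10, 4, 2, 0.0), (10, 10, 4, 2, 0.0), 100, 100))  # 0.0
print(fpd_term((10, 10, 4, 2, 0.0), (12, 10, 4, 2, 0.0), 100, 100) > 0)  # True
```

Such a term would typically be combined with an overlap measure (e.g., subtracted from a rotated IoU) so that the loss reflects both area overlap and corner alignment, but the paper's actual combination should be taken from the published formula.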