Libra-SOD: Balanced label assignment for small object detection

Knowledge-Based Systems · Impact factor: 7.2 · Zone 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence) · Pub Date: 2024-08-08 · DOI: 10.1016/j.knosys.2024.112353
Zhuangzhuang Zhou, Yingying Zhu
Article URL: https://www.sciencedirect.com/science/article/pii/S0950705124009870
Citations: 0

Abstract

Small object detection (SOD) is a notoriously challenging task in computer vision. Because small instances occupy small regions and overlap little with priors (anchors or points), strict label assignment based on pre-defined IoU thresholds usually leaves small objects with too few training samples. Although center sampling and IoU-statistic-based label assignment strategies mitigate this imbalance, they struggle to deliver consistent gains for small, medium, and large objects simultaneously. In this paper, we propose Libra-SOD, a novel model with a balanced label assignment (BLA) strategy for SOD in complex scenes. First, BLA considers both classification confidence and localization quality during assignment and assigns the same number of positive samples to each ground-truth object. Second, to work closely with BLA, we introduce a task-aware head, which makes the assignment results more reliable by interweaving the classification and regression tasks. Finally, a task-aware loss dynamically assigns weight factors and labels when supervising predictions, allowing the framework to focus more on valuable samples. Extensive experiments are performed on four challenging datasets. On DIOR (object DetectIon in Optical Remote sensing images), Libra-SOD achieves state-of-the-art performance of 73.7 mAP with a ResNet-50 backbone. To the best of our knowledge, Libra-SOD is the first single-stage framework to exceed 30 AP on SODA-D (Small Object Detection dAtasets).
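
The balanced assignment idea can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything concrete here is an assumption: the quality metric `score**alpha * IoU**beta`, the parameters `k`, `alpha`, and `beta`, the greedy conflict resolution, and the function names `iou` and `balanced_assign` are all hypothetical and are not taken from the paper. The sketch only shows how a per-ground-truth quota can give a small object as many positives as a large one.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def balanced_assign(
    pred_boxes: Sequence[Box],
    cls_scores: Sequence[Sequence[float]],  # cls_scores[p][c]: score of prediction p for class c
    gt_boxes: Sequence[Box],
    gt_labels: Sequence[int],
    k: int = 2,
    alpha: float = 1.0,
    beta: float = 1.0,
) -> List[int]:
    """Return, per prediction, the index of its assigned ground truth, or -1.

    Assumed quality metric (not from the paper): score**alpha * IoU**beta.
    Each ground truth receives at most k positives, and exactly k whenever
    there are at least k * len(gt_boxes) predictions.
    """
    pairs = []
    for g, (gbox, glabel) in enumerate(zip(gt_boxes, gt_labels)):
        for p, pbox in enumerate(pred_boxes):
            quality = (cls_scores[p][glabel] ** alpha) * (iou(pbox, gbox) ** beta)
            pairs.append((quality, g, p))

    # Greedy matching: best pairs first; a prediction serves one ground truth only.
    pairs.sort(key=lambda t: t[0], reverse=True)
    assigned = [-1] * len(pred_boxes)
    counts = [0] * len(gt_boxes)
    for _, g, p in pairs:
        if assigned[p] == -1 and counts[g] < k:
            assigned[p] = g
            counts[g] += 1
    return assigned


# One large and one small ground truth, six candidate predictions.
gt_boxes = [(0, 0, 100, 100), (200, 200, 208, 208)]
gt_labels = [0, 1]
pred_boxes = [
    (0, 0, 100, 100), (5, 5, 100, 100), (10, 10, 90, 90),
    (200, 200, 208, 208), (202, 202, 210, 210), (300, 300, 310, 310),
]
cls_scores = [[0.9, 0.1], [0.8, 0.1], [0.7, 0.2], [0.1, 0.6], [0.1, 0.5], [0.5, 0.5]]

assigned = balanced_assign(pred_boxes, cls_scores, gt_boxes, gt_labels, k=2)
print(assigned)  # [0, 0, -1, 1, 1, -1]
```

In this toy case a fixed IoU threshold of 0.5 would give the small object a single positive (the shifted prediction has an IoU of roughly 0.39), whereas the quota gives both objects two. One caveat of this simplified version: when good candidates are scarce, the quota can be filled with low-quality predictions, which a real assigner would presumably need to guard against.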

Source journal
Knowledge-Based Systems (Engineering & Technology; Computer Science: Artificial Intelligence)
CiteScore: 14.80
Self-citation rate: 12.50%
Articles published: 1245
Review time: 7.8 months
Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.