Head-Dominant Enhancement With Local Count for Better Human Detection in Crowds

IF 7.9 | Zone 2, Computer Science | JCR Q1, AUTOMATION & CONTROL SYSTEMS | IEEE Transactions on Automation Science and Engineering | Pub Date: 2024-11-05 | DOI: 10.1109/TASE.2024.3488856

Shoudong Han;Huilin Ding;Zhiling Han;Heng Li

Volume 22, pages 8794-8804. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/10745131/

Citations: 0

Abstract



Head-Dominant Enhancement With Local Count for Better Human Detection in Crowds

In crowded scenes, occlusion makes it difficult to extract discriminative human features. Some human detectors have mitigated this issue by introducing head detection. However, associating each full-body detection with its corresponding head detection remains a complex problem. Instead of learning the association, we propose a Head-dominant Enhancement Module (HDEM) that uses the full-body proposal to regress the head bounding box. To embed the head information into the target human feature, we further propose a Consistent Weighted (CW) loss. Additionally, existing Non-Maximum Suppression (NMS) algorithms do not account for changes in density when selecting detection boxes, which leads to false and missed detections. Similar to human visual habits in occluded scenarios, we propose a Count-aware Dynamic Threshold Module (CADTM) that uses head context information to predict a local count, which is associated with crowd density. CADTM addresses the inherent defects of Greedy-NMS in crowded scenes by adjusting the Intersection over Union (IoU) threshold dynamically. By combining HDEM and CADTM, we achieve state-of-the-art performance on CrowdHuman at a small computational cost. Our method achieves 4.6% AP gains, 2.2% MR-2 gains, and 3.2% JI gains over a Cascade R-CNN baseline. Furthermore, the proposed method is flexible and can be used with most proposal-based detection frameworks and various IoU-based NMS algorithms.

Note to Practitioners: The motivation for this study arises from a prevalent issue in human detection applications, particularly in densely populated areas such as shopping malls, streets, and subway stations, where occlusion poses a significant challenge. Employing a generic object detector results in numerous missed detections, which significantly compromises the overall performance of the detector. This research proposes a convenient plugin that achieves significant performance improvements at a very small cost. To validate its efficacy, the proposed method is thoroughly evaluated on proposal-based detection frameworks. The experimental results demonstrate the robustness of our approach and its ability to adapt to diverse crowded scenarios.
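The idea of adjusting the IoU threshold of Greedy-NMS according to a local count can be illustrated with a minimal sketch. The abstract does not give the formula CADTM uses, so the mapping in `count_to_threshold` (and its `base`, `step`, and `cap` parameters) is a hypothetical stand-in, and the local counts are passed in directly rather than predicted from head context as in the paper.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_to_threshold(local_count: float, base: float = 0.5,
                       step: float = 0.1, cap: float = 0.9) -> float:
    """ASSUMPTION: a simple linear mapping, not the paper's formula.

    A larger local count (denser neighbourhood) yields a higher IoU
    threshold, so overlapping neighbours are less likely to be suppressed.
    """
    return min(cap, base + step * max(0.0, local_count - 1.0))


def dynamic_threshold_nms(boxes: Sequence[Box],
                          scores: Sequence[float],
                          local_counts: Sequence[float]) -> List[int]:
    """Greedy NMS where each kept box sets its own suppression threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)          # highest-scoring remaining box
        keep.append(best)
        thr = count_to_threshold(local_counts[best])
        # Suppress only boxes that overlap the kept box by more than thr.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thr]
    return keep


# Two overlapping people (boxes 0 and 1) plus a near-duplicate of box 0.
boxes = [(0, 0, 10, 20), (3, 0, 13, 20), (0.5, 0, 10.5, 20)]
scores = [0.9, 0.8, 0.7]
print(dynamic_threshold_nms(boxes, scores, [1, 1, 1]))  # sparse counts
print(dynamic_threshold_nms(boxes, scores, [2, 2, 2]))  # denser counts
```

With a local count of 1 the threshold stays at the fixed 0.5 of ordinary Greedy-NMS and the second person is suppressed along with the duplicate; with a local count of 2 the threshold rises to 0.6, so the second person survives while the near-duplicate is still removed.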


Source journal

IEEE Transactions on Automation Science and Engineering (Engineering and Technology: Automation and Control Systems)

CiteScore: 12.50

Self-citation rate: 14.30%

Articles published: 404

Review time: 3.0 months

About the journal: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.