A deep top-down framework towards generalisable multi-view pedestrian detection

Neurocomputing (IF 6.5; Region 2, Computer Science; JCR Q1, Computer Science, Artificial Intelligence). Pub Date: 2024-11-28. Epub Date: 2024-08-22. DOI: 10.1016/j.neucom.2024.128458

Rui Qiu, Ming Xu, Yuchen Ling, Jeremy S. Smith, Yuyao Yan, Xinheng Wang

Neurocomputing, volume 607, Article 128458. Full text: https://www.sciencedirect.com/science/article/pii/S0925231224012293

Citations: 0


A deep top-down framework towards generalisable multi-view pedestrian detection

Multiple cameras are frequently used to detect heavily occluded pedestrians. State-of-the-art deep multi-view pedestrian detection methods usually project the feature maps extracted from multiple views onto the ground plane through homographies for information fusion. However, this bottom-up approach can easily overfit the camera locations and orientations in the training dataset, which weakens generalisation performance and compromises real-world application. To address this problem, a deep top-down framework, TMVD, is proposed. A rectangular box of average pedestrian size is placed at each cell of the discretized ground plane; the feature maps within the corresponding boxes in the multiple views are weighted and embedded in a top view, and a convolutional neural network then uses this top view to infer the locations of pedestrians. The proposed method significantly improves generalisation performance compared with the benchmark methods for deep multi-view pedestrian detection, and it also significantly outperforms the other top-down methods.
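The top-down step in the abstract starts from ground-plane cells and asks where a pedestrian standing on each cell would appear in a camera view. The sketch below illustrates only that geometric step and is not the authors' implementation. It assumes a known 3x4 camera projection matrix `P`, a world frame whose z axis points up, and a fixed pedestrian height and width-to-height ratio; the names `project` and `cell_boxes` and the default values are hypothetical. The feature weighting and top-view embedding that the abstract describes are not shown.

```python
import numpy as np


def project(P, pts):
    """Project N x 3 world points with a 3 x 4 camera matrix; returns N x 2 pixels."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # N x 4 homogeneous points
    uvw = homog @ P.T                                 # N x 3 homogeneous pixels
    return uvw[:, :2] / uvw[:, 2:3]


def cell_boxes(P, xs, ys, height=1.8, aspect=0.4):
    """Image-space box (left, top, right, bottom) for every ground-plane cell.

    height (metres) and aspect (width / height) are assumed values for an
    average pedestrian; they are not taken from the paper.
    """
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    feet = np.stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)], axis=1)
    heads = feet + np.array([0.0, 0.0, height])

    f = project(P, feet)   # pixel position of the foot point of each cell
    h = project(P, heads)  # pixel position of the head point above each cell

    box_h = np.linalg.norm(f - h, axis=1)  # box height in pixels
    half_w = 0.5 * aspect * box_h          # half box width in pixels
    cx = 0.5 * (f[:, 0] + h[:, 0])         # horizontal box centre

    boxes = np.stack(
        [
            cx - half_w,
            np.minimum(f[:, 1], h[:, 1]),
            cx + half_w,
            np.maximum(f[:, 1], h[:, 1]),
        ],
        axis=1,
    )
    return boxes.reshape(len(xs), len(ys), 4)
```

In a pipeline of this kind, the features inside each box would be pooled per view and written to the matching cell of a top-view map. Because each box is derived from the ground-plane cell rather than from a learned warp of the image, a new camera layout only requires a new `P`.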


Source journal

Neurocomputing (Engineering and Technology: Computer Science, Artificial Intelligence)

CiteScore: 13.10

Self-citation rate: 10.00%

Articles published: 1382

Review time: 70 days

Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Its essential topics are neurocomputing theory, practice and applications.