A deep top-down framework towards generalisable multi-view pedestrian detection
Journal: Neurocomputing (impact factor 5.5, JCR Q1, Computer Science, Artificial Intelligence). Published: 2024-08-22. DOI: 10.1016/j.neucom.2024.128458. URL: https://www.sciencedirect.com/science/article/pii/S0925231224012293
Multiple cameras are frequently used to detect heavily occluded pedestrians. State-of-the-art deep multi-view pedestrian detection methods usually project the feature maps extracted from multiple views onto the ground plane through homographies for information fusion. However, this bottom-up approach can easily overfit the camera locations and orientations in the training dataset, which leads to weak generalisation performance and compromises its real-world application. To address this problem, a deep top-down framework, TMVD, is proposed. A rectangular box of average pedestrian size is placed at each cell of the discretized ground plane; the feature maps within these boxes in the multiple views are weighted and embedded in a top view, and a convolutional neural network then uses them to infer the locations of pedestrians. The proposed method significantly improves generalisation performance compared with the benchmark methods for deep multi-view pedestrian detection, and it also significantly outperforms the other top-down methods.
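The top-down step described in the abstract, placing a pedestrian-sized box at a ground-plane cell and finding where it falls in a camera view, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the 1.8 m height, the 0.4 width-to-height ratio, and the toy camera are all assumptions chosen for illustration, and the feature weighting and top-view embedding are not shown because the abstract does not specify them.

```python
import numpy as np

def project_point(P, xyz):
    """Project a 3D world point with a 3x4 camera matrix P; returns (u, v) in pixels."""
    uvw = P @ np.append(np.asarray(xyz, dtype=float), 1.0)
    return uvw[:2] / uvw[2]

def cell_box(P, x, y, height=1.8, aspect=0.4):
    """Image box (u_min, v_min, u_max, v_max) for a pedestrian standing at ground cell (x, y).

    height (metres) and aspect (width / height) are assumed average values,
    not figures taken from the paper.
    """
    foot = project_point(P, (x, y, 0.0))      # point on the ground plane z = 0
    head = project_point(P, (x, y, height))   # point at the assumed head height
    pixel_h = abs(foot[1] - head[1])
    half_w = 0.5 * aspect * pixel_h
    u_c = 0.5 * (foot[0] + head[0])
    v_top, v_bot = sorted((head[1], foot[1]))
    return (u_c - half_w, v_top, u_c + half_w, v_bot)

# Hypothetical camera: 1 m above the ground, 10 m behind the origin, looking along +y.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
C = np.array([0.0, -10.0, 1.0])               # camera centre in world coordinates
P = K @ np.hstack([R, (-R @ C)[:, None]])     # P = K [R | -RC]

box = cell_box(P, 0.0, 0.0)
print(box)
```

Repeating this for every cell and every camera would give, per cell, one box per view from which features could be pooled. Because the boxes are derived from each camera's calibration rather than from a learned ground-plane projection, the same procedure applies unchanged to a new camera layout.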
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.