交通场景中鲁棒多分辨率行人检测

2013 IEEE Conference on Computer Vision and Pattern Recognition Pub Date : 2013-06-23 DOI:10.1109/CVPR.2013.390

Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, S. Li

{"title":"交通场景中鲁棒多分辨率行人检测","authors":"Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, S. Li","doi":"10.1109/CVPR.2013.390","DOIUrl":null,"url":null,"abstract":"The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques. In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to learn the resolution aware transformations and deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles, therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when the vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms previous state-of-the-art (71%).","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"3033-3040"},"PeriodicalIF":0.0000,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"187","resultStr":"{\"title\":\"Robust Multi-resolution Pedestrian Detection in Traffic Scenes\",\"authors\":\"Junjie Yan, Xucong Zhang, Zhen Lei, Shengcai Liao, S. Li\",\"doi\":\"10.1109/CVPR.2013.390\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques. In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to learn the resolution aware transformations and deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles, therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when the vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms previous state-of-the-art (71%).\",\"PeriodicalId\":6343,\"journal\":{\"name\":\"2013 IEEE Conference on Computer Vision and Pattern Recognition\",\"volume\":\"1 1\",\"pages\":\"3033-3040\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-06-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"187\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE Conference on Computer Vision and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CVPR.2013.390\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE Conference on Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.2013.390","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 187

摘要

随着分辨率的降低，性能严重下降是当前行人检测技术的主要瓶颈。本文将不同分辨率下的行人检测视为不同但相关的问题，并提出了一个多任务模型来综合考虑它们的共性和差异性。该模型包含分辨率感知转换，将不同分辨率的行人映射到公共空间，在公共空间中构建共享检测器来区分行人和背景。在模型学习方面，我们提出了一种坐标下降方法来迭代学习分辨率感知变换和基于检测器的可变形部分模型。在交通场景中，车辆周围存在许多误报，因此，我们进一步根据行人-车辆关系建立上下文模型来抑制误报。即使在没有车辆注释的情况下，上下文模型也可以自动学习。在加州理工学院行人基准测试中，我们的方法将身高超过30像素的行人的平均失分率降低到60%，明显优于之前的先进技术(71%)。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Robust Multi-resolution Pedestrian Detection in Traffic Scenes

The serious performance decline with decreasing resolution is the major bottleneck for current pedestrian detection techniques. In this paper, we take pedestrian detection in different resolutions as different but related problems, and propose a Multi-Task model to jointly consider their commonness and differences. The model contains resolution aware transformations to map pedestrians in different resolutions to a common space, where a shared detector is constructed to distinguish pedestrians from background. For model learning, we present a coordinate descent procedure to learn the resolution aware transformations and deformable part model (DPM) based detector iteratively. In traffic scenes, there are many false positives located around vehicles, therefore, we further build a context model to suppress them according to the pedestrian-vehicle relationship. The context model can be learned automatically even when the vehicle annotations are not available. Our method reduces the mean miss rate to 60% for pedestrians taller than 30 pixels on the Caltech Pedestrian Benchmark, which noticeably outperforms previous state-of-the-art (71%).

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2013 IEEE Conference on Computer Vision and Pattern Recognition

自引率

0.00%

发文量

期刊最新文献

Segment-Tree Based Cost Aggregation for Stereo Matching Event Retrieval in Large Video Collections with Circulant Temporal Encoding Articulated and Restricted Motion Subspaces and Their Signatures Subspace Interpolation via Dictionary Learning for Unsupervised Domain Adaptation Learning Video Saliency from Human Gaze Using Candidate Selection