4-D Scene Alignment in Surveillance Video
R. Wagner, Daniel E. Crispell, Patrick Feeney, J. Mundy
2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)
DOI: 10.1109/AIPR47015.2019.9174582
Published: 2019-06-04
Citations: 2
Abstract
Designing robust activity detectors for fixed-camera surveillance video requires knowledge of the 3-D scene. This paper presents an automatic camera calibration process that provides a mechanism to reason about the spatial proximity between objects at different times. It combines a CNN-based camera pose estimator with a vertical scale provided by pedestrian observations to establish the 4-D scene geometry. Unlike some previous methods, pedestrians do not need to be tracked, nor do their heads and feet need to be explicitly detected. The approach is robust to individual height variations and camera parameter estimation errors.
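To make the general idea of using pedestrians as a vertical scale reference concrete, the sketch below shows one simple way such a scale could be recovered. It is not the authors' implementation: it assumes a pinhole camera with known intrinsics K and a world-to-camera pose (R, t) expressed in arbitrary units (e.g., from a CNN pose estimator), a ground plane at z = 0 with +z pointing up, pedestrian bounding boxes given as (u, v_top, v_bottom), and a hypothetical mean pedestrian height prior; all function names are illustrative.

```python
# Minimal sketch (not the paper's method): estimate a metric scale factor for a
# scene whose camera pose is known only up to scale, using pedestrian detections
# as a vertical yardstick against an assumed mean height.
import numpy as np

ASSUMED_PEDESTRIAN_HEIGHT_M = 1.70  # hypothetical prior; tune per dataset


def pixel_ray(K, R, t, u, v):
    """Return (camera center, unit direction) of the ray through pixel (u, v) in world coords."""
    center = -R.T @ t                                   # camera center in the world frame
    d = R.T @ np.linalg.solve(K, np.array([u, v, 1.0]))
    return center, d / np.linalg.norm(d)


def implied_height(K, R, t, u, v_top, v_bottom):
    """Pedestrian height, in the current (arbitrary) world units, implied by one bounding box."""
    c, d_foot = pixel_ray(K, R, t, u, v_bottom)
    _, d_head = pixel_ray(K, R, t, u, v_top)
    # Feet are assumed to lie on the ground plane z = 0.
    foot = c + d_foot * (-c[2] / d_foot[2])
    # Closest point of the head ray to the vertical line through the foot point;
    # its z coordinate approximates the pedestrian's height above the ground.
    up = np.array([0.0, 0.0, 1.0])
    n = np.cross(d_head, up)
    n2 = np.cross(up, n)
    s = np.dot(foot - c, n2) / np.dot(d_head, n2)
    return (c + s * d_head)[2]


def metric_scale(K, R, t, boxes):
    """Median-based factor mapping arbitrary world units to metres, from (u, v_top, v_bottom) boxes."""
    heights = [implied_height(K, R, t, u, vt, vb) for u, vt, vb in boxes]
    return ASSUMED_PEDESTRIAN_HEIGHT_M / np.median(heights)
```

Taking the median over many detections is what makes a scheme like this tolerant of individual height variation and noisy boxes; no tracking or explicit head/foot detection is needed beyond the bounding box extremes.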