{"title":"通过联合学习局部和全局计数进行端到端人群计数","authors":"C. Shang, H. Ai, Bo Bai","doi":"10.1109/ICIP.2016.7532551","DOIUrl":null,"url":null,"abstract":"Crowd counting is a very challenging task in crowded scenes due to heavy occlusions, appearance variations and perspective distortions. Current crowd counting methods typically operate on an image patch level with overlaps, then sum over the patches to get the final count. In this paper, we propose an end-to-end convolutional neural network (CNN) architecture that takes a whole image as its input and directly outputs the counting result. While making use of sharing computations over overlapping regions, our method takes advantages of contextual information when predicting both local and global count. In particular, we first feed the image to a pre-trained CNN to get a set of high level features. Then the features are mapped to local counting numbers using recurrent network layers with memory cells. We perform the experiments on several challenging crowd counting datasets, which achieve the state-of-the-art results and demonstrate the effectiveness of our method.","PeriodicalId":6521,"journal":{"name":"2016 IEEE International Conference on Image Processing (ICIP)","volume":"14 1","pages":"1215-1219"},"PeriodicalIF":0.0000,"publicationDate":"2016-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"169","resultStr":"{\"title\":\"End-to-end crowd counting via joint learning local and global count\",\"authors\":\"C. Shang, H. Ai, Bo Bai\",\"doi\":\"10.1109/ICIP.2016.7532551\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Crowd counting is a very challenging task in crowded scenes due to heavy occlusions, appearance variations and perspective distortions. Current crowd counting methods typically operate on an image patch level with overlaps, then sum over the patches to get the final count. In this paper, we propose an end-to-end convolutional neural network (CNN) architecture that takes a whole image as its input and directly outputs the counting result. While making use of sharing computations over overlapping regions, our method takes advantages of contextual information when predicting both local and global count. In particular, we first feed the image to a pre-trained CNN to get a set of high level features. Then the features are mapped to local counting numbers using recurrent network layers with memory cells. We perform the experiments on several challenging crowd counting datasets, which achieve the state-of-the-art results and demonstrate the effectiveness of our method.\",\"PeriodicalId\":6521,\"journal\":{\"name\":\"2016 IEEE International Conference on Image Processing (ICIP)\",\"volume\":\"14 1\",\"pages\":\"1215-1219\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"169\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE International Conference on Image Processing (ICIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICIP.2016.7532551\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE International Conference on Image Processing (ICIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICIP.2016.7532551","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
End-to-end crowd counting via joint learning local and global count
Crowd counting is a very challenging task in crowded scenes due to heavy occlusions, appearance variations and perspective distortions. Current crowd counting methods typically operate on an image patch level with overlaps, then sum over the patches to get the final count. In this paper, we propose an end-to-end convolutional neural network (CNN) architecture that takes a whole image as its input and directly outputs the counting result. While making use of sharing computations over overlapping regions, our method takes advantages of contextual information when predicting both local and global count. In particular, we first feed the image to a pre-trained CNN to get a set of high level features. Then the features are mapped to local counting numbers using recurrent network layers with memory cells. We perform the experiments on several challenging crowd counting datasets, which achieve the state-of-the-art results and demonstrate the effectiveness of our method.