Xuehui Wu;Huanliang Xu;Henry Leung;Xiaobo Lu;Yanbin Li
Title: F2CENet: Single-Image Object Counting Based on Block Co-Saliency Density Map Estimation
DOI: 10.1109/TCSVT.2024.3449070
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 12, pp. 13141-13151
Publication date: 2024-08-23 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10644047/
Citations: 0
Abstract
This paper presents a novel single-image object counting method based on block co-saliency density map estimation, called the free-to-count everything network (F2CENet). Image block co-saliency attention is introduced to promote density estimation adaptation, allowing any image of arbitrary size to be input for accurate counting with the learned model, without requiring manually labeled few-shot exemplars. The proposed network also outperforms existing crowd counting methods based on geometry-adaptive kernels in complex scenes. A novel module generates multilevel and multiscale block correlation maps to guide the co-saliency density map estimation. Co-saliency attention maps are then fused to accurately locate block-wise salient objects under the guidance of the initial cues. Hence, accurate density maps are generated via comprehensive learning of the internal relations in block co-salient features and progressive optimization of local details with saliency-oriented scene understanding. Results from extensive experiments on existing density map estimation datasets with diverse challenges verify the effectiveness of the proposed F2CENet and show that it outperforms various state-of-the-art few-shot and crowd counting methods. Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), two metrics commonly used for counting tasks, are adopted to measure accuracy. The average predicted MAE and RMSE are 10.88% and 8.44% lower, respectively, than those of state-of-the-art methods evaluated on a dataset containing sufficiently large and diverse categories used for few-shot and crowd counting.
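The MAE and RMSE metrics mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard per-image count-error formulas, not the authors' code; the example counts are hypothetical.

```python
import math

def mae(pred, gt):
    # Mean Absolute Error: average of |predicted count - ground-truth count|
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)

def rmse(pred, gt):
    # Root Mean Square Error: sqrt of the mean squared count error
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred))

# Hypothetical predicted vs. ground-truth object counts for three images
pred_counts = [102.0, 48.5, 7.2]
gt_counts = [100.0, 50.0, 8.0]

print(f"MAE:  {mae(pred_counts, gt_counts):.4f}")
print(f"RMSE: {rmse(pred_counts, gt_counts):.4f}")
```

Both metrics are computed over predicted counts (typically the integral of the estimated density map per image); RMSE penalizes large per-image errors more heavily than MAE.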
About the Journal
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.