Authors: Hui Lin; Zhiheng Ma; Rongrong Ji; Yaowei Wang; Zhou Su; Xiaopeng Hong; Deyu Meng
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 5, pp. 3625-3638
Published: 2025-01-21 (Journal Article)
DOI: 10.1109/TPAMI.2025.3532512
URL: https://ieeexplore.ieee.org/document/10848320/
Semi-Supervised Counting via Pixel-by-Pixel Density Distribution Modeling
This paper addresses semi-supervised crowd counting, where only a small portion of the training data is labeled. We formulate the pixel-wise density value to be regressed as a probability distribution rather than a single deterministic value. On this basis, we propose a semi-supervised crowd counting model with three components. First, we design a pixel-wise distribution matching loss that measures the difference between the predicted and ground-truth pixel-wise density distributions. Second, we enhance the transformer decoder with density tokens that specialize the forward passes of the decoders with respect to different density intervals. Third, we design an interleaving consistency self-supervised learning mechanism to learn efficiently from unlabeled data. Extensive experiments on four datasets show that our method outperforms competing methods by a large margin under various labeled-ratio settings.
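To make the idea of a pixel-wise density distribution concrete, the following is a minimal sketch and not the paper's implementation. It assumes each pixel's density is discretized into a few density intervals and that the predicted and ground-truth distributions are compared through their cumulative distributions (a 1D Wasserstein-style distance on unit-spaced bins). The interval edges, the function names, and the choice of distance are all illustrative assumptions; the abstract does not specify the actual loss.

```python
import numpy as np


def to_interval_distribution(density, edges):
    """Turn a density map (H, W) into a one-hot distribution over intervals.

    `edges` are hypothetical interior interval boundaries, so there are
    len(edges) + 1 intervals in total.
    """
    idx = np.digitize(density, edges)      # interval index per pixel
    num_bins = len(edges) + 1
    return np.eye(num_bins)[idx]           # shape (H, W, num_bins)


def cdf_matching_loss(pred_probs, target_probs):
    """Mean per-pixel L1 distance between cumulative distributions.

    This is an assumed stand-in for a distribution matching loss: it grows
    with how many intervals the predicted mass is away from the target.
    """
    pred_cdf = np.cumsum(pred_probs, axis=-1)
    target_cdf = np.cumsum(target_probs, axis=-1)
    return np.abs(pred_cdf - target_cdf).sum(axis=-1).mean()


# Toy 2x2 density map with assumed interval edges.
edges = np.array([0.1, 0.5, 1.0])
density = np.array([[0.0, 0.3], [0.7, 2.0]])
target = to_interval_distribution(density, edges)

# A uniform prediction over the four intervals, for comparison.
uniform = np.full_like(target, 1.0 / target.shape[-1])
print(cdf_matching_loss(target, target))
print(cdf_matching_loss(uniform, target))
```

Compared with a per-pixel squared error on a single value, a loss of this kind penalizes a prediction according to how far its probability mass lies from the correct density interval, which is one plausible reading of why a distributional target could help when labels are scarce.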