Kaihua Zhang, Mengming Michael Dong, Bo Liu, Xiaotong Yuan, Qingshan Liu
{"title":"DeepACG:基于语义感知对比Gromov-Wasserstein距离的共显著性检测","authors":"Kaihua Zhang, Mengming Michael Dong, Bo Liu, Xiaotong Yuan, Qingshan Liu","doi":"10.1109/CVPR46437.2021.01349","DOIUrl":null,"url":null,"abstract":"The objective of co-saliency detection is to segment the co-occurring salient objects in a group of images. To address this task, we introduce a new deep network architecture via semantic-aware contrast Gromov-Wasserstein distance (DeepACG). We first adopt the Gromov-Wasserstein (GW) distance to build dense 4D correlation volumes for all pairs of image pixels within the image group. These dense correlation volumes enable the network to accurately discover the structured pair-wise pixel similarities among the common salient objects. Second, we develop a semantic-aware co-attention module (SCAM) to enhance the foreground co-saliency through predicted categorical information. Specifically, SCAM recognizes the semantic class of the foreground co-objects, and this information is then modulated to the deep representations to localize the related pixels. Third, we design a contrast edge-enhanced module (EEM) to capture richer contexts and preserve fine-grained spatial information. We validate the effectiveness of our model using three largest and most challenging benchmark datasets (Cosal2015, CoCA, and CoSOD3k). Extensive experiments have demonstrated the substantial practical merit of each module. 
Compared with the existing works, DeepACG shows significant improvements and achieves state-of-the-art performance.","PeriodicalId":339646,"journal":{"name":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":"{\"title\":\"DeepACG: Co-Saliency Detection via Semantic-aware Contrast Gromov-Wasserstein Distance\",\"authors\":\"Kaihua Zhang, Mengming Michael Dong, Bo Liu, Xiaotong Yuan, Qingshan Liu\",\"doi\":\"10.1109/CVPR46437.2021.01349\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The objective of co-saliency detection is to segment the co-occurring salient objects in a group of images. To address this task, we introduce a new deep network architecture via semantic-aware contrast Gromov-Wasserstein distance (DeepACG). We first adopt the Gromov-Wasserstein (GW) distance to build dense 4D correlation volumes for all pairs of image pixels within the image group. These dense correlation volumes enable the network to accurately discover the structured pair-wise pixel similarities among the common salient objects. Second, we develop a semantic-aware co-attention module (SCAM) to enhance the foreground co-saliency through predicted categorical information. Specifically, SCAM recognizes the semantic class of the foreground co-objects, and this information is then modulated to the deep representations to localize the related pixels. Third, we design a contrast edge-enhanced module (EEM) to capture richer contexts and preserve fine-grained spatial information. We validate the effectiveness of our model using three largest and most challenging benchmark datasets (Cosal2015, CoCA, and CoSOD3k). Extensive experiments have demonstrated the substantial practical merit of each module. 
Compared with the existing works, DeepACG shows significant improvements and achieves state-of-the-art performance.\",\"PeriodicalId\":339646,\"journal\":{\"name\":\"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)\",\"volume\":\"70 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CVPR46437.2021.01349\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR46437.2021.01349","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
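The record includes no code, but the first idea in the abstract — a dense 4D correlation volume over all pairs of pixels from two images — can be illustrated with a minimal NumPy sketch. Note the assumptions: plain cosine similarity stands in for the paper's GW-based construction, and the function name and tensor shapes are hypothetical, not from the paper.

```python
import numpy as np

def correlation_volume(feat_a, feat_b):
    """Dense 4D correlation volume between two feature maps.

    feat_a, feat_b: arrays of shape (H, W, C) holding per-pixel deep
    features. Returns V of shape (H, W, H, W), where V[i, j, k, l] is
    the cosine similarity between pixel (i, j) of image A and pixel
    (k, l) of image B. (Illustrative stand-in for the paper's
    GW-derived volumes.)
    """
    # L2-normalize each pixel's feature vector so the inner product
    # below is a cosine similarity in [-1, 1].
    a = feat_a / (np.linalg.norm(feat_a, axis=-1, keepdims=True) + 1e-8)
    b = feat_b / (np.linalg.norm(feat_b, axis=-1, keepdims=True) + 1e-8)
    # Contract over the channel dimension for every pixel pair.
    return np.einsum("ijc,klc->ijkl", a, b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f1 = rng.standard_normal((8, 8, 32))
    f2 = rng.standard_normal((8, 8, 32))
    vol = correlation_volume(f1, f2)
    print(vol.shape)  # (8, 8, 8, 8)
```

Each entry of the volume scores how similar one pixel of one image is to one pixel of another; the abstract states that DeepACG builds such volumes for all image pairs within the group so the network can discover structured pairwise pixel similarities among the common salient objects.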
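As background for the abstract's central tool, the Gromov-Wasserstein distance compares two metric spaces (here, two sets of pixels with their intra-image feature distances) by finding a coupling that preserves pairwise structure. The following is a rough, hypothetical sketch of the standard entropic GW scheme of Peyré et al. (2016) — not the paper's exact "contrast" formulation, which is embedded in a deep network; all names and defaults are illustrative.

```python
import numpy as np

def entropic_gw(C1, C2, p, q, eps=0.1, outer=10, inner=100):
    """Entropic Gromov-Wasserstein coupling with the square loss.

    C1: (n, n) pairwise cost matrix of the first pixel set,
    C2: (m, m) of the second; p, q: marginal weights.
    Returns a soft correspondence matrix T of shape (n, m).
    """
    n, m = len(p), len(q)
    T = np.outer(p, q)  # initial coupling
    # Constant part of the tensorized square-loss GW objective.
    constC = np.outer(C1**2 @ p, np.ones(m)) + np.outer(np.ones(n), C2**2 @ q)
    for _ in range(outer):
        # Linearized GW cost around the current coupling.
        tens = constC - 2.0 * C1 @ T @ C2.T
        tens -= tens.min()  # shift for numerical stability
        K = np.exp(-tens / eps)
        # Sinkhorn projection onto the transport polytope.
        u, v = np.ones(n), np.ones(m)
        for _ in range(inner):
            v = q / (K.T @ u)
            u = p / (K @ v)
        T = u[:, None] * K * v[None, :]
    return T
```

In practice a tested implementation such as the `ot.gromov` module of the POT (Python Optimal Transport) library would normally be used instead of a hand-rolled loop; the sketch only shows why the resulting coupling encodes structured pairwise similarities rather than independent per-pixel matches.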