SDETR: Attention-Guided Salient Object Detection with Transformer

Guanze Liu, Bo Xu, Han Huang, Cheng Lu, Yandong Guo

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Publication date: 2022-05-23
DOI: 10.1109/icassp43922.2022.9746367
Most existing CNN-based salient object detection (SOD) methods can identify fine-grained segmentation details such as hair and animal fur, but they often mispredict the salient object because the locality of convolution layers deprives them of global contextual information. The limited training data available for the current SOD task adds further difficulty to capturing saliency information. In this paper, we propose SDETR, a two-stage predict-refine model that combines the benefits of transformer and CNN layers to produce results with both accurate saliency prediction and fine-grained local details. We also propose a novel pre-training dataset annotation, COCO SOD, to alleviate the overfitting caused by insufficient training data. Comprehensive experiments on five benchmark datasets demonstrate that SDETR outperforms state-of-the-art approaches on four evaluation metrics, and that our COCO SOD substantially improves model performance on the DUTS, ECSSD, DUT, and PASCAL-S datasets.