A New Siamese Co-attention Network for Unsupervised Video Object Segmentation
Zhenghao Zhang, Liguo Sun, Lingyu Si, C. Zheng
2021 IEEE 6th International Conference on Computer and Communication Systems (ICCCS), published 2021-04-23
DOI: https://doi.org/10.1109/ICCCS52626.2021.9449293
Unsupervised Video Object Segmentation (UVOS) aims to generate accurate pixel-level masks for moving objects without any prior knowledge. Many UVOS methods process frames independently with an image segmentation model and ignore the temporal information between consecutive frames. Other works rely on RNNs or motion cues to find the objects that need to be tracked; these models learn short-term temporal dependencies and thus tend to accumulate errors over time. We propose a new Siamese Co-attention Network, based on SOLOv2, to tackle the UVOS task. The co-attention module in our Siamese network captures global correspondences between a reference frame and the current frame from the same video, and it can learn pairwise correlations at any distance, which helps the current frame correctly distinguish primary objects from a global view. Our proposed method is evaluated on the TianChi VOS Challenge and DAVIS2017, and the results indicate that it exhibits superior performance.
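The abstract does not spell out how the co-attention module is computed. The sketch below shows one common, generic formulation of co-attention between two feature maps, offered only as an illustration of "global correspondences between a reference frame and the current one"; it is not confirmed to be the authors' design. The function names (`softmax`, `co_attention`), the bilinear weight matrix `weight`, and the choice to concatenate the attended features with the current-frame features are all assumptions, and the SOLOv2 backbone that would produce the feature maps is omitted.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def co_attention(ref_feat, cur_feat, weight):
    """Generic co-attention sketch (hypothetical, not the paper's exact module).

    ref_feat, cur_feat: (C, H, W) feature maps of the reference and current frame.
    weight: (C, C) bilinear matrix; learned in a real model, random here.
    Returns the fused (2C, H, W) features and the (H*W, H*W) attention map.
    """
    c, h, w = cur_feat.shape
    ref = ref_feat.reshape(c, -1)  # (C, N_ref)
    cur = cur_feat.reshape(c, -1)  # (C, N_cur)

    # Affinity between every current-frame location and every reference-frame
    # location, so correlations are not limited to nearby pixels.
    affinity = cur.T @ weight @ ref  # (N_cur, N_ref)

    # Each current-frame location spreads its attention over all reference locations.
    attn = softmax(affinity, axis=1)

    # Reference features gathered for each current-frame location.
    attended = ref @ attn.T  # (C, N_cur)

    # Assumption: fuse by channel-wise concatenation with the current features.
    fused = np.concatenate([cur, attended], axis=0)  # (2C, N_cur)
    return fused.reshape(2 * c, h, w), attn


# Toy usage with random features standing in for backbone outputs.
rng = np.random.default_rng(0)
ref = rng.standard_normal((8, 4, 5))
cur = rng.standard_normal((8, 4, 5))
w_bilinear = 0.1 * rng.standard_normal((8, 8))
fused, attn = co_attention(ref, cur, w_bilinear)
```

Because the affinity matrix compares all location pairs, the reference frame can be taken from anywhere in the video rather than only from the adjacent frame, which is the property the abstract contrasts with short-term RNN or motion-based models.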