Guoqiang Wang, Lan Li, Min Zhu, Rui Zhao, Xiang Zhang
{"title":"基于核的局部匹配网络用于视频对象分割","authors":"Guoqiang Wang, Lan Li, Min Zhu, Rui Zhao, Xiang Zhang","doi":"10.1007/s00138-024-01524-4","DOIUrl":null,"url":null,"abstract":"<p>Recently, the methods based on space-time memory network have achieved advanced performance in semi-supervised video object segmentation, which has attracted wide attention. However, this kind of methods still have a fatal limitation. It has the interference problem of similar objects caused by the way of non-local matching, which seriously limits the performance of video object segmentation. To solve this problem, we propose a Kernel-guided Attention Matching Network (KAMNet) by the use of local matching instead of non-local matching. At first, KAMNet uses spatio-temporal attention mechanism to enhance the model’s discrimination between foreground objects and background areas. Then KAMNet utilizes gaussian kernel to guide the matching between the current frame and the reference set. Because the gaussian kernel decays away from the center, it can limit the matching to the central region, thus achieving local matching. Our KAMNet gets speed-accuracy trade-off on benchmark datasets DAVIS 2016 (<span>\\( \\mathcal {J \\& F}\\)</span> of 87.6%) and DAVIS 2017 (<span>\\( \\mathcal {J \\& F}\\)</span> of 76.0%) with 0.12 second per frame.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"46 1","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Kernel based local matching network for video object segmentation\",\"authors\":\"Guoqiang Wang, Lan Li, Min Zhu, Rui Zhao, Xiang Zhang\",\"doi\":\"10.1007/s00138-024-01524-4\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Recently, the methods based on space-time memory network have achieved advanced performance in semi-supervised video object segmentation, which has attracted wide attention. However, this kind of methods still have a fatal limitation. It has the interference problem of similar objects caused by the way of non-local matching, which seriously limits the performance of video object segmentation. To solve this problem, we propose a Kernel-guided Attention Matching Network (KAMNet) by the use of local matching instead of non-local matching. At first, KAMNet uses spatio-temporal attention mechanism to enhance the model’s discrimination between foreground objects and background areas. Then KAMNet utilizes gaussian kernel to guide the matching between the current frame and the reference set. Because the gaussian kernel decays away from the center, it can limit the matching to the central region, thus achieving local matching. Our KAMNet gets speed-accuracy trade-off on benchmark datasets DAVIS 2016 (<span>\\\\( \\\\mathcal {J \\\\& F}\\\\)</span> of 87.6%) and DAVIS 2017 (<span>\\\\( \\\\mathcal {J \\\\& F}\\\\)</span> of 76.0%) with 0.12 second per frame.</p>\",\"PeriodicalId\":51116,\"journal\":{\"name\":\"Machine Vision and Applications\",\"volume\":\"46 1\",\"pages\":\"\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-03-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Machine Vision and Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s00138-024-01524-4\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Vision and Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00138-024-01524-4","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
摘要
最近,基于时空记忆网络的方法在半监督视频对象分割方面取得了先进的性能,引起了广泛关注。但是,这种方法仍然存在致命的局限性。它存在非局部匹配方式导致的相似对象干扰问题,严重限制了视频对象分割的性能。为了解决这个问题,我们提出了一种利用局部匹配代替非局部匹配的内核引导注意力匹配网络(KAMNet)。首先,KAMNet 利用时空注意力机制来增强模型对前景物体和背景区域的辨别能力。然后,KAMNet 利用高斯核引导当前帧与参考集之间的匹配。由于高斯核从中心开始衰减,它可以将匹配限制在中心区域,从而实现局部匹配。我们的 KAMNet 在基准数据集 DAVIS 2016(87.6%)和 DAVIS 2017(76.0%)上实现了速度与精度的权衡,每帧耗时 0.12 秒。
Kernel based local matching network for video object segmentation
Recently, the methods based on space-time memory network have achieved advanced performance in semi-supervised video object segmentation, which has attracted wide attention. However, this kind of methods still have a fatal limitation. It has the interference problem of similar objects caused by the way of non-local matching, which seriously limits the performance of video object segmentation. To solve this problem, we propose a Kernel-guided Attention Matching Network (KAMNet) by the use of local matching instead of non-local matching. At first, KAMNet uses spatio-temporal attention mechanism to enhance the model’s discrimination between foreground objects and background areas. Then KAMNet utilizes gaussian kernel to guide the matching between the current frame and the reference set. Because the gaussian kernel decays away from the center, it can limit the matching to the central region, thus achieving local matching. Our KAMNet gets speed-accuracy trade-off on benchmark datasets DAVIS 2016 (\( \mathcal {J \& F}\) of 87.6%) and DAVIS 2017 (\( \mathcal {J \& F}\) of 76.0%) with 0.12 second per frame.
期刊介绍:
Machine Vision and Applications publishes high-quality technical contributions in machine vision research and development. Specifically, the editors encourage submittals in all applications and engineering aspects of image-related computing. In particular, original contributions dealing with scientific, commercial, industrial, military, and biomedical applications of machine vision, are all within the scope of the journal.
Particular emphasis is placed on engineering and technology aspects of image processing and computer vision.
The following aspects of machine vision applications are of interest: algorithms, architectures, VLSI implementations, AI techniques and expert systems for machine vision, front-end sensing, multidimensional and multisensor machine vision, real-time techniques, image databases, virtual reality and visualization. Papers must include a significant experimental validation component.