IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 47, no. 5, pp. 3452-3468
Published 2025-01-21 · DOI: 10.1109/TPAMI.2025.3532306 · https://ieeexplore.ieee.org/document/10848350/
Authors: Yong Liu; Ran Yu; Fei Yin; Xinyuan Zhao; Wei Zhao; Weihao Xia; Jiahao Wang; Yitong Wang; Yansong Tang; Yujiu Yang
Learning High-Quality Dynamic Memory for Video Object Segmentation
Recently, several spatial-temporal memory-based methods have verified that storing intermediate frames with their masks as memory helps segment target objects in videos. However, these methods mainly focus on better matching between the current frame and the memory frames, without paying attention to the quality of the memory itself. Consequently, frames with poor segmentation masks may be memorized, leading to error accumulation. In addition, the linear growth of the memory bank with the number of frames limits the ability of these models to handle long videos. To this end, we propose a Quality-aware Dynamic Memory Network (QDMN) that evaluates the segmentation quality of each frame, allowing the memory bank to selectively store accurately segmented frames and prevent error accumulation. We then combine segmentation quality with temporal consistency to dynamically update the memory bank, enabling the model to handle videos of arbitrary length. These operations ensure the reliability of memory frames and improve the quality of the memory at the frame level. Moreover, we observe that memory features extracted even from reliable frames still contain noise and have limited representation capability. To address this problem, we propose memory enhancement and memory anchoring on top of QDMN to improve the quality of the memory at the feature level, resulting in a more robust and effective network, QDMN++. Our method achieves state-of-the-art performance on all popular benchmarks. Moreover, extensive experiments demonstrate that the proposed memory screening mechanism can be applied to any memory-based method as a generic plugin.
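The screening mechanism described in the abstract — score each frame's segmentation quality, store only reliable frames, and evict by combining quality with temporal consistency under a fixed budget — can be illustrated with a toy sketch. This is a hypothetical simplification under assumed names (`QualityAwareMemoryBank`, `maybe_store`), not the paper's learned QDMN architecture, which predicts quality scores with a network rather than receiving them as inputs:

```python
class QualityAwareMemoryBank:
    """Toy sketch of quality-aware memory screening (hypothetical API).

    Frames below a quality threshold are rejected so that poor masks never
    enter the memory; when the bank is full, the first (reference) frame and
    the most recent frame are always kept for temporal consistency, and the
    lowest-quality frame in between is evicted.
    """

    def __init__(self, capacity=5, quality_threshold=0.8):
        self.capacity = capacity
        self.quality_threshold = quality_threshold
        self.frames = []  # list of (frame_id, features, quality_score)

    def maybe_store(self, frame_id, features, quality):
        # Screen out poorly segmented frames to prevent error accumulation.
        if quality < self.quality_threshold:
            return False
        self.frames.append((frame_id, features, quality))
        if len(self.frames) > self.capacity:
            self._evict()
        return True

    def _evict(self):
        # Keep the reference (first) and most recent frames; among the
        # remaining frames, drop the one with the lowest quality score.
        middle = self.frames[1:-1]
        worst = min(middle, key=lambda frame: frame[2])
        self.frames.remove(worst)
```

With a capacity of 3, storing frames with quality scores 0.95, 0.5, 0.9, 0.85, 0.99 rejects the 0.5 frame outright and later evicts the 0.85 frame, leaving the reference, a high-quality middle frame, and the newest frame — so the memory stays bounded regardless of video length.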