{"title":"学习视频压缩的长短期信息传播与融合","authors":"Shen Wang;Donghui Feng;Guo Lu;Zhengxue Cheng;Li Song;Wenjun Zhang","doi":"10.1109/TBC.2024.3434702","DOIUrl":null,"url":null,"abstract":"In recent years, numerous learned video compression (LVC) methods have emerged, demonstrating rapid developments and satisfactory performance. However, in most previous methods, only the previous one frame is used as reference. Although some works introduce the usage of the previous multiple frames, the exploitation of temporal information is not comprehensive. Our proposed method not only utilizes the short-term information from multiple neighboring frames but also introduces long-term feature information as the reference, which effectively enhances the quality of the context and improves the compression efficiency. In our scheme, we propose the long-term information exploitation mechanism to capture long-term temporal relevance. The update and propagation of long-term information establish an implicit connection between the latent representation of the current frame and distant reference frames, aiding in the generation of long-term context. Meanwhile, the short-term neighboring frames are also utilized to extract local information and generate short-term context. The fusion of long-term context and short-term context results in a more comprehensive and high-quality context to achieve sufficient temporal information mining. Besides, the multiple frames information also helps to improve the efficiency of motion compression. They are used to generate the predicted motion and remove spatio-temporal redundancies in motion information by second-order motion prediction and fusion. 
Experimental results demonstrate that our proposed efficient learned video compression scheme can achieve 4.79% BD-rate saving when compared with H.266 (VTM).","PeriodicalId":13159,"journal":{"name":"IEEE Transactions on Broadcasting","volume":"70 4","pages":"1254-1265"},"PeriodicalIF":3.2000,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Long-Term and Short-Term Information Propagation and Fusion for Learned Video Compression\",\"authors\":\"Shen Wang;Donghui Feng;Guo Lu;Zhengxue Cheng;Li Song;Wenjun Zhang\",\"doi\":\"10.1109/TBC.2024.3434702\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, numerous learned video compression (LVC) methods have emerged, demonstrating rapid developments and satisfactory performance. However, in most previous methods, only the previous one frame is used as reference. Although some works introduce the usage of the previous multiple frames, the exploitation of temporal information is not comprehensive. Our proposed method not only utilizes the short-term information from multiple neighboring frames but also introduces long-term feature information as the reference, which effectively enhances the quality of the context and improves the compression efficiency. In our scheme, we propose the long-term information exploitation mechanism to capture long-term temporal relevance. The update and propagation of long-term information establish an implicit connection between the latent representation of the current frame and distant reference frames, aiding in the generation of long-term context. Meanwhile, the short-term neighboring frames are also utilized to extract local information and generate short-term context. The fusion of long-term context and short-term context results in a more comprehensive and high-quality context to achieve sufficient temporal information mining. 
Besides, the multiple frames information also helps to improve the efficiency of motion compression. They are used to generate the predicted motion and remove spatio-temporal redundancies in motion information by second-order motion prediction and fusion. Experimental results demonstrate that our proposed efficient learned video compression scheme can achieve 4.79% BD-rate saving when compared with H.266 (VTM).\",\"PeriodicalId\":13159,\"journal\":{\"name\":\"IEEE Transactions on Broadcasting\",\"volume\":\"70 4\",\"pages\":\"1254-1265\"},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2024-08-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Broadcasting\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10659718/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Broadcasting","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10659718/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Long-Term and Short-Term Information Propagation and Fusion for Learned Video Compression
In recent years, numerous learned video compression (LVC) methods have emerged, demonstrating rapid development and satisfactory performance. However, most previous methods use only the single preceding frame as a reference. Although some works exploit multiple previous frames, their use of temporal information is not comprehensive. Our proposed method not only utilizes short-term information from multiple neighboring frames but also introduces long-term feature information as a reference, which effectively enhances the quality of the context and improves compression efficiency. In our scheme, we propose a long-term information exploitation mechanism to capture long-term temporal relevance. The update and propagation of long-term information establish an implicit connection between the latent representation of the current frame and distant reference frames, aiding the generation of long-term context. Meanwhile, short-term neighboring frames are used to extract local information and generate short-term context. Fusing the long-term and short-term contexts yields a more comprehensive, higher-quality context and enables sufficient temporal information mining. In addition, information from multiple frames helps to improve the efficiency of motion compression: these frames are used to generate predicted motion and to remove spatio-temporal redundancies in motion information through second-order motion prediction and fusion. Experimental results demonstrate that our proposed learned video compression scheme achieves a 4.79% BD-rate saving compared with H.266 (VTM).
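The second-order motion prediction mentioned in the abstract can be illustrated as a linear extrapolation from the two most recent motion fields. The sketch below is a hypothetical illustration of that general idea, not the paper's exact formulation; the function name and array shapes are assumptions.

```python
import numpy as np

def second_order_motion_prediction(mv_prev1, mv_prev2):
    """Extrapolate the current motion field from the two previous ones.

    A second-order (linear) predictor assumes approximately constant
    motion change between frames: m_t ~ m_{t-1} + (m_{t-1} - m_{t-2}).
    Only the residual between the true motion and this prediction would
    then need to be coded, removing temporal redundancy in the motion.
    """
    return 2.0 * mv_prev1 - mv_prev2

# Two previous motion fields (H x W x 2 flow vectors), object accelerating rightward.
mv_tm2 = np.full((4, 4, 2), [1.0, 0.0])   # motion at frame t-2
mv_tm1 = np.full((4, 4, 2), [2.0, 0.0])   # motion at frame t-1
mv_pred = second_order_motion_prediction(mv_tm1, mv_tm2)
print(mv_pred[0, 0])  # → [3. 0.]
```

In practice the residual `mv_true - mv_pred` is what a motion codec would compress; the closer the predictor, the smaller the residual entropy.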
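The fusion of long-term and short-term context can be pictured as a gated combination of two feature maps. The following is a minimal sketch under assumed names and a scalar stand-in gate; in the actual network the gate would be produced by learned convolutions, and the paper's real fusion module is not specified here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_contexts(ctx_long, ctx_short, w_gate, b_gate):
    """Gated fusion of long-term and short-term context features.

    An element-wise gate in (0, 1) decides how much long-term versus
    short-term information to keep at each position. A single linear
    map (w_gate, b_gate) stands in for a learned gating network.
    """
    gate = sigmoid(w_gate * (ctx_long + ctx_short) + b_gate)
    return gate * ctx_long + (1.0 - gate) * ctx_short

ctx_long = np.ones((2, 3))    # feature propagated from distant frames
ctx_short = np.zeros((2, 3))  # feature from neighboring frames
fused = fuse_contexts(ctx_long, ctx_short, w_gate=0.0, b_gate=0.0)
print(fused[0, 0])  # → 0.5 (a neutral gate averages the two contexts)
```

A gate biased toward 1 would favor the long-term context (useful for revisited content, e.g. after occlusion), while a gate near 0 falls back to local short-term information.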
Journal introduction:
The Society’s Field of Interest is “Devices, equipment, techniques and systems related to broadcast technology, including the production, distribution, transmission, and propagation aspects.” In addition to this formal FOI statement, which is used to provide guidance to the Publications Committee in the selection of content, the AdCom has further resolved that “broadcast systems includes all aspects of transmission, propagation, and reception.”