Multi-Reference Generative Face Video Compression with Contrastive Learning

Goluck Konuko, Giuseppe Valenzise
DOI: arxiv-2409.01029
Journal: arXiv - CS - Multimedia
Published: 2024-09-02
Citations: 0

Abstract

Generative face video coding (GFVC) has been demonstrated as a potential approach to low-latency, low-bitrate video conferencing. GFVC frameworks achieve an extreme gain in coding efficiency, with over 70% bitrate savings compared to conventional codecs at bitrates below 10 kbps. In recent MPEG/JVET standardization efforts, all the information required to reconstruct video sequences using GFVC frameworks is adopted as part of the supplemental enhancement information (SEI) in existing compression pipelines. In light of this development, we aim to address a challenge that has been weakly addressed in prior GFVC frameworks, namely reconstruction drift as the distance between the reference and target frames increases. This challenge creates the need to update the reference buffer more frequently by transmitting more Intra-refresh frames, which are the most expensive element of the GFVC bitstream. To overcome this problem, we instead propose multiple-reference animation as a robust approach to minimizing reconstruction drift, especially when used in a bi-directional prediction mode. Further, we propose a contrastive learning formulation for multi-reference animation. We observe that using a contrastive learning framework enhances the representation capabilities of the animation generator. The resulting framework, MRDAC (Multi-Reference Deep Animation Codec), can therefore be used to compress longer sequences with fewer reference frames, or to achieve a significant gain in reconstruction accuracy at bitrates comparable to previous frameworks. Quantitative and qualitative results show significant coding and reconstruction quality gains compared to previous GFVC methods, and more accurate animation quality in the presence of large pose and facial expression changes.
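The paper does not publish its loss in this abstract, so as a rough illustration of what a contrastive formulation for the animation generator could look like, the sketch below implements a standard InfoNCE-style loss: the feature of each target frame (anchor) is pulled toward the feature of its multi-reference reconstruction (positive), while the other reconstructions in the batch act as negatives. All function and variable names here are hypothetical, not taken from MRDAC.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss over L2-normalized embeddings.

    anchors[i] (target-frame feature) should be most similar to
    positives[i] (its multi-reference reconstruction feature); the
    other positives in the batch serve as negatives.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # matched pairs on the diagonal
```

In an actual training loop, the embeddings would come from an encoder applied to target frames and to their animated reconstructions; minimizing this loss encourages the generator's features to stay discriminative across frames, which is consistent with the representation-capability gain the abstract reports.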
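The abstract does not specify how the multiple (possibly bi-directional) reference predictions are combined, so the sketch below shows one plausible baseline for the idea: each reference frame yields its own animated reconstruction of the target, and the reconstructions are blended with weights that decay with reference-to-target distance, which is one simple way to damp drift as that distance grows. The function name and the exponential weighting are assumptions for illustration, not the MRDAC fusion rule.

```python
import numpy as np

def blend_multi_reference(predictions, ref_indices, target_index, sigma=8.0):
    """Distance-weighted blend of per-reference reconstructions.

    predictions:  (R, H, W) array, one animated reconstruction per reference
    ref_indices:  frame indices of the references (past and/or future)
    target_index: index of the frame being reconstructed
    """
    d = np.abs(np.asarray(ref_indices, dtype=float) - target_index)
    w = np.exp(-d / sigma)       # closer references get larger weights
    w = w / w.sum()
    # Contract the reference axis: weighted sum of the R reconstructions.
    return np.tensordot(w, predictions, axes=1)
```

With one past and one future reference, this reduces to a bi-directional prediction in which the temporally closer reference dominates; a learned fusion network could replace the fixed exponential weights.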