A Survey on Video Diffusion Models

IF 23.8 1区计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS ACM Computing Surveys Pub Date : 2024-09-18 DOI:10.1145/3696415

Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, Yu-Gang Jiang

{"title":"A Survey on Video Diffusion Models","authors":"Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, Yu-Gang Jiang","doi":"10.1145/3696415","DOIUrl":null,"url":null,"abstract":"The recent wave of AI-generated content (AIGC) has witnessed substantial success in computer vision, with the diffusion model playing a crucial role in this achievement. Due to their impressive generative capabilities, diffusion models are gradually superseding methods based on GANs and auto-regressive Transformers, demonstrating exceptional performance not only in image generation and editing, but also in the realm of video-related research. However, existing surveys mainly focus on diffusion models in the context of image generation, with few up-to-date reviews on their application in the video domain. To address this gap, this paper presents a comprehensive review of video diffusion models in the AIGC era. Specifically, we begin with a concise introduction to the fundamentals and evolution of diffusion models. Subsequently, we present an overview of research on diffusion models in the video domain, categorizing the work into three key areas: video generation, video editing, and other video understanding tasks. We conduct a thorough review of the literature in these three key areas, including further categorization and practical contributions in the field. Finally, we discuss the challenges faced by research in this domain and outline potential future developmental trends. A comprehensive list of video diffusion models studied in this survey is available at https://github.com/ChenHsing/Awesome-Video-Diffusion-Models.","PeriodicalId":50926,"journal":{"name":"ACM Computing Surveys","volume":"29 1","pages":""},"PeriodicalIF":23.8000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Computing Surveys","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3696415","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 0

Abstract

The recent wave of AI-generated content (AIGC) has witnessed substantial success in computer vision, with the diffusion model playing a crucial role in this achievement. Due to their impressive generative capabilities, diffusion models are gradually superseding methods based on GANs and auto-regressive Transformers, demonstrating exceptional performance not only in image generation and editing, but also in the realm of video-related research. However, existing surveys mainly focus on diffusion models in the context of image generation, with few up-to-date reviews on their application in the video domain. To address this gap, this paper presents a comprehensive review of video diffusion models in the AIGC era. Specifically, we begin with a concise introduction to the fundamentals and evolution of diffusion models. Subsequently, we present an overview of research on diffusion models in the video domain, categorizing the work into three key areas: video generation, video editing, and other video understanding tasks. We conduct a thorough review of the literature in these three key areas, including further categorization and practical contributions in the field. Finally, we discuss the challenges faced by research in this domain and outline potential future developmental trends. A comprehensive list of video diffusion models studied in this survey is available at https://github.com/ChenHsing/Awesome-Video-Diffusion-Models.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

视频传播模型调查

最近的人工智能生成内容（AIGC）浪潮见证了计算机视觉领域的巨大成功，而扩散模型在这一成就中发挥了至关重要的作用。由于其令人印象深刻的生成能力，扩散模型正逐渐取代基于 GAN 和自动回归变换器的方法，不仅在图像生成和编辑方面表现出卓越的性能，在视频相关研究领域也是如此。然而，现有的研究主要关注图像生成中的扩散模型，很少有关于其在视频领域应用的最新评论。针对这一空白，本文对 AIGC 时代的视频扩散模型进行了全面评述。具体来说，我们首先简要介绍了扩散模型的基本原理和演变。随后，我们概述了视频领域的扩散模型研究，并将研究工作分为三个关键领域：视频生成、视频编辑和其他视频理解任务。我们对这三个关键领域的文献进行了全面回顾，包括进一步分类和该领域的实际贡献。最后，我们讨论了该领域研究面临的挑战，并概述了潜在的未来发展趋势。本调查所研究的视频扩散模型的综合列表可在 https://github.com/ChenHsing/Awesome-Video-Diffusion-Models 上查阅。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

ACM Computing Surveys 工程技术-计算机：理论方法

CiteScore

33.20

自引率

0.60%

发文量

372

审稿时长

12 months

期刊介绍： ACM Computing Surveys is an academic journal that focuses on publishing surveys and tutorials on various areas of computing research and practice. The journal aims to provide comprehensive and easily understandable articles that guide readers through the literature and help them understand topics outside their specialties. In terms of impact, CSUR has a high reputation with a 2022 Impact Factor of 16.6. It is ranked 3rd out of 111 journals in the field of Computer Science Theory & Methods. ACM Computing Surveys is indexed and abstracted in various services, including AI2 Semantic Scholar, Baidu, Clarivate/ISI: JCR, CNKI, DeepDyve, DTU, EBSCO: EDS/HOST, and IET Inspec, among others.

期刊最新文献

Causal Discovery from Temporal Data: An Overview and New Perspectives Explainable Artificial Intelligence: Importance, Use Domains, Stages, Output Shapes, and Challenges Racial Bias within Face Recognition: A Survey A Survey of Machine Learning for Urban Decision Making: Applications in Planning, Transportation, and Healthcare Tool Learning with Foundation Models