AMG: Avatar Motion Guided Video Generation

Zhangsihao Yang, Mengyi Shan, Mohammad Farazi, Wenhui Zhu, Yanxi Chen, Xuanzhao Dong, Yalin Wang
{"title":"AMG: Avatar Motion Guided Video Generation","authors":"Zhangsihao Yang, Mengyi Shan, Mohammad Farazi, Wenhui Zhu, Yanxi Chen, Xuanzhao Dong, Yalin Wang","doi":"arxiv-2409.01502","DOIUrl":null,"url":null,"abstract":"Human video generation task has gained significant attention with the\nadvancement of deep generative models. Generating realistic videos with human\nmovements is challenging in nature, due to the intricacies of human body\ntopology and sensitivity to visual artifacts. The extensively studied 2D media\ngeneration methods take advantage of massive human media datasets, but struggle\nwith 3D-aware control; whereas 3D avatar-based approaches, while offering more\nfreedom in control, lack photorealism and cannot be harmonized seamlessly with\nbackground scene. We propose AMG, a method that combines the 2D photorealism\nand 3D controllability by conditioning video diffusion models on controlled\nrendering of 3D avatars. We additionally introduce a novel data processing\npipeline that reconstructs and renders human avatar movements from dynamic\ncamera videos. AMG is the first method that enables multi-person diffusion\nvideo generation with precise control over camera positions, human motions, and\nbackground style. We also demonstrate through extensive evaluation that it\noutperforms existing human video generation methods conditioned on pose\nsequences or driving videos in terms of realism and adaptability.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":"136 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.01502","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The task of human video generation has gained significant attention with the advancement of deep generative models. Generating realistic videos with human movements is inherently challenging due to the intricacies of human body topology and sensitivity to visual artifacts. Extensively studied 2D media generation methods take advantage of massive human media datasets but struggle with 3D-aware control, whereas 3D avatar-based approaches, while offering more freedom in control, lack photorealism and cannot be harmonized seamlessly with the background scene. We propose AMG, a method that combines 2D photorealism and 3D controllability by conditioning video diffusion models on controlled renderings of 3D avatars. We additionally introduce a novel data processing pipeline that reconstructs and renders human avatar movements from dynamic-camera videos. AMG is the first method that enables multi-person diffusion video generation with precise control over camera positions, human motions, and background style. We also demonstrate through extensive evaluation that it outperforms existing human video generation methods conditioned on pose sequences or driving videos in terms of realism and adaptability.
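To make the conditioning idea concrete, the sketch below shows one way a video diffusion model can be conditioned on rendered avatar clips: the per-frame avatar rendering is concatenated with the noisy video along the channel dimension before denoising. This is a minimal PyTorch illustration; the names (AvatarConditionedDenoiser, training_step), the toy convolutional backbone, and the simplified noise schedule are our assumptions for exposition, not AMG's actual architecture or training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AvatarConditionedDenoiser(nn.Module):
    """Toy video denoiser conditioned on an avatar rendering via
    channel concatenation (one common conditioning scheme)."""

    def __init__(self, channels: int = 3, hidden: int = 64):
        super().__init__()
        # Treats a clip as (B, C, T, H, W); a real system would use a
        # video U-Net or transformer backbone with timestep embeddings.
        self.net = nn.Sequential(
            nn.Conv3d(channels * 2, hidden, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv3d(hidden, channels, kernel_size=3, padding=1),
        )

    def forward(self, noisy_video, avatar_render):
        # Concatenate the noisy clip with its rendered-avatar clip so the
        # denoiser sees the 3D-controlled pose/camera signal per frame.
        return self.net(torch.cat([noisy_video, avatar_render], dim=1))


def training_step(model, video, avatar_render):
    """Standard epsilon-prediction diffusion loss; a random per-clip
    noise level stands in for a proper timestep schedule."""
    alpha_bar = torch.rand(video.shape[0], 1, 1, 1, 1, device=video.device)
    noise = torch.randn_like(video)
    noisy = alpha_bar.sqrt() * video + (1 - alpha_bar).sqrt() * noise
    return F.mse_loss(model(noisy, avatar_render), noise)


# Usage: two 8-frame 64x64 clips paired with matching avatar renderings.
model = AvatarConditionedDenoiser()
video = torch.randn(2, 3, 8, 64, 64)
render = torch.randn(2, 3, 8, 64, 64)
training_step(model, video, render).backward()
```

Channel concatenation is only one way to inject the rendering; cross-attention over rendering features is a common alternative in conditional video diffusion, and the paper's abstract does not specify which mechanism AMG uses.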