StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos

Sijie Zhao, Wenbo Hu, Xiaodong Cun, Yong Zhang, Xiaoyu Li, Zhe Kong, Xiangjun Gao, Muyao Niu, Ying Shan
{"title":"StereoCrafter:从单目视频生成基于扩散的长尺寸高保真立体三维图像","authors":"Sijie Zhao, Wenbo Hu, Xiaodong Cun, Yong Zhang, Xiaoyu Li, Zhe Kong, Xiangjun Gao, Muyao Niu, Ying Shan","doi":"arxiv-2409.07447","DOIUrl":null,"url":null,"abstract":"This paper presents a novel framework for converting 2D videos to immersive\nstereoscopic 3D, addressing the growing demand for 3D content in immersive\nexperience. Leveraging foundation models as priors, our approach overcomes the\nlimitations of traditional methods and boosts the performance to ensure the\nhigh-fidelity generation required by the display devices. The proposed system\nconsists of two main steps: depth-based video splatting for warping and\nextracting occlusion mask, and stereo video inpainting. We utilize pre-trained\nstable video diffusion as the backbone and introduce a fine-tuning protocol for\nthe stereo video inpainting task. To handle input video with varying lengths\nand resolutions, we explore auto-regressive strategies and tiled processing.\nFinally, a sophisticated data processing pipeline has been developed to\nreconstruct a large-scale and high-quality dataset to support our training. Our\nframework demonstrates significant improvements in 2D-to-3D video conversion,\noffering a practical solution for creating immersive content for 3D devices\nlike Apple Vision Pro and 3D displays. In summary, this work contributes to the\nfield by presenting an effective method for generating high-quality\nstereoscopic videos from monocular input, potentially transforming how we\nexperience digital media.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos\",\"authors\":\"Sijie Zhao, Wenbo Hu, Xiaodong Cun, Yong Zhang, Xiaoyu Li, Zhe Kong, Xiangjun Gao, Muyao Niu, Ying Shan\",\"doi\":\"arxiv-2409.07447\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a novel framework for converting 2D videos to immersive\\nstereoscopic 3D, addressing the growing demand for 3D content in immersive\\nexperience. Leveraging foundation models as priors, our approach overcomes the\\nlimitations of traditional methods and boosts the performance to ensure the\\nhigh-fidelity generation required by the display devices. The proposed system\\nconsists of two main steps: depth-based video splatting for warping and\\nextracting occlusion mask, and stereo video inpainting. We utilize pre-trained\\nstable video diffusion as the backbone and introduce a fine-tuning protocol for\\nthe stereo video inpainting task. To handle input video with varying lengths\\nand resolutions, we explore auto-regressive strategies and tiled processing.\\nFinally, a sophisticated data processing pipeline has been developed to\\nreconstruct a large-scale and high-quality dataset to support our training. Our\\nframework demonstrates significant improvements in 2D-to-3D video conversion,\\noffering a practical solution for creating immersive content for 3D devices\\nlike Apple Vision Pro and 3D displays. 
In summary, this work contributes to the\\nfield by presenting an effective method for generating high-quality\\nstereoscopic videos from monocular input, potentially transforming how we\\nexperience digital media.\",\"PeriodicalId\":501174,\"journal\":{\"name\":\"arXiv - CS - Graphics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Graphics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.07447\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07447","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

This paper presents a novel framework for converting 2D videos into immersive stereoscopic 3D, addressing the growing demand for 3D content in immersive experiences. Leveraging foundation models as priors, our approach overcomes the limitations of traditional methods and boosts performance to ensure the high-fidelity generation required by display devices. The proposed system consists of two main steps: depth-based video splatting, which warps the input and extracts an occlusion mask, and stereo video inpainting. We use a pre-trained Stable Video Diffusion model as the backbone and introduce a fine-tuning protocol for the stereo video inpainting task. To handle input videos of varying lengths and resolutions, we explore auto-regressive strategies and tiled processing. Finally, a sophisticated data processing pipeline has been developed to reconstruct a large-scale, high-quality dataset to support our training. Our framework demonstrates significant improvements in 2D-to-3D video conversion, offering a practical solution for creating immersive content for 3D devices such as the Apple Vision Pro and 3D displays. In summary, this work contributes to the field by presenting an effective method for generating high-quality stereoscopic videos from monocular input, potentially transforming how we experience digital media.
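
To make the first stage concrete, below is a minimal sketch of depth-based splatting, assuming a rectified stereo setup in which a point at depth Z appears shifted in the right view by disparity d = f·B/Z. This is not the authors' implementation; the `baseline` and `focal` values and the simple nearest-wins splatting rule are illustrative assumptions. Target pixels that receive no source pixel form the occlusion mask that the inpainting stage must fill.

```python
import numpy as np

def splat_right_view(left: np.ndarray, depth: np.ndarray,
                     baseline: float = 0.05, focal: float = 500.0):
    """Forward-splat a left-eye frame into a right-eye frame.

    left  : (H, W, 3) float image
    depth : (H, W) positive depth map (same units as baseline)
    Returns the warped right view and a binary occlusion mask marking
    pixels that no source pixel reached (holes for inpainting).
    """
    H, W, _ = left.shape
    disparity = baseline * focal / np.maximum(depth, 1e-6)  # in pixels

    right = np.zeros_like(left)
    weight = np.zeros((H, W))  # winning disparity per target pixel

    xs = np.arange(W)
    for y in range(H):  # plain loops for clarity; a real pipeline would vectorize
        tx = np.round(xs - disparity[y]).astype(int)  # rectified stereo: x_r = x_l - d
        valid = (tx >= 0) & (tx < W)
        for sx, dx in zip(xs[valid], tx[valid]):
            # Where several source pixels land on one target, keep the
            # nearer one (larger disparity), mimicking a z-buffer.
            if disparity[y, sx] > weight[y, dx]:
                right[y, dx] = left[y, sx]
                weight[y, dx] = disparity[y, sx]

    occlusion_mask = (weight == 0).astype(np.uint8)  # 1 = hole to inpaint
    return right, occlusion_mask
```

Running this on each frame of a clip yields the warped right-eye video and per-frame occlusion masks; the second stage then inpaints the masked regions with the fine-tuned video diffusion model.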
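The abstract mentions auto-regressive strategies for videos longer than the diffusion backbone can process at once, without specifying the scheme. A common realization, sketched here under that assumption, runs the model on overlapping fixed-length chunks and conditions each chunk on the previous chunk's final outputs so that chunk boundaries stay temporally consistent; `model_fn`, `chunk_len`, and `overlap` are hypothetical names, not the paper's API.

```python
def process_long_video(frames, model_fn, chunk_len=16, overlap=4):
    """Auto-regressively apply a fixed-length video model to a long clip.

    frames   : list of per-frame arrays
    model_fn : callable mapping (chunk, cond) -> list of output frames,
               where cond carries outputs from the previous chunk
    Each chunk shares `overlap` frames with its predecessor; conditioning
    the next call on the previous outputs over that region discourages
    flicker at the seams.
    """
    outputs, cond = [], None
    step = chunk_len - overlap
    for start in range(0, len(frames), step):
        chunk = frames[start:start + chunk_len]
        out = model_fn(chunk, cond=cond)
        # The first `overlap` frames of later chunks were already emitted
        # by the previous chunk, so keep only the new ones.
        outputs.extend(out if start == 0 else out[overlap:])
        cond = out[-overlap:]
        if start + chunk_len >= len(frames):
            break
    return outputs
```

A similar idea applies spatially: at high resolutions, each frame can be split into overlapping tiles that are processed independently and blended, which is presumably what the abstract's "tiled processing" refers to.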