
arXiv - CS - Graphics: Latest Publications

Demystifying Spatial Dependence: Interactive Visualizations for Interpreting Local Spatial Autocorrelation
Pub Date : 2024-08-05 DOI: arxiv-2408.02418
Lee Mason, Blanaid Hicks, Jonas Almeida
The Local Moran's I statistic is a valuable tool for identifying localized patterns of spatial autocorrelation. Understanding these patterns is crucial in spatial analysis, but interpreting the statistic can be difficult. To simplify this process, we introduce three novel visualizations that enhance the interpretation of Local Moran's I results. These visualizations can be interactively linked to one another, and to established visualizations, to offer a more holistic exploration of the results. We provide a JavaScript library with implementations of these new visual elements, along with a web dashboard that demonstrates their integrated use.
Citations: 0
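The statistic these visualizations interpret has a compact standard form: with standardized values z_i and a spatial weights matrix w_ij, the local statistic is I_i = z_i · Σ_j w_ij z_j. The NumPy sketch below is a minimal illustration of that formula only, assuming a row-standardized weights matrix; it is not taken from the authors' JavaScript library, and the four-area toy example is made up.

```python
import numpy as np

def local_morans_i(x, W):
    """Local Moran's I for each observation.

    x : (n,) array of attribute values.
    W : (n, n) spatial weights matrix, assumed row-standardized here,
        with W[i, j] > 0 when j neighbours i and zeros on the diagonal.
    Positive I_i marks a value similar to its neighbours (a high-high
    or low-low cluster); negative I_i marks a spatial outlier.
    """
    z = (x - x.mean()) / x.std()   # standardized deviations
    return z * (W @ z)             # I_i = z_i * sum_j w_ij * z_j

# Toy example: four areas in a chain, rook contiguity, row-standardized.
x = np.array([10.0, 12.0, 3.0, 2.0])
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = W / W.sum(axis=1, keepdims=True)
print(local_morans_i(x, W))
```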
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization
Pub Date : 2024-08-05 DOI: arxiv-2408.02555
Yiwen Chen, Yikai Wang, Yihao Luo, Zhengyi Wang, Zilong Chen, Jun Zhu, Chi Zhang, Guosheng Lin
We introduce MeshAnything V2, an autoregressive transformer that generates Artist-Created Meshes (AM) aligned to given shapes. It can be integrated with various 3D asset production pipelines to achieve high-quality, highly controllable AM generation. MeshAnything V2 surpasses previous methods in both efficiency and performance using models of the same size. These improvements are due to our newly proposed mesh tokenization method: Adjacent Mesh Tokenization (AMT). Different from previous methods that represent each face with three vertices, AMT uses a single vertex whenever possible. Compared to previous methods, AMT requires about half the token sequence length to represent the same mesh on average. Furthermore, the token sequences from AMT are more compact and well-structured, fundamentally benefiting AM generation. Our extensive experiments show that AMT significantly improves the efficiency and performance of AM generation. Project Page: https://buaacyw.github.io/meshanything-v2/
Citations: 0
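To see where the "about half the token sequence length" figure comes from, the toy count below contrasts naive face-by-face tokenization (three vertex tokens per triangle) with an adjacency-aware scheme that emits only the one new vertex whenever consecutive faces share an edge. The traversal rule and function names here are assumptions for illustration; the actual AMT tokenizer, its token vocabulary, and its fallback handling are defined in the paper.

```python
def naive_token_count(faces):
    """Face-by-face tokenization: three vertex tokens per triangle."""
    return 3 * len(faces)

def adjacent_token_count(faces):
    """Emit a full triangle for the first face, then a single new
    vertex for every subsequent face that shares an edge (two
    vertices) with the previous one; otherwise restart with a full
    triangle. This is only a cartoon of adjacency-aware tokenization."""
    tokens, prev = 0, None
    for face in faces:
        if prev is not None and len(set(face) & set(prev)) == 2:
            tokens += 1   # shares an edge: one new vertex token
        else:
            tokens += 3   # no shared edge: full triangle
        prev = face
    return tokens

# A strip of 8 triangles over vertices 0..9.
strip = [(i, i + 1, i + 2) for i in range(8)]
print(naive_token_count(strip), adjacent_token_count(strip))  # 24 vs 10
```

On strip-like connectivity the adjacency-aware count approaches one token per face plus a constant, which is where a roughly two-fold reduction comes from.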
TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models
Pub Date : 2024-08-01 DOI: arxiv-2408.00735
Gilad Deutch, Rinon Gal, Daniel Garibi, Or Patashnik, Daniel Cohen-Or
Diffusion models have opened the path to a wide range of text-based image editing frameworks. However, these typically build on the multi-step nature of the diffusion backwards process, and adapting them to distilled, fast-sampling methods has proven surprisingly challenging. Here, we focus on a popular line of text-based editing frameworks - the "edit-friendly" DDPM-noise inversion approach. We analyze its application to fast sampling methods and categorize its failures into two classes: the appearance of visual artifacts, and insufficient editing strength. We trace the artifacts to mismatched noise statistics between inverted noises and the expected noise schedule, and suggest a shifted noise schedule which corrects for this offset. To increase editing strength, we propose a pseudo-guidance approach that efficiently increases the magnitude of edits without introducing new artifacts. All in all, our method enables text-based image editing with as few as three diffusion steps, while providing novel insights into the mechanisms behind popular text-based editing approaches.
Citations: 0
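For background, guidance-style edit amplification generally builds on the standard classifier-free-guidance extrapolation sketched below. The paper's pseudo-guidance is its own formulation, aimed at raising edit magnitude without the artifacts plain amplification can introduce, and is not reproduced here.

```python
import torch

def cfg_extrapolate(eps_uncond: torch.Tensor,
                    eps_cond: torch.Tensor,
                    w: float = 3.0) -> torch.Tensor:
    """Standard classifier-free guidance: push the denoiser's noise
    prediction further along the conditional direction.

    eps_uncond, eps_cond : noise predictions without / with the text
    condition; w > 1 strengthens the conditioning signal.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)
```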
MotionFix: Text-Driven 3D Human Motion Editing
Pub Date : 2024-08-01 DOI: arxiv-2408.00712
Nikos Athanasiou, Alpár Ceske, Markos Diomataris, Michael J. Black, Gül Varol
The focus of this paper is 3D motion editing. Given a 3D human motion and a textual description of the desired modification, our goal is to generate an edited motion as described by the text. The challenges include the lack of training data and the design of a model that faithfully edits the source motion. In this paper, we address both these challenges. We build a methodology to semi-automatically collect a dataset of triplets in the form of (i) a source motion, (ii) a target motion, and (iii) an edit text, and create the new MotionFix dataset. Having access to such data allows us to train a conditional diffusion model, TMED, that takes both the source motion and the edit text as input. We further build various baselines trained only on text-motion pairs datasets, and show superior performance of our model trained on triplets. We introduce new retrieval-based metrics for motion editing and establish a new benchmark on the evaluation set of MotionFix. Our results are encouraging, paving the way for further research on fine-grained motion generation. Code and models will be made publicly available.
Citations: 0
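A minimal sketch of the triplet structure described in the abstract is given below. The (T, J, 3) joint-position layout and the field names are assumptions for illustration only; the dataset's actual motion representation and schema are defined by the authors.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MotionEditTriplet:
    """One hypothetical training example: a source motion, the edited
    target motion, and the natural-language edit instruction."""
    source_motion: np.ndarray  # assumed (T_src, J, 3) joint positions
    target_motion: np.ndarray  # assumed (T_tgt, J, 3) joint positions
    edit_text: str             # e.g. "raise the left arm higher"

# The conditional model (TMED in the paper) is trained to map
# (source_motion, edit_text) to target_motion.
```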
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement
Pub Date : 2024-08-01 DOI: arxiv-2408.00653
Mark Boss, Zixuan Huang, Aaryaman Vasishta, Varun Jampani
We present SF3D, a novel method for rapid and high-quality textured object mesh reconstruction from a single image in just 0.5 seconds. Unlike most existing approaches, SF3D is explicitly trained for mesh generation, incorporating a fast UV unwrapping technique that enables swift texture generation rather than relying on vertex colors. The method also learns to predict material parameters and normal maps to enhance the visual quality of the reconstructed 3D meshes. Furthermore, SF3D integrates a delighting step to effectively remove low-frequency illumination effects, ensuring that the reconstructed meshes can be easily used in novel illumination conditions. Experiments demonstrate the superior performance of SF3D over the existing techniques. Project page: https://stable-fast-3d.github.io
Citations: 0
Neural Octahedral Field: Octahedral prior for simultaneous smoothing and sharp edge regularization
Pub Date : 2024-08-01 DOI: arxiv-2408.00303
Ruichen Zheng, Tao Yu
Neural implicit representation, the parameterization of distance function as a coordinate neural field, has emerged as a promising lead in tackling surface reconstruction from unoriented point clouds. To enforce consistent orientation, existing methods focus on regularizing the gradient of the distance function, such as constraining it to be of the unit norm, minimizing its divergence, or aligning it with the eigenvector of Hessian that corresponds to zero eigenvalue. However, under the presence of large scanning noise, they tend to either overfit the noise input or produce an excessively smooth reconstruction. In this work, we propose to guide the surface reconstruction under a new variant of neural field, the octahedral field, leveraging the spherical harmonics representation of octahedral frames originated in the hexahedral meshing. Such field automatically snaps to geometry features when constrained to be smooth, and naturally preserves sharp angles when interpolated over creases. By simultaneously fitting and smoothing the octahedral field alongside the implicit geometry, it behaves analogously to bilateral filtering, resulting in smooth reconstruction while preserving sharp edges. Despite being operated purely pointwise, our method outperforms various traditional and neural approaches across extensive experiments, and is very competitive with methods that require normal and data priors. Our full implementation is available at: https://github.com/Ankbzpx/frame-field.
Citations: 0
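Of the existing orientation constraints listed in the abstract, the unit-norm gradient condition is the familiar eikonal term; a minimal PyTorch sketch of that term follows as background for the comparison. It is not the paper's method — the contribution there is the octahedral-field prior, which is not shown.

```python
import torch

def eikonal_loss(sdf_net: torch.nn.Module, points: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of the distance field's gradient norm from 1.

    sdf_net : network mapping (N, 3) points to signed distances.
    points  : (N, 3) sample locations.
    """
    pts = points.detach().clone().requires_grad_(True)
    sdf = sdf_net(pts)
    (grad,) = torch.autograd.grad(
        sdf, pts, grad_outputs=torch.ones_like(sdf), create_graph=True
    )
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```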
Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion
Pub Date : 2024-08-01 DOI: arxiv-2408.00458
Manuel Kansy, Jacek Naruniec, Christopher Schroers, Markus Gross, Romann M. Weber
Recent years have seen a tremendous improvement in the quality of video generation and editing approaches. While several techniques focus on editing appearance, few address motion. Current approaches using text, trajectories, or bounding boxes are limited to simple motions, so we specify motions with a single motion reference video instead. We further propose to use a pre-trained image-to-video model rather than a text-to-video model. This approach allows us to preserve the exact appearance and position of a target object or scene and helps disentangle appearance from motion. Our method, called motion-textual inversion, leverages our observation that image-to-video models extract appearance mainly from the (latent) image input, while the text/image embedding injected via cross-attention predominantly controls motion. We thus represent motion using text/image embedding tokens. By operating on an inflated motion-text embedding containing multiple text/image embedding tokens per frame, we achieve a high temporal motion granularity. Once optimized on the motion reference video, this embedding can be applied to various target images to generate videos with semantically similar motions. Our approach does not require spatial alignment between the motion reference video and target image, generalizes across various domains, and can be applied to various tasks such as full-body and face reenactment, as well as controlling the motion of inanimate objects and the camera. We empirically demonstrate the effectiveness of our method in the semantic video motion transfer task, significantly outperforming existing methods in this context.
Citations: 0
StyleRF-VolVis: Style Transfer of Neural Radiance Fields for Expressive Volume Visualization
Pub Date : 2024-07-31 DOI: arxiv-2408.00150
Kaiyuan Tang, Chaoli Wang
In volume visualization, visualization synthesis has attracted much attention due to its ability to generate novel visualizations without following the conventional rendering pipeline. However, existing solutions based on generative adversarial networks often require many training images and take significant training time. Still, issues such as low quality, consistency, and flexibility persist. This paper introduces StyleRF-VolVis, an innovative style transfer framework for expressive volume visualization (VolVis) via neural radiance field (NeRF). The expressiveness of StyleRF-VolVis is upheld by its ability to accurately separate the underlying scene geometry (i.e., content) and color appearance (i.e., style), conveniently modify color, opacity, and lighting of the original rendering while maintaining visual content consistency across the views, and effectively transfer arbitrary styles from reference images to the reconstructed 3D scene. To achieve these, we design a base NeRF model for scene geometry extraction, a palette color network to classify regions of the radiance field for photorealistic editing, and an unrestricted color network to lift the color palette constraint via knowledge distillation for non-photorealistic editing. We demonstrate the superior quality, consistency, and flexibility of StyleRF-VolVis by experimenting with various volume rendering scenes and reference images and comparing StyleRF-VolVis against other image-based (AdaIN), video-based (ReReVST), and NeRF-based (ARF and SNeRF) style rendering solutions.
Citations: 0
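One of the image-based baselines named in the comparison, AdaIN, is simple enough to state directly: re-normalize the content features so their per-channel mean and standard deviation match the style's. The sketch below shows that standard operation only; it is not part of StyleRF-VolVis itself.

```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Adaptive Instance Normalization over (N, C, H, W) feature maps."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean
```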
Seamless Parametrization in Penner Coordinates
Pub Date : 2024-07-31 DOI: arxiv-2407.21342
Ryan Capouellez, Denis Zorin
We introduce a conceptually simple and efficient algorithm for seamless parametrization, a key element in constructing quad layouts and texture charts on surfaces. More specifically, we consider the construction of parametrizations with prescribed holonomy signatures, i.e., a set of angles at singularities and rotations along homology loops; preserving these is essential for constructing parametrizations following an input field, as well as for user control of the parametrization structure. Our algorithm performs exceptionally well on a large dataset based on Thingi10k [Zhou and Jacobson 2016] (16156 meshes) as well as on a challenging smaller dataset of [Myles et al. 2014], converging, on average, in 9 iterations. Although the algorithm lacks a formal mathematical guarantee, the presented empirical evidence and the connections between convex optimization and closely related algorithms suggest that a similar formulation can be found for this algorithm in the future.
Citations: 0
Deformable 3D Shape Diffusion Model
Pub Date : 2024-07-31 DOI: arxiv-2407.21428
Dengsheng Chen, Jie Hu, Xiaoming Wei, Enhua Wu
The Gaussian diffusion model, initially designed for image generation, has recently been adapted for 3D point cloud generation. However, these adaptations have not fully considered the intrinsic geometric characteristics of 3D shapes, thereby constraining the diffusion model's potential for 3D shape manipulation. To address this limitation, we introduce a novel deformable 3D shape diffusion model that facilitates comprehensive 3D shape manipulation, including point cloud generation, mesh deformation, and facial animation. Our approach innovatively incorporates a differential deformation kernel, which deconstructs the generation of geometric structures into successive non-rigid deformation stages. By leveraging a probabilistic diffusion model to simulate this step-by-step process, our method provides a versatile and efficient solution for a wide range of applications, spanning from graphics rendering to facial expression animation. Empirical evidence highlights the effectiveness of our approach, demonstrating state-of-the-art performance in point cloud generation and competitive results in mesh deformation. Additionally, extensive visual demonstrations reveal the significant potential of our approach for practical applications. Our method presents a unique pathway for advancing 3D shape manipulation and unlocking new opportunities in the realm of virtual reality.
Citations: 0
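The generic machinery referenced here is the standard Gaussian forward process, q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I). The one-line sampling sketch below shows only that standard step, applied to a point cloud as an assumed example; the paper's differential deformation kernel sits on top of it and is not reproduced.

```python
import torch

def diffuse(x0: torch.Tensor, alpha_bar_t: torch.Tensor) -> torch.Tensor:
    """Draw x_t ~ q(x_t | x_0) for standard Gaussian diffusion.

    x0          : clean sample, e.g. an (N, 3) point cloud.
    alpha_bar_t : cumulative noise-schedule product at step t (scalar tensor).
    """
    noise = torch.randn_like(x0)
    return alpha_bar_t.sqrt() * x0 + (1.0 - alpha_bar_t).sqrt() * noise
```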