Boxiang Rong, Artur Grigorev, Wenbo Wang, Michael J. Black, Bernhard Thomaszewski, Christina Tsalicoglou, Otmar Hilliges
We introduce Gaussian Garments, a novel approach for reconstructing realistic simulation-ready garment assets from multi-view videos. Our method represents garments with a combination of a 3D mesh and a Gaussian texture that encodes both the color and high-frequency surface details. This representation enables accurate registration of garment geometries to multi-view videos and helps disentangle albedo textures from lighting effects. Furthermore, we demonstrate how a pre-trained graph neural network (GNN) can be fine-tuned to replicate the real behavior of each garment. The reconstructed Gaussian Garments can be automatically combined into multi-garment outfits and animated with the fine-tuned GNN.
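As a rough illustration of the asset described above, the sketch below shows one way a "mesh plus Gaussian texture" garment could be organized in code; the field names and per-texel layout are assumptions for illustration, not the authors' actual data format.

```python
# Minimal sketch (not the authors' code) of a mesh-plus-Gaussian-texture asset.
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianGarment:
    vertices: np.ndarray   # (V, 3) rest-pose mesh vertices
    faces: np.ndarray      # (F, 3) triangle indices
    uvs: np.ndarray        # (V, 2) UV coordinates for texture lookup
    # Gaussian texture: one Gaussian per texel, storing appearance and
    # high-frequency surface detail relative to the mesh surface.
    albedo: np.ndarray     # (H, W, 3) base color with lighting disentangled
    offset: np.ndarray     # (H, W, 3) positional offset from the surface
    scale: np.ndarray      # (H, W, 3) per-axis Gaussian scales
    rotation: np.ndarray   # (H, W, 4) quaternions
    opacity: np.ndarray    # (H, W) per-Gaussian opacity

    def num_gaussians(self) -> int:
        return self.albedo.shape[0] * self.albedo.shape[1]
```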
{"title":"Gaussian Garments: Reconstructing Simulation-Ready Clothing with Photorealistic Appearance from Multi-View Video","authors":"Boxiang Rong, Artur Grigorev, Wenbo Wang, Michael J. Black, Bernhard Thomaszewski, Christina Tsalicoglou, Otmar Hilliges","doi":"arxiv-2409.08189","DOIUrl":"https://doi.org/arxiv-2409.08189","url":null,"abstract":"We introduce Gaussian Garments, a novel approach for reconstructing realistic\u0000simulation-ready garment assets from multi-view videos. Our method represents\u0000garments with a combination of a 3D mesh and a Gaussian texture that encodes\u0000both the color and high-frequency surface details. This representation enables\u0000accurate registration of garment geometries to multi-view videos and helps\u0000disentangle albedo textures from lighting effects. Furthermore, we demonstrate\u0000how a pre-trained graph neural network (GNN) can be fine-tuned to replicate the\u0000real behavior of each garment. The reconstructed Gaussian Garments can be\u0000automatically combined into multi-garment outfits and animated with the\u0000fine-tuned GNN.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a novel framework for converting 2D videos into immersive stereoscopic 3D, addressing the growing demand for 3D content in immersive experiences. Leveraging foundation models as priors, our approach overcomes the limitations of traditional methods and boosts performance to ensure the high-fidelity generation required by display devices. The proposed system consists of two main steps: depth-based video splatting for warping and extracting occlusion masks, and stereo video inpainting. We utilize pre-trained Stable Video Diffusion as the backbone and introduce a fine-tuning protocol for the stereo video inpainting task. To handle input videos of varying length and resolution, we explore auto-regressive strategies and tiled processing. Finally, a sophisticated data processing pipeline has been developed to reconstruct a large-scale, high-quality dataset to support our training. Our framework demonstrates significant improvements in 2D-to-3D video conversion, offering a practical solution for creating immersive content for 3D devices like the Apple Vision Pro and 3D displays. In summary, this work contributes to the field by presenting an effective method for generating high-quality stereoscopic videos from monocular input, potentially transforming how we experience digital media.
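The first stage, depth-based splatting, can be pictured with a minimal forward-warping sketch: shift each left-view pixel by its disparity, resolve collisions with a depth test, and mark the pixels that receive no source as the occlusion mask handed to the inpainting stage. The sign convention and the per-pixel loop are simplifying assumptions, not the paper's implementation.

```python
import numpy as np

def splat_to_right_view(left: np.ndarray, disparity: np.ndarray):
    """left: (H, W, 3) image; disparity: (H, W) in pixels (assumed positive shifts left)."""
    H, W, _ = left.shape
    right = np.zeros_like(left)
    depth_buf = np.full((H, W), -np.inf)     # keep the nearest (largest-disparity) pixel
    occluded = np.ones((H, W), dtype=bool)   # True until some source pixel lands here
    ys, xs = np.mgrid[0:H, 0:W]
    target_x = np.round(xs - disparity).astype(int)
    valid = (target_x >= 0) & (target_x < W)
    for y, x, tx in zip(ys[valid], xs[valid], target_x[valid]):
        if disparity[y, x] > depth_buf[y, tx]:   # z-test: closer content wins
            depth_buf[y, tx] = disparity[y, x]
            right[y, tx] = left[y, x]
            occluded[y, tx] = False
    return right, occluded  # occluded pixels are the stereo-inpainting targets
```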
{"title":"StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos","authors":"Sijie Zhao, Wenbo Hu, Xiaodong Cun, Yong Zhang, Xiaoyu Li, Zhe Kong, Xiangjun Gao, Muyao Niu, Ying Shan","doi":"arxiv-2409.07447","DOIUrl":"https://doi.org/arxiv-2409.07447","url":null,"abstract":"This paper presents a novel framework for converting 2D videos to immersive\u0000stereoscopic 3D, addressing the growing demand for 3D content in immersive\u0000experience. Leveraging foundation models as priors, our approach overcomes the\u0000limitations of traditional methods and boosts the performance to ensure the\u0000high-fidelity generation required by the display devices. The proposed system\u0000consists of two main steps: depth-based video splatting for warping and\u0000extracting occlusion mask, and stereo video inpainting. We utilize pre-trained\u0000stable video diffusion as the backbone and introduce a fine-tuning protocol for\u0000the stereo video inpainting task. To handle input video with varying lengths\u0000and resolutions, we explore auto-regressive strategies and tiled processing.\u0000Finally, a sophisticated data processing pipeline has been developed to\u0000reconstruct a large-scale and high-quality dataset to support our training. Our\u0000framework demonstrates significant improvements in 2D-to-3D video conversion,\u0000offering a practical solution for creating immersive content for 3D devices\u0000like Apple Vision Pro and 3D displays. In summary, this work contributes to the\u0000field by presenting an effective method for generating high-quality\u0000stereoscopic videos from monocular input, potentially transforming how we\u0000experience digital media.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dafei Qin, Hongyang Lin, Qixuan Zhang, Kaichun Qiao, Longwen Zhang, Zijun Zhao, Jun Saito, Jingyi Yu, Lan Xu, Taku Komura
We propose GauFace, a novel Gaussian Splatting representation tailored for efficient animation and rendering of physically-based facial assets. Leveraging strong geometric priors and constrained optimization, GauFace ensures a neat and structured Gaussian representation, delivering high fidelity and real-time facial interaction at 30fps@1440p on a Snapdragon 8 Gen 2 mobile platform. We then introduce TransGS, a diffusion transformer that instantly translates physically-based facial assets into the corresponding GauFace representations. Specifically, we adopt a patch-based pipeline to handle the vast number of Gaussians effectively. We also introduce a novel pixel-aligned sampling scheme with UV positional encoding to ensure the throughput and rendering quality of GauFace assets generated by our TransGS. Once trained, TransGS can instantly translate facial assets with lighting conditions into the GauFace representation. With its rich conditioning modalities, it also enables editing and animation capabilities reminiscent of traditional CG pipelines. We conduct extensive evaluations and user studies against traditional offline and online renderers as well as recent neural rendering methods, which demonstrate the superior performance of our approach for facial asset rendering. We also showcase diverse immersive applications of facial assets using our TransGS approach and GauFace representation across various platforms, including PCs, phones and even VR headsets.
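The pixel-aligned UV positional encoding can be illustrated with a standard sinusoidal encoding over texture coordinates; the frequency count and feature layout below are assumptions rather than GauFace's exact scheme.

```python
import torch

def uv_positional_encoding(uv: torch.Tensor, num_freqs: int = 6) -> torch.Tensor:
    """uv: (..., 2) coordinates in [0, 1]. Returns (..., 4 * num_freqs) features."""
    freqs = 2.0 ** torch.arange(num_freqs, device=uv.device) * torch.pi   # (F,)
    angles = uv.unsqueeze(-1) * freqs                                     # (..., 2, F)
    enc = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)       # (..., 2, 2F)
    return enc.flatten(-2)                                                # (..., 4F)
```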
{"title":"Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering","authors":"Dafei Qin, Hongyang Lin, Qixuan Zhang, Kaichun Qiao, Longwen Zhang, Zijun Zhao, Jun Saito, Jingyi Yu, Lan Xu, Taku Komura","doi":"arxiv-2409.07441","DOIUrl":"https://doi.org/arxiv-2409.07441","url":null,"abstract":"We propose GauFace, a novel Gaussian Splatting representation, tailored for\u0000efficient animation and rendering of physically-based facial assets. Leveraging\u0000strong geometric priors and constrained optimization, GauFace ensures a neat\u0000and structured Gaussian representation, delivering high fidelity and real-time\u0000facial interaction of 30fps@1440p on a Snapdragon 8 Gen 2 mobile platform. Then, we introduce TransGS, a diffusion transformer that instantly translates\u0000physically-based facial assets into the corresponding GauFace representations.\u0000Specifically, we adopt a patch-based pipeline to handle the vast number of\u0000Gaussians effectively. We also introduce a novel pixel-aligned sampling scheme\u0000with UV positional encoding to ensure the throughput and rendering quality of\u0000GauFace assets generated by our TransGS. Once trained, TransGS can instantly\u0000translate facial assets with lighting conditions to GauFace representation,\u0000With the rich conditioning modalities, it also enables editing and animation\u0000capabilities reminiscent of traditional CG pipelines. We conduct extensive evaluations and user studies, compared to traditional\u0000offline and online renderers, as well as recent neural rendering methods, which\u0000demonstrate the superior performance of our approach for facial asset\u0000rendering. We also showcase diverse immersive applications of facial assets\u0000using our TransGS approach and GauFace representation, across various platforms\u0000like PCs, phones and even VR headsets.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Phu Pham, Aradhya N. Mathur, Ojaswa Sharma, Aniket Bera
The field of text-to-3D content generation has made significant progress in generating realistic 3D objects, with existing methodologies like Score Distillation Sampling (SDS) offering promising guidance. However, these methods often encounter the "Janus" problem: multi-face ambiguities due to imprecise guidance. Additionally, while recent advancements in 3D Gaussian splatting have shown its efficacy in representing 3D volumes, optimization of this representation remains largely unexplored. This paper introduces a unified framework for text-to-3D content generation that addresses these critical gaps. Our approach utilizes multi-view guidance to iteratively form the structure of the 3D model, progressively enhancing detail and accuracy. We also introduce a novel densification algorithm that aligns Gaussians close to the surface, optimizing the structural integrity and fidelity of the generated models. Extensive experiments validate our approach, demonstrating that it produces high-quality visual outputs with minimal time cost. Notably, our method achieves high-quality results within half an hour of training, offering a substantial efficiency gain over most existing methods, which require hours of training time to achieve comparable results.
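In its simplest form, "aligning Gaussians close to the surface" could look like the sketch below: each Gaussian center is pulled toward its nearest sample on an estimated surface. This is a hedged guess at the spirit of the densification step, not the paper's algorithm; the step size and KD-tree lookup are illustrative choices.

```python
import numpy as np
from scipy.spatial import cKDTree

def align_gaussians_to_surface(centers: np.ndarray,
                               surface_pts: np.ndarray,
                               step: float = 0.5) -> np.ndarray:
    """centers: (N, 3) Gaussian means; surface_pts: (M, 3) estimated surface samples."""
    tree = cKDTree(surface_pts)
    _, idx = tree.query(centers, k=1)        # nearest surface sample per Gaussian
    nearest = surface_pts[idx]
    # Move each center a fraction of the way toward the surface per iteration.
    return centers + step * (nearest - centers)
```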
{"title":"MVGaussian: High-Fidelity text-to-3D Content Generation with Multi-View Guidance and Surface Densification","authors":"Phu Pham, Aradhya N. Mathur, Ojaswa Sharma, Aniket Bera","doi":"arxiv-2409.06620","DOIUrl":"https://doi.org/arxiv-2409.06620","url":null,"abstract":"The field of text-to-3D content generation has made significant progress in\u0000generating realistic 3D objects, with existing methodologies like Score\u0000Distillation Sampling (SDS) offering promising guidance. However, these methods\u0000often encounter the \"Janus\" problem-multi-face ambiguities due to imprecise\u0000guidance. Additionally, while recent advancements in 3D gaussian splitting have\u0000shown its efficacy in representing 3D volumes, optimization of this\u0000representation remains largely unexplored. This paper introduces a unified\u0000framework for text-to-3D content generation that addresses these critical gaps.\u0000Our approach utilizes multi-view guidance to iteratively form the structure of\u0000the 3D model, progressively enhancing detail and accuracy. We also introduce a\u0000novel densification algorithm that aligns gaussians close to the surface,\u0000optimizing the structural integrity and fidelity of the generated models.\u0000Extensive experiments validate our approach, demonstrating that it produces\u0000high-quality visual outputs with minimal time cost. Notably, our method\u0000achieves high-quality results within half an hour of training, offering a\u0000substantial efficiency gain over most existing methods, which require hours of\u0000training time to achieve comparable results.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zixuan Li, Pengfei Shen, Hanxiao Sun, Zibo Zhang, Yu Guo, Ligang Liu, Ling-Qi Yan, Steve Marschner, Milos Hasan, Beibei Wang
Accurately rendering the appearance of fabrics is challenging, due to their complex 3D microstructures and specialized optical properties. If we model the geometry and optics of fabrics down to the fiber level, we can achieve unprecedented rendering realism, but this raises the difficulty of authoring or capturing the fiber-level assets. Existing approaches can obtain fiber-level geometry with special devices (e.g., CT) or complex hand-designed procedural pipelines (manually tweaking a set of parameters). In this paper, we propose a unified framework to capture fiber-level geometry and appearance of woven fabrics using a single low-cost microscope image. We first use a simple neural network to predict initial parameters of our geometric and appearance models. From this starting point, we further optimize the parameters of procedural fiber geometry and an approximated shading model via differentiable rasterization to match the microscope photo more accurately. Finally, we refine the fiber appearance parameters via differentiable path tracing, converging to accurate fiber optical parameters, which are suitable for physically-based light simulations to produce high-quality rendered results. We believe that our method is the first to utilize differentiable rendering at the microscopic level, supporting physically-based scattering from explicit fiber assemblies. Our fabric parameter estimation achieves high-quality re-rendering of measured woven fabric samples in both distant and close-up views. These results can further be used for efficient rendering or converted to downstream representations. We also propose a patch-space fiber geometry procedural generation and a two-scale path tracing framework for efficient rendering of fabric scenes.
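The staged fitting described above can be summarized as two nested optimization loops. In the sketch below, `rasterize` and `path_trace` are placeholders standing in for the paper's differentiable rasterizer and path tracer, and the loss, step counts, and learning rates are illustrative assumptions.

```python
import torch

def fit_fabric(photo, geom_params, appearance_params,
               rasterize, path_trace, steps=(500, 200), lrs=(1e-2, 1e-3)):
    """geom_params / appearance_params: leaf tensors with requires_grad=True."""
    # Stage 2: refine geometry and approximate shading via differentiable rasterization.
    opt = torch.optim.Adam([geom_params, appearance_params], lr=lrs[0])
    for _ in range(steps[0]):
        opt.zero_grad()
        loss = torch.nn.functional.l1_loss(rasterize(geom_params, appearance_params), photo)
        loss.backward()
        opt.step()
    # Stage 3: refine only the fiber appearance via differentiable path tracing.
    opt = torch.optim.Adam([appearance_params], lr=lrs[1])
    for _ in range(steps[1]):
        opt.zero_grad()
        loss = torch.nn.functional.l1_loss(path_trace(geom_params, appearance_params), photo)
        loss.backward()
        opt.step()
    return geom_params, appearance_params
```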
{"title":"Fiber-level Woven Fabric Capture from a Single Photo","authors":"Zixuan Li, Pengfei Shen, Hanxiao Sun, Zibo Zhang, Yu Guo, Ligang Liu, Ling-Qi Yan, Steve Marschner, Milos Hasan, Beibei Wang","doi":"arxiv-2409.06368","DOIUrl":"https://doi.org/arxiv-2409.06368","url":null,"abstract":"Accurately rendering the appearance of fabrics is challenging, due to their\u0000complex 3D microstructures and specialized optical properties. If we model the\u0000geometry and optics of fabrics down to the fiber level, we can achieve\u0000unprecedented rendering realism, but this raises the difficulty of authoring or\u0000capturing the fiber-level assets. Existing approaches can obtain fiber-level\u0000geometry with special devices (e.g., CT) or complex hand-designed procedural\u0000pipelines (manually tweaking a set of parameters). In this paper, we propose a\u0000unified framework to capture fiber-level geometry and appearance of woven\u0000fabrics using a single low-cost microscope image. We first use a simple neural\u0000network to predict initial parameters of our geometric and appearance models.\u0000From this starting point, we further optimize the parameters of procedural\u0000fiber geometry and an approximated shading model via differentiable\u0000rasterization to match the microscope photo more accurately. Finally, we refine\u0000the fiber appearance parameters via differentiable path tracing, converging to\u0000accurate fiber optical parameters, which are suitable for physically-based\u0000light simulations to produce high-quality rendered results. We believe that our\u0000method is the first to utilize differentiable rendering at the microscopic\u0000level, supporting physically-based scattering from explicit fiber assemblies.\u0000Our fabric parameter estimation achieves high-quality re-rendering of measured\u0000woven fabric samples in both distant and close-up views. These results can\u0000further be used for efficient rendering or converted to downstream\u0000representations. We also propose a patch-space fiber geometry procedural\u0000generation and a two-scale path tracing framework for efficient rendering of\u0000fabric scenes.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image vectorization is the process of converting a raster image into a scalable vector graphic format. The objective is to effectively remove the pixelization effect while representing image boundaries with scalable parameterized curves. We propose a new image vectorization with depth, which considers depth ordering among shapes and uses curvature-based inpainting to convexify shapes during vectorization. From a given color-quantized raster image, we first define each connected component of the same color as a shape layer, and construct a depth ordering among the layers using a newly proposed depth-ordering energy. The global depth ordering among all shapes is described by a directed graph, and we propose an energy to remove cycles within the graph. After constructing the depth ordering of shapes, we convexify occluded regions by Euler's elastica curvature-based variational inpainting, and leverage the stability of the Modica-Mortola double-well potential energy to inpaint large regions. This follows human visual perception, in which shape boundaries are perceived to extend smoothly, and we assume shapes are likely to be convex. Finally, we fit Bézier curves to the boundaries and save the vectorization as an SVG file, which allows superposition of the curvature-based inpainted shapes following the depth ordering. This is a new way to vectorize images: decomposing an image into scalable shape layers with a computed depth ordering. This approach makes editing shapes and images more natural and intuitive. We also consider grouping shape layers for semantic vectorization. We present various numerical results and comparisons against recent layer-based vectorization methods to validate the proposed model.
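The global depth-ordering step can be pictured as a weighted directed graph over shape layers whose cycles must be broken before a consistent ordering exists. The sketch below uses a simple greedy rule (drop the costliest edge of each remaining cycle) in place of the paper's cycle-removal energy, and `pairwise_energy` is a stand-in for the proposed depth-ordering energy.

```python
import networkx as nx

def build_depth_order(num_layers, pairwise_energy):
    """pairwise_energy: dict {(i, j): cost of placing layer i in front of layer j}."""
    g = nx.DiGraph()
    g.add_nodes_from(range(num_layers))
    for (i, j), cost in pairwise_energy.items():
        # Keep only the cheaper direction for each unordered pair of layers.
        if cost < pairwise_energy.get((j, i), float("inf")):
            g.add_edge(i, j, weight=cost)
    # Break cycles greedily: remove the highest-cost edge of each remaining cycle.
    while True:
        try:
            cycle = nx.find_cycle(g)
        except nx.NetworkXNoCycle:
            break
        worst = max(cycle, key=lambda e: g.edges[e]["weight"])
        g.remove_edge(*worst[:2])
    return list(nx.topological_sort(g))  # a consistent front-to-back ordering
```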
{"title":"Image Vectorization with Depth: convexified shape layers with depth ordering","authors":"Ho Law, Sung Ha Kang","doi":"arxiv-2409.06648","DOIUrl":"https://doi.org/arxiv-2409.06648","url":null,"abstract":"Image vectorization is a process to convert a raster image into a scalable\u0000vector graphic format. Objective is to effectively remove the pixelization\u0000effect while representing boundaries of image by scaleable parameterized\u0000curves. We propose new image vectorization with depth which considers depth\u0000ordering among shapes and use curvature-based inpainting for convexifying\u0000shapes in vectorization process.From a given color quantized raster image, we\u0000first define each connected component of the same color as a shape layer, and\u0000construct depth ordering among them using a newly proposed depth ordering\u0000energy. Global depth ordering among all shapes is described by a directed\u0000graph, and we propose an energy to remove cycle within the graph. After\u0000constructing depth ordering of shapes, we convexify occluded regions by Euler's\u0000elastica curvature-based variational inpainting, and leverage on the stability\u0000of Modica-Mortola double-well potential energy to inpaint large regions. This\u0000is following human vision perception that boundaries of shapes extend smoothly,\u0000and we assume shapes are likely to be convex. Finally, we fit B'{e}zier curves\u0000to the boundaries and save vectorization as a SVG file which allows\u0000superposition of curvature-based inpainted shapes following the depth ordering.\u0000This is a new way to vectorize images, by decomposing an image into scalable\u0000shape layers with computed depth ordering. This approach makes editing shapes\u0000and images more natural and intuitive. We also consider grouping shape layers\u0000for semantic vectorization. We present various numerical results and\u0000comparisons against recent layer-based vectorization methods to validate the\u0000proposed model.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bo Pang, Zhongtian Zheng, Yilong Li, Guoping Wang, Peng-Shuai Wang
The discrete Laplacian operator holds a crucial role in 3D geometry processing, yet it is still challenging to define it on point clouds. Previous works mainly focused on constructing a local triangulation around each point to approximate the underlying manifold for defining the Laplacian operator, which may not be robust or accurate. In contrast, we simply use the K-nearest neighbors (KNN) graph constructed from the input point cloud and learn the Laplacian operator on the KNN graph with graph neural networks (GNNs). However, the ground-truth Laplacian operator is defined on a manifold mesh with a different connectivity from the KNN graph and thus cannot be directly used for training. To train the GNN, we propose a novel training scheme by imitating the behavior of the ground-truth Laplacian operator on a set of probe functions so that the learned Laplacian operator behaves similarly to the ground-truth Laplacian operator. We train our network on a subset of ShapeNet and evaluate it across a variety of point clouds. Compared with previous methods, our method reduces the error by an order of magnitude and excels in handling sparse point clouds with thin structures or sharp features. Our method also demonstrates a strong generalization ability to unseen shapes. With our learned Laplacian operator, we further apply a series of Laplacian-based geometry processing algorithms directly to point clouds and achieve accurate results, enabling many exciting possibilities for geometry processing on point clouds. The code and trained models are available at https://github.com/IntelligentGeometry/NeLo.
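The probe-function training scheme reduces to a simple objective: the learned operator, applied on the KNN graph, should act on probe functions the same way the ground-truth mesh Laplacian does. The sketch below assumes a dense ground-truth matrix and random Gaussian probes purely for illustration.

```python
import torch

def probe_imitation_loss(L_pred_apply, L_gt: torch.Tensor,
                         num_points: int, num_probes: int = 32) -> torch.Tensor:
    """
    L_pred_apply: callable mapping probe functions f of shape (N, P) to the
                  learned operator's output (N, P) on the KNN graph.
    L_gt: (N, N) ground-truth Laplacian transferred from the manifold mesh.
    """
    probes = torch.randn(num_points, num_probes)   # random probe functions on the points
    target = L_gt @ probes                         # ground-truth operator behavior
    pred = L_pred_apply(probes)                    # learned operator behavior
    return torch.nn.functional.mse_loss(pred, target)
```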
{"title":"Neural Laplacian Operator for 3D Point Clouds","authors":"Bo Pang, Zhongtian Zheng, Yilong Li, Guoping Wang, Peng-Shuai Wang","doi":"arxiv-2409.06506","DOIUrl":"https://doi.org/arxiv-2409.06506","url":null,"abstract":"The discrete Laplacian operator holds a crucial role in 3D geometry\u0000processing, yet it is still challenging to define it on point clouds. Previous\u0000works mainly focused on constructing a local triangulation around each point to\u0000approximate the underlying manifold for defining the Laplacian operator, which\u0000may not be robust or accurate. In contrast, we simply use the K-nearest\u0000neighbors (KNN) graph constructed from the input point cloud and learn the\u0000Laplacian operator on the KNN graph with graph neural networks (GNNs). However,\u0000the ground-truth Laplacian operator is defined on a manifold mesh with a\u0000different connectivity from the KNN graph and thus cannot be directly used for\u0000training. To train the GNN, we propose a novel training scheme by imitating the\u0000behavior of the ground-truth Laplacian operator on a set of probe functions so\u0000that the learned Laplacian operator behaves similarly to the ground-truth\u0000Laplacian operator. We train our network on a subset of ShapeNet and evaluate\u0000it across a variety of point clouds. Compared with previous methods, our method\u0000reduces the error by an order of magnitude and excels in handling sparse point\u0000clouds with thin structures or sharp features. Our method also demonstrates a\u0000strong generalization ability to unseen shapes. With our learned Laplacian\u0000operator, we further apply a series of Laplacian-based geometry processing\u0000algorithms directly to point clouds and achieve accurate results, enabling many\u0000exciting possibilities for geometry processing on point clouds. The code and\u0000trained models are available at https://github.com/IntelligentGeometry/NeLo.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qimin Chen, Zhiqin Chen, Vladimir G. Kim, Noam Aigerman, Hao Zhang, Siddhartha Chaudhuri
We present a 3D modeling method that enables end-users to refine or detailize 3D shapes using machine learning, expanding the capabilities of AI-assisted 3D content creation. Given a coarse voxel shape (e.g., one produced with a simple box extrusion tool or via generative modeling), a user can directly "paint" desired target styles representing compelling geometric details, taken from input exemplar shapes, over different regions of the coarse shape. These regions are then up-sampled into high-resolution geometries that adhere to the painted styles. To achieve such controllable and localized 3D detailization, we build on top of a Pyramid GAN by making it masking-aware. We devise novel structural losses and priors to ensure that our method preserves both desired coarse structures and fine-grained features even if the painted styles are borrowed from diverse sources, e.g., different semantic parts and even different shape categories. Through extensive experiments, we show that our ability to localize details enables novel interactive creative workflows and applications. Our experiments further demonstrate that, in comparison to prior techniques built on global detailization, our method generates structure-preserving, high-resolution stylized geometries with more coherent shape details and style transitions.
{"title":"DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement","authors":"Qimin Chen, Zhiqin Chen, Vladimir G. Kim, Noam Aigerman, Hao Zhang, Siddhartha Chaudhuri","doi":"arxiv-2409.06129","DOIUrl":"https://doi.org/arxiv-2409.06129","url":null,"abstract":"We present a 3D modeling method which enables end-users to refine or\u0000detailize 3D shapes using machine learning, expanding the capabilities of\u0000AI-assisted 3D content creation. Given a coarse voxel shape (e.g., one produced\u0000with a simple box extrusion tool or via generative modeling), a user can\u0000directly \"paint\" desired target styles representing compelling geometric\u0000details, from input exemplar shapes, over different regions of the coarse\u0000shape. These regions are then up-sampled into high-resolution geometries which\u0000adhere with the painted styles. To achieve such controllable and localized 3D\u0000detailization, we build on top of a Pyramid GAN by making it masking-aware. We\u0000devise novel structural losses and priors to ensure that our method preserves\u0000both desired coarse structures and fine-grained features even if the painted\u0000styles are borrowed from diverse sources, e.g., different semantic parts and\u0000even different shape categories. Through extensive experiments, we show that\u0000our ability to localize details enables novel interactive creative workflows\u0000and applications. Our experiments further demonstrate that in comparison to\u0000prior techniques built on global detailization, our method generates\u0000structure-preserving, high-resolution stylized geometries with more coherent\u0000shape details and style transitions.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Farhan Rasheed, Abrar Naseer, Emma Nilsson, Talha Bin Masood, Ingrid Hotz
This paper presents a nested tracking framework for analyzing cycles in 2D force networks within granular materials. These materials are composed of interacting particles, whose interactions are described by a force network. Understanding the cycles within these networks at various scales and their evolution under external loads is crucial, as they significantly contribute to the mechanical and kinematic properties of the system. Our approach involves computing a cycle hierarchy by partitioning the 2D domain into segments bounded by cycles in the force network. We can adapt concepts from nested tracking graphs originally developed for merge trees by leveraging the duality between this partitioning and the cycles. We demonstrate the effectiveness of our method on two force networks derived from experiments with photoelastic disks.
{"title":"Multi-scale Cycle Tracking in Dynamic Planar Graphs","authors":"Farhan Rasheed, Abrar Naseer, Emma Nilsson, Talha Bin Masood, Ingrid Hotz","doi":"arxiv-2409.06476","DOIUrl":"https://doi.org/arxiv-2409.06476","url":null,"abstract":"This paper presents a nested tracking framework for analyzing cycles in 2D\u0000force networks within granular materials. These materials are composed of\u0000interacting particles, whose interactions are described by a force network.\u0000Understanding the cycles within these networks at various scales and their\u0000evolution under external loads is crucial, as they significantly contribute to\u0000the mechanical and kinematic properties of the system. Our approach involves\u0000computing a cycle hierarchy by partitioning the 2D domain into segments bounded\u0000by cycles in the force network. We can adapt concepts from nested tracking\u0000graphs originally developed for merge trees by leveraging the duality between\u0000this partitioning and the cycles. We demonstrate the effectiveness of our\u0000method on two force networks derived from experiments with photoelastic disks.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Longhao Zhang, Shuang Liang, Zhipeng Ge, Tianshu Hu
For audio-driven visual dubbing, it remains a considerable challenge to uphold and highlight the speaker's persona while synthesizing accurate lip synchronization. Existing methods fall short of capturing the speaker's unique speaking style or preserving facial details. In this paper, we present PersonaTalk, an attention-based two-stage framework, comprising geometry construction and face rendering, for high-fidelity and personalized visual dubbing. In the first stage, we propose a style-aware audio encoding module that injects speaking style into audio features through a cross-attention layer. The stylized audio features are then used to drive the speaker's template geometry to obtain lip-synced geometries. In the second stage, a dual-attention face renderer is introduced to render textures for the target geometries. It consists of two parallel cross-attention layers, namely Lip-Attention and Face-Attention, which respectively sample textures from different reference frames to render the entire face. With our innovative design, intricate facial details can be well preserved. Comprehensive experiments and user studies demonstrate our advantages over other state-of-the-art methods in terms of visual quality, lip-sync accuracy and persona preservation. Furthermore, as a person-generic framework, PersonaTalk achieves performance competitive with state-of-the-art person-specific methods. Project Page: https://grisoon.github.io/PersonaTalk/.
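The style-aware audio encoding module can be pictured as audio features attending to a bank of speaking-style tokens through cross-attention; the module layout and dimensions below are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class StyleAwareAudioEncoder(nn.Module):
    """Injects speaking style into per-frame audio features via cross-attention."""
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, audio_feats: torch.Tensor, style_tokens: torch.Tensor) -> torch.Tensor:
        """audio_feats: (B, T, dim) audio features; style_tokens: (B, S, dim) style summary."""
        style_ctx, _ = self.cross_attn(query=audio_feats, key=style_tokens, value=style_tokens)
        return self.norm(audio_feats + style_ctx)   # stylized audio features
```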
{"title":"PersonaTalk: Bring Attention to Your Persona in Visual Dubbing","authors":"Longhao Zhang, Shuang Liang, Zhipeng Ge, Tianshu Hu","doi":"arxiv-2409.05379","DOIUrl":"https://doi.org/arxiv-2409.05379","url":null,"abstract":"For audio-driven visual dubbing, it remains a considerable challenge to\u0000uphold and highlight speaker's persona while synthesizing accurate lip\u0000synchronization. Existing methods fall short of capturing speaker's unique\u0000speaking style or preserving facial details. In this paper, we present\u0000PersonaTalk, an attention-based two-stage framework, including geometry\u0000construction and face rendering, for high-fidelity and personalized visual\u0000dubbing. In the first stage, we propose a style-aware audio encoding module\u0000that injects speaking style into audio features through a cross-attention\u0000layer. The stylized audio features are then used to drive speaker's template\u0000geometry to obtain lip-synced geometries. In the second stage, a dual-attention\u0000face renderer is introduced to render textures for the target geometries. It\u0000consists of two parallel cross-attention layers, namely Lip-Attention and\u0000Face-Attention, which respectively sample textures from different reference\u0000frames to render the entire face. With our innovative design, intricate facial\u0000details can be well preserved. Comprehensive experiments and user studies\u0000demonstrate our advantages over other state-of-the-art methods in terms of\u0000visual quality, lip-sync accuracy and persona preservation. Furthermore, as a\u0000person-generic framework, PersonaTalk can achieve competitive performance as\u0000state-of-the-art person-specific methods. Project Page:\u0000https://grisoon.github.io/PersonaTalk/.","PeriodicalId":501174,"journal":{"name":"arXiv - CS - Graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142221827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}