
Latest Publications from ACM Transactions on Graphics (TOG)

MetaLayer: A Meta-Learned BSDF Model for Layered Materials
Pub Date: 2023-12-04 DOI: 10.1145/3618365
Jie Guo, Zeru Li, Xueyan He, Beibei Wang, Wenbin Li, Yanwen Guo, Ling-Qi Yan
Reproducing the appearance of arbitrary layered materials has long been a critical challenge in computer graphics, owing to the demanding requirements of both physical accuracy and low computation cost. Recent studies have demonstrated promising results with learning-based representations that implicitly encode the appearance of complex (layered) materials in neural networks. However, existing learned models often struggle to combine strong representation ability with high runtime performance, and they also lack physical parameters for material editing. To address these concerns, we introduce MetaLayer, a new methodology leveraging meta-learning for modeling and rendering layered materials. MetaLayer contains two networks: a BSDFNet that compactly encodes layered materials into implicit neural representations, and a MetaNet that establishes the mapping between the physical parameters of each material and the weights of its corresponding implicit neural representation. A new positional encoding method and a well-designed training strategy are employed to improve the performance and quality of the neural model. As a new learning-based representation, the proposed MetaLayer model provides both fast responses to material editing and high-quality results for a wide range of layered materials, outperforming existing layered BSDF models.
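The MetaNet-predicts-BSDFNet-weights idea is a hypernetwork. Below is a minimal sketch of that pattern in PyTorch; all dimensions, layer counts, and the direction encoding are invented for illustration and do not reflect the paper's actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the paper's actual dimensions and encodings differ.
PARAM_DIM = 8    # physical material parameters (e.g., layer depths, roughness)
IN_DIM = 6       # positionally encoded incoming/outgoing directions
HIDDEN = 32
OUT_DIM = 3      # RGB BSDF value

# Layer shapes of the implicit BSDF network whose weights MetaNet predicts.
SHAPES = [(HIDDEN, IN_DIM), (HIDDEN,), (OUT_DIM, HIDDEN), (OUT_DIM,)]
N_WEIGHTS = sum(torch.Size(s).numel() for s in SHAPES)

class MetaNet(nn.Module):
    """Maps a material's physical parameters to the weights of its BSDF MLP."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(PARAM_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, N_WEIGHTS))

    def forward(self, params):
        return self.net(params)

def bsdf_forward(weights, x):
    """Evaluates the implicit BSDF network with MetaNet-predicted weights."""
    chunks, offset = [], 0
    for shape in SHAPES:
        n = torch.Size(shape).numel()
        chunks.append(weights[offset:offset + n].reshape(shape))
        offset += n
    w1, b1, w2, b2 = chunks
    h = torch.relu(x @ w1.T + b1)
    return h @ w2.T + b2

meta = MetaNet()
params = torch.rand(PARAM_DIM)           # one material's physical parameters
dirs = torch.rand(4, IN_DIM)             # a batch of encoded direction pairs
rgb = bsdf_forward(meta(params), dirs)   # (4, 3) BSDF values
```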
Cited by: 0
VASCO: Volume and Surface Co-Decomposition for Hybrid Manufacturing
Pub Date: 2023-12-04 DOI: 10.1145/3618324
Fanchao Zhong, Haisen Zhao, Haochen Li, Xin Yan, Jikai Liu, Baoquan Chen, Lin Lu
Additive and subtractive hybrid manufacturing (ASHM) involves the alternating use of additive and subtractive manufacturing techniques, which provides unique advantages for fabricating complex geometries with otherwise inaccessible surfaces. However, a significant challenge lies in ensuring tool accessibility during both fabrication procedures, as the object shape may change dramatically, and different parts of the shape are interdependent. In this study, we propose a computational framework that optimizes the planning of additive and subtractive sequences while ensuring tool accessibility. Our goal is to minimize switching between additive and subtractive processes, achieving efficient fabrication while maintaining product quality. We approach the problem by formulating it as a Volume-And-Surface-CO-decomposition (VASCO) problem. First, we slice volumes into slabs and build a dynamic directed graph to encode manufacturing constraints, with each node representing a slab and edge directions reflecting operation order. We introduce a novel geometric property, hybrid-fabricability, for a pair of additive and subtractive procedures. Then, we propose a beam-guided top-down block decomposition algorithm to solve the VASCO problem. We apply our solution to a 5-axis hybrid manufacturing platform and evaluate it on various 3D shapes. Finally, we assess the performance of our approach through both physical and simulated manufacturing evaluations.
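To make the graph formulation concrete, here is a toy sketch: slabs become nodes tagged with an operation, precedence edges constrain the order, and a greedy topological sort prefers to continue the current process so as to reduce switches. The paper's beam-guided block decomposition and accessibility reasoning are far more involved; this only illustrates the ordering objective.

```python
from collections import defaultdict

# Toy stand-in for VASCO's dynamic directed graph: nodes are slabs tagged
# with an operation ('add' or 'cut'); edges encode "must come before".

def order_slabs(ops, edges):
    indeg = defaultdict(int)
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = {n for n in ops if indeg[n] == 0}
    order, current = [], None
    while ready:
        # Prefer a slab that keeps the current process running (fewer switches).
        same = [n for n in ready if ops[n] == current]
        n = min(same) if same else min(ready)
        ready.remove(n)
        order.append(n)
        current = ops[n]
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.add(m)
    return order

ops = {0: 'add', 1: 'add', 2: 'cut', 3: 'add', 4: 'cut'}
edges = [(0, 1), (0, 2), (1, 3), (2, 4)]
print(order_slabs(ops, edges))  # [0, 1, 3, 2, 4]: only one process switch
```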
Cited by: 0
Enhancing Diffusion Models with 3D Perspective Geometry Constraints
Pub Date: 2023-12-01 DOI: 10.1145/3618389
Rishi Upadhyay, Howard Zhang, Yunhao Ba, Ethan Yang, Blake Gella, Sicheng Jiang, Alex Wong, A. Kadambi
While perspective is a well-studied topic in art, it is generally taken for granted in images. However, for the recent wave of high-quality image synthesis methods such as latent diffusion models, perspective accuracy is not an explicit requirement. Since these methods are capable of outputting a wide gamut of possible images, it is difficult for these synthesized images to adhere to the principles of linear perspective. We introduce a novel geometric constraint in the training process of generative models to enforce perspective accuracy. We show that outputs of models trained with this constraint both appear more realistic and improve performance of downstream models trained on generated images. Subjective human trials show that images generated with latent diffusion models trained with our constraint are preferred over images from the Stable Diffusion V2 model 70% of the time. SOTA monocular depth estimation models such as DPT and PixelFormer, fine-tuned on our images, outperform the original models trained on real images by up to 7.03% in RMSE and 19.3% in SqRel on the KITTI test set for zero-shot transfer.
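One plausible way to wire such a constraint into diffusion training is as an auxiliary penalty on the clean image implied by the noise prediction. The sketch below assumes a standard epsilon-prediction loss; `perspective_penalty` is a zero-returning placeholder, since the paper's concrete geometric term is not reproduced here.

```python
import torch

def perspective_penalty(x0_pred):
    """Placeholder for a differentiable perspective-consistency term, e.g.
    penalizing detected edges that miss shared vanishing points. The paper's
    concrete constraint is not reproduced here; this stand-in returns zero."""
    return x0_pred.sum() * 0.0

def training_loss(model, x0, t, noise, alpha_bar, lam=0.1):
    # Standard epsilon-prediction diffusion loss ...
    a = alpha_bar[t]
    xt = a.sqrt() * x0 + (1 - a).sqrt() * noise
    eps = model(xt, t)
    denoise = (eps - noise).square().mean()
    # ... plus a geometric term evaluated on the implied clean image.
    x0_pred = (xt - (1 - a).sqrt() * eps) / a.sqrt()
    return denoise + lam * perspective_penalty(x0_pred)

model = lambda xt, t: torch.zeros_like(xt)     # dummy denoiser for illustration
x0 = torch.rand(2, 3, 8, 8)
alpha_bar = torch.linspace(0.99, 0.01, 1000)
loss = training_loss(model, x0, t=torch.tensor(300),
                     noise=torch.randn_like(x0), alpha_bar=alpha_bar)
```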
Cited by: 0
An Adaptive Fast-Multipole-Accelerated Hybrid Boundary Integral Equation Method for Accurate Diffusion Curves
Pub Date: 2023-11-24 DOI: 10.1145/3618374
Seungbae Bang, Kirill Serkh, Oded Stein, Alec Jacobson
In theory, diffusion curves promise complex color gradations for infinite-resolution vector graphics. In practice, existing realizations suffer from poor scaling, discretization artifacts, or insufficient support for rich boundary conditions. Previous applications of the boundary element method to diffusion curves have relied on polygonal approximations, which either forfeit the high-order smoothness of Bézier curves, or, when the polygonal approximation is extremely detailed, result in large and costly systems of equations that must be solved. In this paper, we utilize the boundary integral equation method to accurately and efficiently solve the underlying partial differential equation. Given a desired resolution and viewport, we then interpolate this solution and use the boundary element method to render it. We couple this hybrid approach with the fast multipole method on a non-uniform quadtree for efficient computation. Furthermore, we introduce an adaptive strategy to enable truly scalable infinite-resolution diffusion curves.
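For intuition about the boundary-integral machinery, the following self-contained toy solves an interior Laplace Dirichlet problem on a circle with a Nyström-discretized double-layer potential. It omits everything that makes the paper's method practical: Bézier boundaries, adaptive quadrature, and fast-multipole acceleration.

```python
import numpy as np

# Interior Laplace Dirichlet problem on a unit circle via a double-layer
# potential, discretized with the Nystrom method: solve (-1/2 I + K) sigma = g.

N, R = 256, 1.0
theta = 2 * np.pi * np.arange(N) / N
y = R * np.stack([np.cos(theta), np.sin(theta)], axis=1)   # boundary nodes
n = y / R                                                  # outward normals
ds = 2 * np.pi * R / N                                     # arc-length weights

def dlp_kernel(x, yj, nj):
    """Double-layer kernel -(n_y . (y - x)) / (2 pi |y - x|^2)."""
    d = yj - x
    return -(d @ nj) / (2 * np.pi * (d @ d))

# On a circle this kernel is the constant -1/(4 pi R), diagonal included.
K = np.full((N, N), -1.0 / (4 * np.pi * R))
A = -0.5 * np.eye(N) + K * ds

g = np.cos(theta)                        # Dirichlet boundary data ("colors")
sigma = np.linalg.solve(A, g)            # layer density

x = np.array([0.3, 0.2])                 # interior query point
u = sum(dlp_kernel(x, y[j], n[j]) * sigma[j] * ds for j in range(N))
print(u)   # ~0.3: the harmonic extension of cos(theta) is u(x1, x2) = x1
```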
Cited by: 0
ART-Owen Scrambling
Pub Date: 2023-11-20 DOI: 10.1145/3618307
Abdalla G. M. Ahmed, Matt Pharr, Peter Wonka
We present a novel algorithm for implementing Owen scrambling, combining the generation and distribution of the scrambling bits in a single self-contained compact process. We employ a context-free grammar to build a binary tree of symbols, and equip each symbol with a scrambling code that affects all descendant nodes. We nominate the grammar of adaptive regular tiles (ART) derived from the repetition-avoiding Thue-Morse word, and we discuss its potential advantages and shortcomings. Our algorithm has many advantages, including random access to samples, fixed time complexity, GPU friendliness, and scalability to any memory budget. Further, it provides two unique features over known methods: it admits optimization, and it is invertible, enabling screen-space scrambling of the high-dimensional Sobol sampler.
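For reference, plain base-2 Owen scrambling assigns one random flip bit to every node of a binary tree over digit prefixes. The sketch below draws those bits from a hash, which is a conventional stand-in; the paper's contribution is to generate them from a context-free grammar of adaptive regular tiles instead.

```python
import hashlib

def owen_scramble(x, seed, bits=32):
    """Base-2 Owen scrambling of a fixed-point sample x in [0, 1).

    Each binary-tree node (identified by the digit prefix above it) flips
    the next digit by one pseudo-random bit. The hash below is a
    conventional stand-in, not the paper's grammar-based generator.
    """
    v = int(x * (1 << bits))
    out = 0
    for i in range(bits):
        prefix = v >> (bits - i)            # digits consumed so far
        h = hashlib.sha256(f"{seed}:{i}:{prefix}".encode()).digest()
        flip = h[0] & 1                     # this node's scrambling bit
        digit = (v >> (bits - 1 - i)) & 1
        out = (out << 1) | (digit ^ flip)
    return out / (1 << bits)

# Scrambling permutes digits bijectively, so stratification is preserved:
# the eight stratified inputs land in eight distinct 1/8-wide strata.
pts = [i / 8 + 1 / 16 for i in range(8)]
print(sorted(owen_scramble(p, seed=7) for p in pts))
```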
Cited by: 0
Adaptive Shells for Efficient Neural Radiance Field Rendering
Pub Date: 2023-11-16 DOI: 10.1145/3618390
Zian Wang, Tianchang Shen, Merlin Nimier-David, Nicholas Sharp, Jun Gao, Alexander Keller, Sanja Fidler, Thomas Müller, Zan Gojcic
Neural radiance fields achieve unprecedented quality for novel view synthesis, but their volumetric formulation remains expensive, requiring a huge number of samples to render high-resolution images. Volumetric encodings are essential to represent fuzzy geometry such as foliage and hair, and they are well-suited for stochastic optimization. Yet, many scenes ultimately consist largely of solid surfaces which can be accurately rendered by a single sample per pixel. Based on this insight, we propose a neural radiance formulation that smoothly transitions between volumetric- and surface-based rendering, greatly accelerating rendering speed and even improving visual fidelity. Our method constructs an explicit mesh envelope which spatially bounds a neural volumetric representation. In solid regions, the envelope nearly converges to a surface and can often be rendered with a single sample. To this end, we generalize the NeuS [Wang et al. 2021] formulation with a learned spatially-varying kernel size which encodes the spread of the density, fitting a wide kernel to volume-like regions and a tight kernel to surface-like regions. We then extract an explicit mesh of a narrow band around the surface, with width determined by the kernel size, and fine-tune the radiance field within this band. At inference time, we cast rays against the mesh and evaluate the radiance field only within the enclosed region, greatly reducing the number of samples required. Experiments show that our approach enables efficient rendering at very high fidelity. We also demonstrate that the extracted envelope enables downstream applications such as animation and simulation.
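The kernel-size generalization can be illustrated with the NeuS-style opacity computation: a logistic CDF of the SDF with inverse width s, where a large s behaves like a surface and a small s like a volume. In the sketch below, s is simply a scalar broadcast against the samples; the paper learns it as a spatially-varying field.

```python
import torch

def neus_alpha(sdf, s):
    """NeuS-style per-interval opacity from SDF samples along a ray.

    `s` is the inverse kernel width: large s -> a tight, surface-like
    kernel; small s -> a wide, volume-like one. Adaptive Shells learns s
    per point; here it is just broadcast, as a simplified stand-in.
    """
    phi = torch.sigmoid(s * sdf)                      # CDF of logistic kernel
    alpha = (phi[..., :-1] - phi[..., 1:]) / phi[..., :-1].clamp_min(1e-6)
    return alpha.clamp(0.0, 1.0)

sdf = torch.tensor([0.30, 0.15, 0.02, -0.10, -0.25])  # ray crosses a surface
print(neus_alpha(sdf, s=8.0))     # wide kernel: opacity spread over samples
print(neus_alpha(sdf, s=200.0))   # tight kernel: one near-1 alpha spike
```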
Cited by: 0
Controllable Group Choreography Using Contrastive Diffusion
Pub Date: 2023-10-29 DOI: 10.1145/3618356
Nhat Le, Tuong Khanh Long Do, Khoa Do, Hien Nguyen, Erman Tjiputra, Quang D. Tran, Anh Nguyen
Music-driven group choreography poses a considerable challenge but holds significant potential for a wide range of industrial applications. The ability to generate synchronized and visually appealing group dance motions that are aligned with music opens up opportunities in many fields such as entertainment, advertising, and virtual performances. However, most recent works are either unable to generate high-fidelity long-term motions or fail to provide a controllable experience. In this work, we aim to address the demand for high-quality and customizable group dance generation by effectively governing the consistency and diversity of group choreographies. In particular, we utilize a diffusion-based generative approach to enable the synthesis of a flexible number of dancers and long-term group dances, while ensuring coherence with the input music. Ultimately, we introduce a Group Contrastive Diffusion (GCD) strategy to enhance the connection between dancers and their group, providing the ability to control the consistency or diversity level of the synthesized group animation via a classifier-guidance sampling technique. Through intensive experiments and evaluation, we demonstrate the effectiveness of our approach in producing visually captivating and consistent group dance motions. The experimental results show the capability of our method to achieve the desired levels of consistency and diversity while maintaining the overall quality of the generated group choreography.
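A hedged sketch of the guidance mechanism: steer each denoising step with the gradient of a differentiable group-consistency score, with the sign and magnitude of the weight trading consistency against diversity. The function names and the toy variance-based score are illustrative assumptions, not the paper's actual networks.

```python
import torch

def guided_denoise_step(eps_model, group_score, x_t, t, w):
    """Classifier-guided sampling sketch: nudge the predicted noise with
    the gradient of a differentiable group-consistency score. w > 0 steers
    toward consistent group motion, w < 0 toward diversity."""
    x_t = x_t.detach().requires_grad_(True)
    score = group_score(x_t, t)                 # scalar "groupness" of the motion
    grad = torch.autograd.grad(score, x_t)[0]
    return eps_model(x_t, t) - w * grad         # guided noise prediction

eps_model = lambda x, t: torch.zeros_like(x)    # dummy denoiser
consistency = lambda x, t: -x.var(dim=0).sum()  # toy score: low variance across dancers
x = torch.randn(4, 60, 72)                      # 4 dancers, 60 frames, 72-D poses
eps_hat = guided_denoise_step(eps_model, consistency, x, t=10, w=2.0)
```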
Cited by: 0
Neural Packing: from Visual Sensing to Reinforcement Learning
Pub Date: 2023-10-17 DOI: 10.1145/3618354
Juzhan Xu, Minglun Gong, Hao Zhang, Hui Huang, Ruizhen Hu
We present a novel learning framework to solve the transport-and-packing (TAP) problem in 3D. It constitutes a full solution pipeline, from partial observation of input objects via RGBD sensing and recognition, through robotic motion planning, to final box placement that arrives at a compact packing in a target container. The technical core of our method is a neural network for TAP, trained via reinforcement learning (RL), to solve the NP-hard combinatorial optimization problem. Our network simultaneously selects an object to pack and determines the final packing location, based on a judicious encoding of the continuously evolving states of partially observed source objects and available spaces in the target container, using separate encoders, both equipped with attention mechanisms. The encoded feature vectors are employed to compute the matching scores and feasibility masks of different pairings of box selection and available-space configuration for packing strategy optimization. Extensive experiments, including ablation studies and physical packing execution by a real robot (Universal Robot UR5e), are conducted to evaluate our method in terms of its design choices, scalability, generalizability, and comparisons to baselines, including the most recent RL-based TAP solution. We also contribute the first benchmark for TAP, covering a variety of input settings and difficulty levels.
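The scores-plus-feasibility-mask selection can be sketched independently of the encoders. Below, invented object and space embeddings are matched by dot product, infeasible pairs are masked out, and the policy samples a joint (object, placement) action; the paper's attention-based encoders and RL training are omitted.

```python
import torch

def select_action(obj_feat, space_feat, feasible):
    """Score every (object, placement) pair and mask infeasible ones.

    obj_feat:   (O, D) encodings of observed, not-yet-packed objects
    space_feat: (S, D) encodings of available container spaces
    feasible:   (O, S) boolean mask from collision/accessibility checks
    A toy stand-in for the paper's policy head.
    """
    scores = obj_feat @ space_feat.T                 # (O, S) matching scores
    scores = scores.masked_fill(~feasible, float('-inf'))
    probs = torch.softmax(scores.flatten(), dim=0)   # policy over joint actions
    idx = torch.multinomial(probs, 1).item()
    return divmod(idx, space_feat.shape[0])          # (object index, space index)

obj = torch.randn(5, 16)
spaces = torch.randn(9, 16)
mask = torch.rand(5, 9) > 0.3
print(select_action(obj, spaces, mask))
```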
Cited by: 0
Variational Barycentric Coordinates
Pub Date: 2023-10-06 DOI: 10.1145/3618403
Ana Dodik, Oded Stein, Vincent Sitzmann, Justin Solomon
We propose a variational technique to optimize for generalized barycentric coordinates that offers additional control compared to existing models. Prior work represents barycentric coordinates using meshes or closed-form formulae, in practice limiting the choice of objective function. In contrast, we directly parameterize the continuous function that maps any coordinate in a polytope's interior to its barycentric coordinates using a neural field. This formulation is enabled by our theoretical characterization of barycentric coordinates, which allows us to construct neural fields that parameterize the entire function class of valid coordinates. We demonstrate the flexibility of our model using a variety of objective functions, including multiple smoothness and deformation-aware energies; as a side contribution, we also present mathematically-justified means of measuring and minimizing objectives like total variation on discontinuous neural fields. We offer a practical acceleration strategy, present a thorough validation of our algorithm, and demonstrate several applications.
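A minimal sketch of the parameterization, under simplifying assumptions: softmax outputs give nonnegativity and partition of unity for free, while the reproduction property is only encouraged by a penalty rather than guaranteed by construction as in the paper. Any of the paper's objectives (e.g., a Dirichlet or total-variation energy) would simply be added to the same loss.

```python
import torch
import torch.nn as nn

# A neural field maps an interior point of a polygon to nonnegative
# weights over its vertices; all sizes here are illustrative.

verts = torch.tensor([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])  # unit square

field = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, len(verts)))

opt = torch.optim.Adam(field.parameters(), lr=1e-3)
for _ in range(500):
    x = torch.rand(256, 2)                          # sample the interior
    w = torch.softmax(field(x), dim=-1)             # w >= 0, rows sum to 1
    loss = (w @ verts - x).square().sum(-1).mean()  # reproduction penalty
    opt.zero_grad(); loss.backward(); opt.step()

x0 = torch.tensor([[0.5, 0.5]])
w0 = torch.softmax(field(x0), dim=-1)
print(w0, w0 @ verts)   # nonnegative, sums to 1, and w0 @ verts ~ x0
```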
Cited by: 0
Decaf: Monocular Deformation Capture for Face and Hand Interactions
Pub Date: 2023-09-28 DOI: 10.1145/3618329
Soshi Shimada, Vladislav Golyanik, Patrick Pérez, C. Theobalt
Existing methods for 3D tracking from monocular RGB videos predominantly consider articulated and rigid objects (e.g., two hands or humans interacting with rigid environments). Modelling dense non-rigid object deformations in this setting (e.g., when hands interact with a face) has so far remained largely unaddressed, although such effects can improve the realism of downstream applications such as AR/VR, 3D virtual avatar communications, and character animations. This is due to the severe ill-posedness of the monocular view setting and the associated challenges (e.g., in acquiring a dataset for training and evaluation, or in obtaining reasonable non-uniform stiffness estimates for the deformable object). While it is possible to naïvely track multiple non-rigid objects independently using 3D templates or parametric 3D models, such an approach would suffer from multiple artefacts in the resulting 3D estimates, such as depth ambiguity, unnatural intra-object collisions, and missing or implausible deformations. Hence, this paper introduces the first method that addresses the fundamental challenges depicted above and allows tracking human hands interacting with human faces in 3D from single monocular RGB videos. We model hands as articulated objects inducing non-rigid face deformations during an active interaction. Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system. As a pivotal step in its creation, we process the reconstructed raw 3D shapes with position-based dynamics and an approach for non-uniform stiffness estimation of the head tissues, which results in plausible annotations of the surface deformations, hand-face contact regions and head-hand positions. At the core of our neural approach are a variational auto-encoder supplying the hand-face depth prior and modules that guide the 3D tracking by estimating the contacts and the deformations. Our final 3D hand and face reconstructions are realistic and more plausible compared to several baselines applicable in our setting, both quantitatively and qualitatively. https://vcai.mpi-inf.mpg.de/projects/Decaf
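Of the pipeline's learned components, the variational auto-encoder supplying the depth prior is the most self-contained to sketch. The toy below shows only generic VAE mechanics on invented feature dimensions; the paper's actual inputs, latents, and training targets are not reproduced.

```python
import torch
import torch.nn as nn

class DepthPriorVAE(nn.Module):
    """Generic VAE skeleton; all dimensions below are illustrative."""
    def __init__(self, in_dim=256, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * z_dim))   # mu, log-variance
        self.dec = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(),
                                 nn.Linear(128, in_dim))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    rec = (recon - x).square().mean()
    kl = -0.5 * (1 + logvar - mu.square() - logvar.exp()).mean()
    return rec + 1e-3 * kl

vae = DepthPriorVAE()
x = torch.rand(8, 256)              # stand-in hand-face depth features
recon, mu, logvar = vae(x)
print(vae_loss(x, recon, mu, logvar))
```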
Cited by: 0