We introduce ToonCrafter, a novel approach that transcends traditional correspondence-based cartoon video interpolation, paving the way for generative interpolation. Traditional methods, which implicitly assume linear motion and the absence of complicated phenomena such as dis-occlusion, often struggle with the exaggerated, non-linear, and large motions with occlusion commonly found in cartoons, resulting in implausible or even failed interpolation results. To overcome these limitations, we explore the potential of adapting live-action video priors to better suit cartoon interpolation within a generative framework. ToonCrafter effectively addresses the challenges faced when applying live-action video motion priors to generative cartoon interpolation. First, we design a toon rectification learning strategy that seamlessly adapts live-action video priors to the cartoon domain, resolving the domain-gap and content-leakage issues. Next, we introduce a dual-reference-based 3D decoder to compensate for the details lost in the highly compressed latent prior space, ensuring the preservation of fine details in the interpolation results. Finally, we design a flexible sketch encoder that empowers users with interactive control over the interpolation results. Experimental results demonstrate that our proposed method not only produces visually convincing and more natural dynamics, but also effectively handles dis-occlusion. The comparative evaluation demonstrates the notable superiority of our approach over existing competitors. Code and model weights are available at https://doubiiu.github.io/projects/ToonCrafter
ToonCrafter: Generative Cartoon Interpolation. Jinbo Xing, Hanyuan Liu, Menghan Xia, Yong Zhang, Xintao Wang, Ying Shan, Tien-Tsin Wong. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687761
Recently, 3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results, while allowing the rendering of high-resolution images in real time. However, leveraging 3D Gaussians for surface reconstruction poses significant challenges due to the explicit and disconnected nature of 3D Gaussians. In this work, we present Gaussian Opacity Fields (GOF), a novel approach for efficient, high-quality, and adaptive surface reconstruction in unbounded scenes. GOF is derived from ray-tracing-based volume rendering of 3D Gaussians, enabling direct geometry extraction from 3D Gaussians by identifying the level set of the opacity field, without resorting to Poisson reconstruction or TSDF fusion as in previous work. We approximate the surface normal of a Gaussian as the normal of the ray-Gaussian intersection plane, enabling regularization that significantly enhances the reconstructed geometry. Furthermore, we develop an efficient geometry extraction method based on Marching Tetrahedra, where the tetrahedral grid is induced from the 3D Gaussians and thus adapts to the scene's complexity. Our evaluations reveal that GOF surpasses existing 3DGS-based methods in both surface reconstruction and novel view synthesis. Further, it compares favorably to, or even outperforms, neural implicit methods in both quality and speed.
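To make the ray-Gaussian intersection idea concrete, below is a minimal NumPy sketch (not the paper's implementation) of one standard way to compute the point of maximal Gaussian response along a ray and to use the density gradient at that point as the normal of the intersection plane; the function name and the camera-facing orientation step are illustrative assumptions.

```python
import numpy as np

def ray_gaussian_normal(o, d, mu, cov):
    """Illustrative sketch (not the paper's code): point of maximal Gaussian
    response along a ray and the gradient-based normal of the intersection plane.

    o, d : ray origin and unit direction, shape (3,)
    mu   : Gaussian mean, shape (3,)
    cov  : Gaussian covariance, shape (3, 3)
    """
    cov_inv = np.linalg.inv(cov)
    # Depth maximizing the density exp(-0.5 (o + t d - mu)^T cov_inv (o + t d - mu)).
    t_star = (d @ cov_inv @ (mu - o)) / (d @ cov_inv @ d)
    x_star = o + t_star * d
    # The density gradient is proportional to -cov_inv (x - mu); its direction is
    # taken as the normal of the ray-Gaussian intersection plane.
    g = cov_inv @ (x_star - mu)
    n = g / (np.linalg.norm(g) + 1e-12)
    if n @ d > 0:          # orient the normal toward the camera
        n = -n
    return t_star, x_star, n

# Example: isotropic Gaussian at the origin, ray slightly offset from its center.
t, x, n = ray_gaussian_normal(np.array([0.2, 0.0, 3.0]), np.array([0.0, 0.0, -1.0]),
                              np.zeros(3), 0.1 * np.eye(3))
```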
Gaussian Opacity Fields: Efficient Adaptive Surface Reconstruction in Unbounded Scenes. Zehao Yu, Torsten Sattler, Andreas Geiger. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687937
Large garages are ubiquitous yet intricate scenes that present unique challenges due to their monotonous colors, repetitive patterns, reflective surfaces, and transparent vehicle glass. Conventional Structure from Motion (SfM) methods for camera pose estimation and 3D reconstruction often fail in these environments due to poor correspondence construction. To address these challenges, we introduce LetsGo, a LiDAR-assisted Gaussian splatting framework for large-scale garage modeling and rendering. We develop a handheld scanner, Polar, equipped with IMU, LiDAR, and a fisheye camera, to facilitate accurate data acquisition. Using this Polar device, we present the GarageWorld dataset, consisting of eight expansive garage scenes with diverse geometric structures, which will be made publicly available for further research. Our approach demonstrates that LiDAR point clouds collected by the Polar device significantly enhance a suite of 3D Gaussian splatting algorithms for garage scene modeling and rendering. We introduce a novel depth regularizer that effectively eliminates floating artifacts in rendered images. Additionally, we propose a multi-resolution 3D Gaussian representation designed for Level-of-Detail (LOD) rendering. This includes adapted scaling factors for individual levels and a random-resolution-level training scheme to optimize the Gaussians across different resolutions. This representation enables efficient rendering of large-scale garage scenes on lightweight devices via a web-based renderer. Experimental results on our GarageWorld dataset, as well as on ScanNet++ and KITTI-360, demonstrate the superiority of our method in terms of rendering quality and resource efficiency.
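The abstract does not spell out the form of the depth regularizer; purely as a hypothetical illustration of how projected LiDAR depth could be used to suppress floating artifacts, a masked L1 penalty between rendered and LiDAR depth might look like the following (the paper's actual regularizer may differ).

```python
import numpy as np

def lidar_depth_regularizer(rendered_depth, lidar_depth, valid_mask):
    """Hypothetical L1 depth regularizer (the paper's exact formulation may differ):
    penalize rendered depth only where projected LiDAR measurements are available.

    rendered_depth : (H, W) depth from the Gaussian rasterizer
    lidar_depth    : (H, W) depth from LiDAR points projected into the camera
    valid_mask     : (H, W) boolean mask of pixels hit by at least one LiDAR point
    """
    diff = np.abs(rendered_depth - lidar_depth)
    return diff[valid_mask].mean() if valid_mask.any() else 0.0
```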
LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives. Jiadi Cui, Junming Cao, Fuqiang Zhao, Zhipeng He, Yifan Chen, Yuhui Zhong, Lan Xu, Yujiao Shi, Yingliang Zhang, Jingyi Yu. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687762
This paper introduces ELMO, a real-time upsampling motion capture framework designed for a single LiDAR sensor. Modeled as a conditional autoregressive transformer-based upsampling motion generator, ELMO achieves 60 fps motion capture from a 20 fps LiDAR point cloud sequence. The key feature of ELMO is the coupling of the self-attention mechanism with thoughtfully designed embedding modules for motion and point clouds, significantly elevating the motion quality. To facilitate accurate motion capture, we develop a one-time skeleton calibration model capable of predicting user skeleton offsets from a single-frame point cloud. Additionally, we introduce a novel data augmentation technique utilizing a LiDAR simulator, which enhances global root tracking to improve environmental understanding. To demonstrate the effectiveness of our method, we compare ELMO with state-of-the-art methods in both image-based and point-cloud-based motion capture. We further conduct an ablation study to validate our design principles. ELMO's fast inference time makes it well-suited for real-time applications, exemplified in our demo video featuring live streaming and interactive gaming scenarios. Furthermore, we contribute a high-quality LiDAR-mocap synchronized dataset comprising 20 different subjects performing a range of motions, which can serve as a valuable resource for future research. The dataset and evaluation code are available at https://movin3d.github.io/ELMO_SIGASIA2024/
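As a schematic of the 20 fps to 60 fps upsampling described above, the sketch below shows a generic conditional autoregressive loop in which each incoming LiDAR frame drives the generation of three output poses; the generator interface, pose dimensionality, and context-window size are placeholders, not ELMO's actual architecture.

```python
import numpy as np

UPSAMPLE = 3  # 20 fps LiDAR input -> 60 fps motion output (ratio from the abstract)

def upsample_motion(point_cloud_frames, generator, history):
    """Schematic autoregressive upsampling loop; all interfaces are hypothetical.

    point_cloud_frames : list of (N_i, 3) LiDAR frames captured at 20 fps
    generator          : callable(history, point_cloud) -> list of UPSAMPLE poses
    history            : list of previously generated poses (autoregressive context)
    """
    output_poses = []
    for pc in point_cloud_frames:
        # Each incoming LiDAR frame conditions the generation of three 60 fps poses.
        new_poses = generator(history, pc)
        assert len(new_poses) == UPSAMPLE
        output_poses.extend(new_poses)
        history = (history + new_poses)[-32:]  # keep a bounded context window
    return output_poses

# Dummy generator emitting zero poses (66-D vectors standing in for joint parameters).
dummy = lambda hist, pc: [np.zeros(66) for _ in range(UPSAMPLE)]
poses = upsample_motion([np.random.rand(1024, 3) for _ in range(20)], dummy, [])
assert len(poses) == 20 * UPSAMPLE
```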
ELMO: Enhanced Real-time LiDAR Motion Capture through Upsampling. Deok-Kyeong Jang, Dongseok Yang, Deok-Yun Jang, Byeoli Choi, Sung-Hee Lee, Donghoon Shin. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687991
Continuous collision detection (CCD) between parametric surfaces is typically formulated as a five-dimensional constrained optimization problem. In the field of CAD and computer graphics, common approaches to solving this problem rely on linearization or sampling strategies. Alternatively, inclusion-based techniques detect collisions by employing 5D inclusion functions, which are typically designed to represent the swept volumes of parametric surfaces over a given time span, and narrowing down the earliest collision moment through subdivision in both the spatial and temporal dimensions. However, when high detection accuracy is required, all of these approaches incur significantly higher computational cost due to the high-dimensional search space. In this work, we develop a new time-dependent inclusion-based CCD framework that eliminates the need for temporal subdivision and can speed up conventional methods by a factor ranging from 36 to 138. To achieve this, we propose a novel time-dependent inclusion function that provides a continuous representation of a moving surface, along with a corresponding intersection detection algorithm that quickly identifies the time intervals when collisions are likely to occur. We validate our method across various primitive types, demonstrate its efficacy within the simulation pipeline, and show that it significantly improves CCD efficiency while maintaining accuracy.
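For context, the conventional inclusion-based baseline that the paper accelerates can be sketched as follows: a conservative axis-aligned inclusion of a Bézier patch with linearly moving control points (via the convex-hull property), combined with bisection of the time axis to narrow down the earliest colliding interval. This is a simplified illustration of the baseline only (spatial subdivision is omitted for brevity); the paper's time-dependent inclusion function removes the temporal subdivision entirely.

```python
import numpy as np

def swept_aabb(ctrl_t0, ctrl_t1, ta, tb):
    """Conservative AABB of a Bezier patch with linearly moving control points
    over the time interval [ta, tb] (convex-hull property of Bezier surfaces)."""
    ca = (1 - ta) * ctrl_t0 + ta * ctrl_t1   # control points at time ta
    cb = (1 - tb) * ctrl_t0 + tb * ctrl_t1   # control points at time tb
    pts = np.concatenate([ca.reshape(-1, 3), cb.reshape(-1, 3)])
    return pts.min(axis=0), pts.max(axis=0)

def aabb_overlap(lo_a, hi_a, lo_b, hi_b, eps=1e-8):
    return bool(np.all(lo_a <= hi_b + eps) and np.all(lo_b <= hi_a + eps))

def earliest_collision_interval(a0, a1, b0, b1, ta=0.0, tb=1.0, tol=1e-4):
    """Bisect the time axis until the first interval with overlapping inclusions
    is narrower than tol; returns (ta, tb) or None if no overlap is found."""
    if not aabb_overlap(*swept_aabb(a0, a1, ta, tb), *swept_aabb(b0, b1, ta, tb)):
        return None
    if tb - ta < tol:
        return ta, tb
    tm = 0.5 * (ta + tb)
    return (earliest_collision_interval(a0, a1, b0, b1, ta, tm, tol)
            or earliest_collision_interval(a0, a1, b0, b1, tm, tb, tol))

# Two bilinear patches (2x2 control nets) moving toward each other along z;
# they first touch around t = 1/1.2 ~ 0.833.
a0 = np.array([[[0, 0, 0], [1, 0, 0]], [[0, 1, 0], [1, 1, 0]]], dtype=float)
b0 = a0 + np.array([0.0, 0.0, 1.0])
print(earliest_collision_interval(a0, a0 + [0, 0, 0.6], b0, b0 - [0, 0, 0.6]))
```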
A Time-Dependent Inclusion-Based Method for Continuous Collision Detection between Parametric Surfaces. Xuwen Chen, Cheng Yu, Xingyu Ni, Mengyu Chu, Bin Wang, Baoquan Chen. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687960
Charlie Hewitt, Fatemeh Saleh, Sadegh Aliakbarian, Lohit Petikam, Shideh Rezaeifar, Louis Florentin, Zafiirah Hosenie, Thomas J. Cashman, Julien Valentin, Darren Cosker, Tadas Baltrusaitis
We tackle the problem of highly accurate, holistic performance capture of the face, body, and hands simultaneously. Motion-capture technologies used in film and game production typically focus only on face, body, or hand capture independently, involve complex and expensive hardware, and require a high degree of manual intervention from skilled operators. While machine-learning-based approaches exist to overcome these problems, they usually only support a single camera, often operate on a single part of the body, do not produce precise world-space results, and rarely generalize outside specific contexts. In this work, we introduce the first technique for marker-free, high-quality reconstruction of the complete human body, including eyes and tongue, without requiring any calibration, manual intervention, or custom hardware. Our approach produces stable world-space results from arbitrary camera rigs and supports varied capture environments and clothing. We achieve this through a hybrid approach that leverages machine learning models trained exclusively on synthetic data together with powerful parametric models of human shape and motion. We evaluate our method on a number of body, face, and hand reconstruction benchmarks and demonstrate state-of-the-art results that generalize to diverse datasets.
Look Ma, no markers: holistic performance capture without the hassle. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687772
We introduce a high-fidelity portrait shadow removal model that can effectively enhance a portrait image by predicting its appearance beneath disturbing shadows and highlights. Portrait shadow removal is a highly ill-posed problem where multiple plausible solutions can be found based on a single image. For example, disentangling complex environmental lighting from the original skin color is a non-trivial problem. While existing works have approached this problem by predicting appearance residuals that propagate the local shadow distribution, such methods are often incomplete and lead to unnatural predictions, especially for portraits with hard shadows. We overcome the limitations of existing local propagation methods by formulating the removal problem as a generation task in which a diffusion model learns to globally rebuild the human appearance from scratch, conditioned on an input portrait image. For robust and natural shadow removal, we propose to train the diffusion model with a compositional repurposing framework: a pre-trained text-guided image generation model is first fine-tuned to harmonize the lighting and color of the foreground with a background scene using a background harmonization dataset, and the model is then further fine-tuned to generate a shadow-free portrait image via a shadow-paired dataset. To overcome the loss of fine details in the latent diffusion model, we propose a guided-upsampling network to restore the original high-frequency details (e.g., wrinkles and dots) from the input image. To enable our compositional training framework, we construct a high-fidelity and large-scale dataset using a light-stage capture system and synthetic graphics simulation. Our generative framework effectively removes shadows caused by both self and external occlusions while maintaining the original lighting distribution and high-frequency details. Our method also demonstrates robustness to diverse subjects captured in real environments.
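The guided-upsampling network itself is learned; as a rough, hand-crafted stand-in for the underlying idea of re-injecting the input's high-frequency detail into the diffusion output, one could recombine frequency bands as below. This naive version would also re-introduce hard shadow edges, which is presumably part of why a learned, guided module is used instead; all names here are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def recombine_high_frequency(diffusion_output, input_image, sigma=3.0):
    """Hand-crafted stand-in for guided upsampling (the actual module is learned):
    keep the low-frequency appearance of the shadow-free diffusion output and
    re-inject the high-frequency detail (wrinkles, dots) of the input portrait.

    Both images: float arrays in [0, 1], shape (H, W, 3)."""
    low = gaussian_filter(diffusion_output, sigma=(sigma, sigma, 0))
    high = input_image - gaussian_filter(input_image, sigma=(sigma, sigma, 0))
    return np.clip(low + high, 0.0, 1.0)
```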
Generative Portrait Shadow Removal. Jae Shin Yoon, Zhixin Shu, Mengwei Ren, Cecilia Zhang, Yannick Hold-Geoffroy, Krishna kumar Singh, He Zhang. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687903
Jian Wang, Sizhuo Ma, Karl Bayer, Yi Zhang, Peihao Wang, Bing Zhou, Shree Nayar, Gurunandan Krishnan
Augmented reality (AR) mirrors are novel displays that have great potential for commercial applications such as virtual apparel try-on. Typically the camera is placed beside the display, leading to distorted perspectives during user interaction. In this paper, we present a novel approach to address this problem by placing the camera behind a transparent display, thereby providing users with a perspective-aligned experience. Simply placing the camera behind the display can compromise image quality due to optical effects. We meticulously analyze the image formation process, and present an image restoration algorithm that benefits from physics-based data synthesis and network design. Our method significantly improves image quality and outperforms existing methods especially on the underexplored wire and backscatter artifacts. We then carefully design a full AR mirror system including display and camera selection, real-time processing pipeline, and mechanical design. Our user study demonstrates that the system is exceptionally well-received by users, highlighting its advantages over existing camera configurations not only as an AR mirror, but also for video conferencing. Our work represents a step forward in the development of AR mirrors, with potential applications in retail, cosmetics, fashion, etc. The image restoration dataset and code are available at https://perspective-armirror.github.io/.
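The abstract does not state the image formation model it analyzes; a commonly assumed model for an under-display camera, given here only as an illustration consistent with the wire (diffraction) and backscatter artifacts mentioned above, is

$$ I_{\text{obs}} \;=\; \alpha \,\big(I_{\text{scene}} \ast k_{\text{wire}}\big) \;+\; \beta\, I_{\text{display}} \;+\; n, $$

where $\alpha$ accounts for attenuation of light passing through the transparent display, $k_{\text{wire}}$ is a point spread function induced by diffraction at the display's wiring pattern, $\beta\,I_{\text{display}}$ models backscatter of the displayed content into the lens, and $n$ is sensor noise; restoration then amounts to inverting such a forward model.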
Perspective-Aligned AR Mirror with Under-Display Camera. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687995
Liangwang Ruan, Bin Wang, Tiantian Liu, Baoquan Chen
We propose MiNNIE, a simple yet comprehensive framework for the real-time simulation of nonlinear near-incompressible elastics. To avoid the volumetric locking issues that linear finite element methods (FEM) commonly exhibit at high Poisson's ratios, we build MiNNIE upon a mixed FEM framework and further incorporate a pressure stabilization term to ensure excellent convergence of multigrid solvers. Our pressure stabilization strategy injects a bounded influence on nodal displacements, which can be eliminated using a quasi-Newton method. MiNNIE has a specially tailored GPU multigrid solver that includes a modified skinning-space interpolation scheme, a novel vertex Vanka smoother, and an efficient dense solver based on the Schur complement. MiNNIE supports various elastic material models and simulates them in real time, handling the full range of Poisson's ratios up to 0.5 together with large deformations, element inversions, and self-collisions.
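The abstract leaves the mixed formulation implicit; as a schematic (not necessarily the paper's exact energy), two-field displacement-pressure methods for near-incompressible elasticity typically optimize an energy of the form

$$ E(\mathbf{u}, p) \;=\; \int_{\Omega}\Big[\Psi_{\mathrm{dev}}\big(\mathbf{F}(\mathbf{u})\big) \;+\; p\,\big(J(\mathbf{u}) - 1\big) \;-\; \frac{p^{2}}{2\kappa}\Big]\,\mathrm{d}V \;+\; \frac{\epsilon}{2}\int_{\Omega}\|\nabla p\|^{2}\,\mathrm{d}V, $$

where $\mathbf{F}$ is the deformation gradient, $J=\det\mathbf{F}$, $\Psi_{\mathrm{dev}}$ is the deviatoric part of the strain energy, and $\kappa$ is the bulk modulus (so $\kappa\to\infty$, i.e., Poisson's ratio approaching 0.5, enforces $J\to 1$). The last term is one common form of pressure stabilization with a small coefficient $\epsilon$; the specific stabilization used in MiNNIE may differ.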
MiNNIE: a Mixed Multigrid Method for Real-time Simulation of Nonlinear Near-Incompressible Elastics. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687758
Li Wang, Lianghao Zhang, Fangzhou Gao, Yuzhen Kang, Jiawan Zhang
Recovering spatially-varying bidirectional reflectance distribution functions (SVBRDFs) from a few hand-held captured images has been a challenging task in computer graphics. Benefiting from priors learned from data, single-image methods can obtain plausible SVBRDF estimation results. However, the extremely limited appearance information in a single image does not suffice for high-quality SVBRDF reconstruction. Although increasing the number of inputs can improve reconstruction quality, it also reduces the efficiency of real data capture and adds significant computational burden. Therefore, the key challenge is to minimize the required number of inputs while keeping high-quality results. To address this, we propose maximizing the effective information in each input through a novel co-located capture strategy that combines near-field and far-field point lighting. To further enhance effectiveness, we theoretically investigate the inherent relation between the two images. The extracted relation is strongly correlated with the slope of the specular reflectance, substantially enhancing the precision of roughness map estimation. Additionally, we design registration and denoising modules to meet the practical requirements of hand-held capture. Quantitative assessments and qualitative analysis demonstrate that our method achieves superior SVBRDF estimation compared to previous approaches. All source code will be publicly released.
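The abstract does not give the image formation model; a standard co-located point-light model of the kind such methods build on (assumed here for illustration, with a microfacet specular term) is

$$ I(\mathbf{x}) \;=\; \frac{\Phi}{\|\mathbf{x}_{\ell}-\mathbf{x}\|^{2}}\left(\frac{\boldsymbol{\rho}_{d}}{\pi} \;+\; \rho_{s}\,\frac{D\!\left(\mathbf{n}\!\cdot\!\boldsymbol{\omega};\,r\right)\,G\,F}{4\,(\mathbf{n}\!\cdot\!\boldsymbol{\omega})^{2}}\right)(\mathbf{n}\!\cdot\!\boldsymbol{\omega}), $$

where $\boldsymbol{\omega}$ is the shared light/view direction (a co-located flash and camera make the half vector coincide with $\boldsymbol{\omega}$), $\mathbf{x}_{\ell}$ is the light position, $\Phi$ its intensity, $\boldsymbol{\rho}_{d}$, $\rho_{s}$, $r$, and $\mathbf{n}$ are the per-pixel diffuse albedo, specular albedo, roughness, and normal, and $D$, $G$, $F$ are the microfacet distribution, shadowing, and Fresnel terms. Under near-field lighting, $\boldsymbol{\omega}$ and the $1/\|\mathbf{x}_{\ell}-\mathbf{x}\|^{2}$ falloff vary across the image, whereas in the far-field limit they are approximately constant, which is what makes the two observations complementary.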
NFPLight: Deep SVBRDF Estimation via the Combination of Near and Far Field Point Lighting. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687978