Recently, 3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results, while allowing the rendering of high-resolution images in real-time. However, leveraging 3D Gaussians for surface reconstruction poses significant challenges due to the explicit and disconnected nature of 3D Gaussians. In this work, we present Gaussian Opacity Fields (GOF), a novel approach for efficient, high-quality, and adaptive surface reconstruction in unbounded scenes. Our GOF is derived from ray-tracing-based volume rendering of 3D Gaussians, enabling direct geometry extraction from 3D Gaussians by identifying its levelset, without resorting to Poisson reconstruction or TSDF fusion as in previous work. We approximate the surface normal of Gaussians as the normal of the ray-Gaussian intersection plane, enabling the application of regularization that significantly enhances geometry. Furthermore, we develop an efficient geometry extraction method utilizing Marching Tetrahedra, where the tetrahedral grids are induced from 3D Gaussians and thus adapt to the scene's complexity. Our evaluations reveal that GOF surpasses existing 3DGS-based methods in surface reconstruction and novel view synthesis. Further, it compares favorably to or even outperforms, neural implicit methods in both quality and speed.
{"title":"Gaussian Opacity Fields: Efficient Adaptive Surface Reconstruction in Unbounded Scenes","authors":"Zehao Yu, Torsten Sattler, Andreas Geiger","doi":"10.1145/3687937","DOIUrl":"https://doi.org/10.1145/3687937","url":null,"abstract":"Recently, 3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results, while allowing the rendering of high-resolution images in real-time. However, leveraging 3D Gaussians for surface reconstruction poses significant challenges due to the explicit and disconnected nature of 3D Gaussians. In this work, we present Gaussian Opacity Fields (GOF), a novel approach for efficient, high-quality, and adaptive surface reconstruction in unbounded scenes. Our GOF is derived from ray-tracing-based volume rendering of 3D Gaussians, enabling direct geometry extraction from 3D Gaussians by identifying its levelset, without resorting to Poisson reconstruction or TSDF fusion as in previous work. We approximate the surface normal of Gaussians as the normal of the ray-Gaussian intersection plane, enabling the application of regularization that significantly enhances geometry. Furthermore, we develop an efficient geometry extraction method utilizing Marching Tetrahedra, where the tetrahedral grids are induced from 3D Gaussians and thus adapt to the scene's complexity. Our evaluations reveal that GOF surpasses existing 3DGS-based methods in surface reconstruction and novel view synthesis. Further, it compares favorably to or even outperforms, neural implicit methods in both quality and speed.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"33 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large garages are ubiquitous yet intricate scenes that present unique challenges due to their monotonous colors, repetitive patterns, reflective surfaces, and transparent vehicle glass. Conventional Structure from Motion (SfM) methods for camera pose estimation and 3D reconstruction often fail in these environments due to poor correspondence construction. To address these challenges, we introduce LetsGo, a LiDAR-assisted Gaussian splatting framework for large-scale garage modeling and rendering. We develop a handheld scanner, Polar, equipped with IMU, LiDAR, and a fisheye camera, to facilitate accurate data acquisition. Using this Polar device, we present the GarageWorld dataset, consisting of eight expansive garage scenes with diverse geometric structures, which will be made publicly available for further research. Our approach demonstrates that LiDAR point clouds collected by the Polar device significantly enhance a suite of 3D Gaussian splatting algorithms for garage scene modeling and rendering. We introduce a novel depth regularizer that effectively eliminates floating artifacts in rendered images. Additionally, we propose a multi-resolution 3D Gaussian representation designed for Level-of-Detail (LOD) rendering. This includes adapted scaling factors for individual levels and a random-resolution-level training scheme to optimize the Gaussians across different resolutions. This representation enables efficient rendering of large-scale garage scenes on lightweight devices via a web-based renderer. Experimental results on our GarageWorld dataset, as well as on ScanNet++ and KITTI-360, demonstrate the superiority of our method in terms of rendering quality and resource efficiency.
{"title":"LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives","authors":"Jiadi Cui, Junming Cao, Fuqiang Zhao, Zhipeng He, Yifan Chen, Yuhui Zhong, Lan Xu, Yujiao Shi, Yingliang Zhang, Jingyi Yu","doi":"10.1145/3687762","DOIUrl":"https://doi.org/10.1145/3687762","url":null,"abstract":"Large garages are ubiquitous yet intricate scenes that present unique challenges due to their monotonous colors, repetitive patterns, reflective surfaces, and transparent vehicle glass. Conventional Structure from Motion (SfM) methods for camera pose estimation and 3D reconstruction often fail in these environments due to poor correspondence construction. To address these challenges, we introduce LetsGo, a LiDAR-assisted Gaussian splatting framework for large-scale garage modeling and rendering. We develop a handheld scanner, Polar, equipped with IMU, LiDAR, and a fisheye camera, to facilitate accurate data acquisition. Using this Polar device, we present the GarageWorld dataset, consisting of eight expansive garage scenes with diverse geometric structures, which will be made publicly available for further research. Our approach demonstrates that LiDAR point clouds collected by the Polar device significantly enhance a suite of 3D Gaussian splatting algorithms for garage scene modeling and rendering. We introduce a novel depth regularizer that effectively eliminates floating artifacts in rendered images. Additionally, we propose a multi-resolution 3D Gaussian representation designed for Level-of-Detail (LOD) rendering. This includes adapted scaling factors for individual levels and a random-resolution-level training scheme to optimize the Gaussians across different resolutions. This representation enables efficient rendering of large-scale garage scenes on lightweight devices via a web-based renderer. Experimental results on our GarageWorld dataset, as well as on ScanNet++ and KITTI-360, demonstrate the superiority of our method in terms of rendering quality and resource efficiency.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"55 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper introduces ELMO, a real-time upsampling motion capture framework designed for a single LiDAR sensor. Modeled as a conditional autoregressive transformer-based upsampling motion generator, ELMO achieves 60 fps motion capture from a 20 fps LiDAR point cloud sequence. The key feature of ELMO is the coupling of the self-attention mechanism with thoughtfully designed embedding modules for motion and point clouds, significantly elevating the motion quality. To facilitate accurate motion capture, we develop a one-time skeleton calibration model capable of predicting user skeleton off-sets from a single-frame point cloud. Additionally, we introduce a novel data augmentation technique utilizing a LiDAR simulator, which enhances global root tracking to improve environmental understanding. To demonstrate the effectiveness of our method, we compare ELMO with state-of-the-art methods in both image-based and point cloud-based motion capture. We further conduct an ablation study to validate our design principles. ELMO's fast inference time makes it well-suited for real-time applications, exemplified in our demo video featuring live streaming and interactive gaming scenarios. Furthermore, we contribute a high-quality LiDAR-mocap synchronized dataset comprising 20 different subjects performing a range of motions, which can serve as a valuable resource for future research. The dataset and evaluation code are available at https://movin3d.github.io/ELMO_SIGASIA2024/
{"title":"ELMO: Enhanced Real-time LiDAR Motion Capture through Upsampling","authors":"Deok-Kyeong Jang, Dongseok Yang, Deok-Yun Jang, Byeoli Choi, Sung-Hee Lee, Donghoon Shin","doi":"10.1145/3687991","DOIUrl":"https://doi.org/10.1145/3687991","url":null,"abstract":"This paper introduces ELMO, a real-time upsampling motion capture framework designed for a single LiDAR sensor. Modeled as a conditional autoregressive transformer-based upsampling motion generator, ELMO achieves 60 fps motion capture from a 20 fps LiDAR point cloud sequence. The key feature of ELMO is the coupling of the self-attention mechanism with thoughtfully designed embedding modules for motion and point clouds, significantly elevating the motion quality. To facilitate accurate motion capture, we develop a one-time skeleton calibration model capable of predicting user skeleton off-sets from a single-frame point cloud. Additionally, we introduce a novel data augmentation technique utilizing a LiDAR simulator, which enhances global root tracking to improve environmental understanding. To demonstrate the effectiveness of our method, we compare ELMO with state-of-the-art methods in both image-based and point cloud-based motion capture. We further conduct an ablation study to validate our design principles. ELMO's fast inference time makes it well-suited for real-time applications, exemplified in our demo video featuring live streaming and interactive gaming scenarios. Furthermore, we contribute a high-quality LiDAR-mocap synchronized dataset comprising 20 different subjects performing a range of motions, which can serve as a valuable resource for future research. The dataset and evaluation code are available at https://movin3d.github.io/ELMO_SIGASIA2024/","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"36 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Continuous collision detection (CCD) between parametric surfaces is typically formulated as a five-dimensional constrained optimization problem. In the field of CAD and computer graphics, common approaches to solving this problem rely on linearization or sampling strategies. Alternatively, inclusion-based techniques detect collisions by employing 5D inclusion functions, which are typically designed to represent the swept volumes of parametric surfaces over a given time span, and narrowing down the earliest collision moment through subdivision in both spatial and temporal dimensions. However, when high detection accuracy is required, all these approaches significantly increases computational consumption due to the high-dimensional searching space. In this work, we develop a new time-dependent inclusion-based CCD framework that eliminates the need for temporal subdivision and can speedup conventional methods by a factor ranging from 36 to 138. To achieve this, we propose a novel time-dependent inclusion function that provides a continuous representation of a moving surface, along with a corresponding intersection detection algorithm that quickly identifies the time intervals when collisions are likely to occur. We validate our method across various primitive types, demonstrate its efficacy within the simulation pipeline and show that it significantly improves CCD efficiency while maintaining accuracy.
{"title":"A Time-Dependent Inclusion-Based Method for Continuous Collision Detection between Parametric Surfaces","authors":"Xuwen Chen, Cheng Yu, Xingyu Ni, Mengyu Chu, Bin Wang, Baoquan Chen","doi":"10.1145/3687960","DOIUrl":"https://doi.org/10.1145/3687960","url":null,"abstract":"Continuous collision detection (CCD) between parametric surfaces is typically formulated as a five-dimensional constrained optimization problem. In the field of CAD and computer graphics, common approaches to solving this problem rely on linearization or sampling strategies. Alternatively, inclusion-based techniques detect collisions by employing 5D inclusion functions, which are typically designed to represent the swept volumes of parametric surfaces over a given time span, and narrowing down the earliest collision moment through subdivision in both spatial and temporal dimensions. However, when high detection accuracy is required, all these approaches significantly increases computational consumption due to the high-dimensional searching space. In this work, we develop a new time-dependent inclusion-based CCD framework that eliminates the need for temporal subdivision and can speedup conventional methods by a factor ranging from 36 to 138. To achieve this, we propose a novel time-dependent inclusion function that provides a continuous representation of a moving surface, along with a corresponding intersection detection algorithm that quickly identifies the time intervals when collisions are likely to occur. We validate our method across various primitive types, demonstrate its efficacy within the simulation pipeline and show that it significantly improves CCD efficiency while maintaining accuracy.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"6 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Charlie Hewitt, Fatemeh Saleh, Sadegh Aliakbarian, Lohit Petikam, Shideh Rezaeifar, Louis Florentin, Zafiirah Hosenie, Thomas J. Cashman, Julien Valentin, Darren Cosker, Tadas Baltrusaitis
We tackle the problem of highly-accurate, holistic performance capture for the face, body and hands simultaneously. Motion-capture technologies used in film and game production typically focus only on face, body or hand capture independently, involve complex and expensive hardware and a high degree of manual intervention from skilled operators. While machine-learning-based approaches exist to overcome these problems, they usually only support a single camera, often operate on a single part of the body, do not produce precise world-space results, and rarely generalize outside specific contexts. In this work, we introduce the first technique for markerfree, high-quality reconstruction of the complete human body, including eyes and tongue, without requiring any calibration, manual intervention or custom hardware. Our approach produces stable world-space results from arbitrary camera rigs as well as supporting varied capture environments and clothing. We achieve this through a hybrid approach that leverages machine learning models trained exclusively on synthetic data and powerful parametric models of human shape and motion. We evaluate our method on a number of body, face and hand reconstruction benchmarks and demonstrate state-of-the-art results that generalize on diverse datasets.
{"title":"Look Ma, no markers: holistic performance capture without the hassle","authors":"Charlie Hewitt, Fatemeh Saleh, Sadegh Aliakbarian, Lohit Petikam, Shideh Rezaeifar, Louis Florentin, Zafiirah Hosenie, Thomas J. Cashman, Julien Valentin, Darren Cosker, Tadas Baltrusaitis","doi":"10.1145/3687772","DOIUrl":"https://doi.org/10.1145/3687772","url":null,"abstract":"We tackle the problem of highly-accurate, holistic performance capture for the face, body and hands simultaneously. Motion-capture technologies used in film and game production typically focus only on face, body or hand capture independently, involve complex and expensive hardware and a high degree of manual intervention from skilled operators. While machine-learning-based approaches exist to overcome these problems, they usually only support a single camera, often operate on a single part of the body, do not produce precise world-space results, and rarely generalize outside specific contexts. In this work, we introduce the first technique for markerfree, high-quality reconstruction of the complete human body, including eyes and tongue, without requiring any calibration, manual intervention or custom hardware. Our approach produces stable world-space results from arbitrary camera rigs as well as supporting varied capture environments and clothing. We achieve this through a hybrid approach that leverages machine learning models trained exclusively on synthetic data and powerful parametric models of human shape and motion. We evaluate our method on a number of body, face and hand reconstruction benchmarks and demonstrate state-of-the-art results that generalize on diverse datasets.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"13 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce a high-fidelity portrait shadow removal model that can effectively enhance the image of a portrait by predicting its appearance under disturbing shadows and highlights. Portrait shadow removal is a highly ill-posed problem where multiple plausible solutions can be found based on a single image. For example, disentangling complex environmental lighting from original skin color is a non-trivial problem. While existing works have solved this problem by predicting the appearance residuals that can propagate local shadow distribution, such methods are often incomplete and lead to unnatural predictions, especially for portraits with hard shadows. We overcome the limitations of existing local propagation methods by formulating the removal problem as a generation task where a diffusion model learns to globally rebuild the human appearance from scratch as a condition of an input portrait image. For robust and natural shadow removal, we propose to train the diffusion model with a compositional repurposing framework: a pre-trained text-guided image generation model is first fine-tuned to harmonize the lighting and color of the foreground with a background scene by using a background harmonization dataset; and then the model is further fine-tuned to generate a shadow-free portrait image via a shadow-paired dataset. To overcome the limitation of losing fine details in the latent diffusion model, we propose a guided-upsampling network to restore the original high-frequency details (e.g. , wrinkles and dots) from the input image. To enable our compositional training framework, we construct a high-fidelity and large-scale dataset using a lightstage capturing system and synthetic graphics simulation. Our generative framework effectively removes shadows caused by both self and external occlusions while maintaining original lighting distribution and high-frequency details. Our method also demonstrates robustness to diverse subjects captured in real environments.
{"title":"Generative Portrait Shadow Removal","authors":"Jae Shin Yoon, Zhixin Shu, Mengwei Ren, Cecilia Zhang, Yannick Hold-Geoffroy, Krishna kumar Singh, He Zhang","doi":"10.1145/3687903","DOIUrl":"https://doi.org/10.1145/3687903","url":null,"abstract":"We introduce a high-fidelity portrait shadow removal model that can effectively enhance the image of a portrait by predicting its appearance under disturbing shadows and highlights. Portrait shadow removal is a highly ill-posed problem where multiple plausible solutions can be found based on a single image. For example, disentangling complex environmental lighting from original skin color is a non-trivial problem. While existing works have solved this problem by predicting the appearance residuals that can propagate local shadow distribution, such methods are often incomplete and lead to unnatural predictions, especially for portraits with hard shadows. We overcome the limitations of existing local propagation methods by formulating the removal problem as a generation task where a diffusion model learns to globally rebuild the human appearance from scratch as a condition of an input portrait image. For robust and natural shadow removal, we propose to train the diffusion model with a compositional repurposing framework: a pre-trained text-guided image generation model is first fine-tuned to harmonize the lighting and color of the foreground with a background scene by using a background harmonization dataset; and then the model is further fine-tuned to generate a shadow-free portrait image via a shadow-paired dataset. To overcome the limitation of losing fine details in the latent diffusion model, we propose a guided-upsampling network to restore the original high-frequency details <jats:italic>(e.g.</jats:italic> , wrinkles and dots) from the input image. To enable our compositional training framework, we construct a high-fidelity and large-scale dataset using a lightstage capturing system and synthetic graphics simulation. Our generative framework effectively removes shadows caused by both self and external occlusions while maintaining original lighting distribution and high-frequency details. Our method also demonstrates robustness to diverse subjects captured in real environments.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"99 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jian Wang, Sizhuo Ma, Karl Bayer, Yi Zhang, Peihao Wang, Bing Zhou, Shree Nayar, Gurunandan Krishnan
Augmented reality (AR) mirrors are novel displays that have great potential for commercial applications such as virtual apparel try-on. Typically the camera is placed beside the display, leading to distorted perspectives during user interaction. In this paper, we present a novel approach to address this problem by placing the camera behind a transparent display, thereby providing users with a perspective-aligned experience. Simply placing the camera behind the display can compromise image quality due to optical effects. We meticulously analyze the image formation process, and present an image restoration algorithm that benefits from physics-based data synthesis and network design. Our method significantly improves image quality and outperforms existing methods especially on the underexplored wire and backscatter artifacts. We then carefully design a full AR mirror system including display and camera selection, real-time processing pipeline, and mechanical design. Our user study demonstrates that the system is exceptionally well-received by users, highlighting its advantages over existing camera configurations not only as an AR mirror, but also for video conferencing. Our work represents a step forward in the development of AR mirrors, with potential applications in retail, cosmetics, fashion, etc. The image restoration dataset and code are available at https://perspective-armirror.github.io/.
增强现实(AR)镜是一种新颖的显示器,在虚拟服装试穿等商业应用中具有巨大潜力。通常情况下,摄像头被放置在显示屏旁边,导致用户交互时视角失真。在本文中,我们提出了一种解决这一问题的新方法,即把摄像头放在透明显示屏后面,从而为用户提供视角对齐的体验。由于光学效应,简单地将摄像头置于显示屏后可能会影响图像质量。我们对图像形成过程进行了细致分析,并提出了一种受益于基于物理的数据合成和网络设计的图像修复算法。我们的方法大大提高了图像质量,并优于现有方法,特别是在未充分开发的线和反向散射伪影方面。然后,我们精心设计了一个完整的 AR 镜系统,包括显示器和摄像头的选择、实时处理管道和机械设计。我们的用户研究表明,该系统受到了用户的极大欢迎,凸显了其相对于现有摄像头配置的优势,不仅可用作 AR 镜,还可用于视频会议。我们的工作标志着 AR 镜的开发向前迈进了一步,有望应用于零售、化妆品、时尚等领域。图像还原数据集和代码可在 https://perspective-armirror.github.io/ 上获取。
{"title":"Perspective-Aligned AR Mirror with Under-Display Camera","authors":"Jian Wang, Sizhuo Ma, Karl Bayer, Yi Zhang, Peihao Wang, Bing Zhou, Shree Nayar, Gurunandan Krishnan","doi":"10.1145/3687995","DOIUrl":"https://doi.org/10.1145/3687995","url":null,"abstract":"Augmented reality (AR) mirrors are novel displays that have great potential for commercial applications such as virtual apparel try-on. Typically the camera is placed beside the display, leading to distorted perspectives during user interaction. In this paper, we present a novel approach to address this problem by placing the camera behind a transparent display, thereby providing users with a perspective-aligned experience. Simply placing the camera behind the display can compromise image quality due to optical effects. We meticulously analyze the image formation process, and present an image restoration algorithm that benefits from physics-based data synthesis and network design. Our method significantly improves image quality and outperforms existing methods especially on the underexplored wire and backscatter artifacts. We then carefully design a full AR mirror system including display and camera selection, real-time processing pipeline, and mechanical design. Our user study demonstrates that the system is exceptionally well-received by users, highlighting its advantages over existing camera configurations not only as an AR mirror, but also for video conferencing. Our work represents a step forward in the development of AR mirrors, with potential applications in retail, cosmetics, fashion, <jats:italic>etc.</jats:italic> The image restoration dataset and code are available at https://perspective-armirror.github.io/.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"36 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liangwang Ruan, Bin Wang, Tiantian Liu, Baoquan Chen
We propose MiNNIE, a simple yet comprehensive framework for real-time simulation of nonlinear near-incompressible elastics. To avoid the common volumetric locking issues at high Poisson's ratios of linear finite element methods (FEM), we build MiNNIE upon a mixed FEM framework and further incorporate a pressure stabilization term to ensure excellent convergence of multigrid solvers. Our pressure stabilization strategy injects bounded influence on nodal displacement which can be eliminated using a quasiNewton method. MiNNIE has a specially tailored GPU multigrid solver including a modified skinning-space interpolation scheme, a novel vertex Vanka smoother, and an efficient dense solver using Schur complement. MiNNIE supports various elastic material models and simulates them in real-time, supporting a full range of Poisson's ratios up to 0.5 while handling large deformations, element inversions, and self-collisions at the same time.
{"title":"MiNNIE: a Mixed Multigrid Method for Real-time Simulation of Nonlinear Near-Incompressible Elastics","authors":"Liangwang Ruan, Bin Wang, Tiantian Liu, Baoquan Chen","doi":"10.1145/3687758","DOIUrl":"https://doi.org/10.1145/3687758","url":null,"abstract":"We propose MiNNIE, a simple yet comprehensive framework for real-time simulation of nonlinear near-incompressible elastics. To avoid the common volumetric locking issues at high Poisson's ratios of linear finite element methods (FEM), we build MiNNIE upon a mixed FEM framework and further incorporate a pressure stabilization term to ensure excellent convergence of multigrid solvers. Our pressure stabilization strategy injects bounded influence on nodal displacement which can be eliminated using a quasiNewton method. MiNNIE has a specially tailored GPU multigrid solver including a modified skinning-space interpolation scheme, a novel vertex Vanka smoother, and an efficient dense solver using Schur complement. MiNNIE supports various elastic material models and simulates them in real-time, supporting a full range of Poisson's ratios up to 0.5 while handling large deformations, element inversions, and self-collisions at the same time.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"251 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Li Wang, Lianghao Zhang, Fangzhou Gao, Yuzhen Kang, Jiawan Zhang
Recovering spatial-varying bi-directional reflectance distribution function (SVBRDF) from a few hand-held captured images has been a challenging task in computer graphics. Benefiting from the learned priors from data, single-image methods can obtain plausible SVBRDF estimation results. However, the extremely limited appearance information in a single image does not suffice for high-quality SVBRDF reconstruction. Although increasing the number of inputs can improve the reconstruction quality, it also affects the efficiency of real data capture and adds significant computational burdens. Therefore, the key challenge is to minimize the required number of inputs, while keeping high-quality results. To address this, we propose maximizing the effective information in each input through a novel co-located capture strategy that combines near-field and far-field point lighting. To further enhance effectiveness, we theoretically investigate the inherent relation between two images. The extracted relation is strongly correlated with the slope of specular reflectance, substantially enhancing the precision of roughness map estimation. Additionally, we designed the registration and denoising modules to meet the practical requirements of hand-held capture. Quantitative assessments and qualitative analysis have demonstrated that our method achieves superior SVBRDF estimations compared to previous approaches. All source codes will be publicly released.
{"title":"NFPLight: Deep SVBRDF Estimation via the Combination of Near and Far Field Point Lighting","authors":"Li Wang, Lianghao Zhang, Fangzhou Gao, Yuzhen Kang, Jiawan Zhang","doi":"10.1145/3687978","DOIUrl":"https://doi.org/10.1145/3687978","url":null,"abstract":"Recovering spatial-varying bi-directional reflectance distribution function (SVBRDF) from a few hand-held captured images has been a challenging task in computer graphics. Benefiting from the learned priors from data, single-image methods can obtain plausible SVBRDF estimation results. However, the extremely limited appearance information in a single image does not suffice for high-quality SVBRDF reconstruction. Although increasing the number of inputs can improve the reconstruction quality, it also affects the efficiency of real data capture and adds significant computational burdens. Therefore, the key challenge is to minimize the required number of inputs, while keeping high-quality results. To address this, we propose maximizing the effective information in each input through a novel co-located capture strategy that combines near-field and far-field point lighting. To further enhance effectiveness, we theoretically investigate the inherent relation between two images. The extracted relation is strongly correlated with the slope of specular reflectance, substantially enhancing the precision of roughness map estimation. Additionally, we designed the registration and denoising modules to meet the practical requirements of hand-held capture. Quantitative assessments and qualitative analysis have demonstrated that our method achieves superior SVBRDF estimations compared to previous approaches. All source codes will be publicly released.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"55 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jenny Han Lin, Yuka Ikarashi, Gilbert Louis Bernstein, James McCann
Programming low-level controls for knitting machines is a meticulous, time-consuming task that demands specialized expertise. Recently, there has been a shift towards automatically generating low-level knitting machine programs from high-level knit representations that describe knit objects in a more intuitive, user-friendly way. Current high-level systems trade off expressivity for ease-of-use, requiring ad-hoc trapdoors to access the full space of machine capabilities, or eschewing completeness in the name of utility. Thus, advanced techniques either require ad-hoc extensions from domain experts, or are entirely unsupported. Furthermore, errors may emerge during the compilation from knit object representations to machine instructions. While the generated program may describe a valid machine control sequence, the fabricated object is topologically different from the specified input, with little recourse for understanding and fixing the issue. To address these limitations, we introduce instruction graphs , an intermediate representation capable of capturing the full range of machine knitting programs. We define a semantic mapping from instruction graphs to fenced tangles, which make them compatible with the established formal semantics for machine knitting instructions. We establish a semantics-preserving bijection between machine knittable instruction graphs and knit programs that proves three properties - upward, forward, and ordered (UFO) - are both necessary and sufficient to ensure the existence of a machine knitting program that can fabricate the fenced tangle denoted by the graph. As a proof-of-concept, we implement an instruction graph editor and compiler that allows a user to transform an instruction graph into UFO presentation and then compile it to a machine program, all while maintaining semantic equivalence. In addition, we use the UFO properties to more precisely characterize the limitations of existing compilers. This work lays the groundwork for more expressive and reliable automated knitting machine programming systems by providing a formal characterization of machine knittability.
{"title":"UFO Instruction Graphs Are Machine Knittable","authors":"Jenny Han Lin, Yuka Ikarashi, Gilbert Louis Bernstein, James McCann","doi":"10.1145/3687948","DOIUrl":"https://doi.org/10.1145/3687948","url":null,"abstract":"Programming low-level controls for knitting machines is a meticulous, time-consuming task that demands specialized expertise. Recently, there has been a shift towards automatically generating low-level knitting machine programs from high-level knit representations that describe knit objects in a more intuitive, user-friendly way. Current high-level systems trade off expressivity for ease-of-use, requiring ad-hoc trapdoors to access the full space of machine capabilities, or eschewing completeness in the name of utility. Thus, advanced techniques either require ad-hoc extensions from domain experts, or are entirely unsupported. Furthermore, errors may emerge during the compilation from knit object representations to machine instructions. While the generated program may describe a valid machine control sequence, the fabricated object is topologically different from the specified input, with little recourse for understanding and fixing the issue. To address these limitations, we introduce <jats:italic>instruction graphs</jats:italic> , an intermediate representation capable of capturing the full range of machine knitting programs. We define a semantic mapping from instruction graphs to fenced tangles, which make them compatible with the established formal semantics for machine knitting instructions. We establish a semantics-preserving bijection between machine knittable instruction graphs and knit programs that proves three properties - upward, forward, and ordered (UFO) - are both necessary and sufficient to ensure the existence of a machine knitting program that can fabricate the fenced tangle denoted by the graph. As a proof-of-concept, we implement an instruction graph editor and compiler that allows a user to transform an instruction graph into UFO presentation and then compile it to a machine program, all while maintaining semantic equivalence. In addition, we use the UFO properties to more precisely characterize the limitations of existing compilers. This work lays the groundwork for more expressive and reliable automated knitting machine programming systems by providing a formal characterization of machine knittability.","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"10 1","pages":""},"PeriodicalIF":6.2,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142673129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}