We introduce ToonCrafter, a novel approach that transcends traditional correspondence-based cartoon video interpolation, paving the way for generative interpolation. Traditional methods, which implicitly assume linear motion and the absence of complicated phenomena such as dis-occlusion, often struggle with the exaggerated, non-linear, and large motions with occlusion commonly found in cartoons, resulting in implausible or even failed interpolation results. To overcome these limitations, we explore the potential of adapting live-action video priors to better suit cartoon interpolation within a generative framework. ToonCrafter effectively addresses the challenges faced when applying live-action video motion priors to generative cartoon interpolation. First, we design a toon rectification learning strategy that seamlessly adapts live-action video priors to the cartoon domain, resolving the domain-gap and content-leakage issues. Next, we introduce a dual-reference-based 3D decoder to compensate for the details lost in the highly compressed latent prior space, ensuring the preservation of fine details in the interpolation results. Finally, we design a flexible sketch encoder that empowers users with interactive control over the interpolation results. Experimental results demonstrate that our proposed method not only produces visually convincing and more natural dynamics, but also effectively handles dis-occlusion. The comparative evaluation demonstrates the notable superiority of our approach over existing competitors. Code and model weights are available at https://doubiiu.github.io/projects/ToonCrafter
ToonCrafter: Generative Cartoon Interpolation. Jinbo Xing, Hanyuan Liu, Menghan Xia, Yong Zhang, Xintao Wang, Ying Shan, Tien-Tsin Wong. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687761
Recently, 3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results, while allowing the rendering of high-resolution images in real time. However, leveraging 3D Gaussians for surface reconstruction poses significant challenges due to the explicit and disconnected nature of 3D Gaussians. In this work, we present Gaussian Opacity Fields (GOF), a novel approach for efficient, high-quality, and adaptive surface reconstruction in unbounded scenes. GOF is derived from ray-tracing-based volume rendering of 3D Gaussians, enabling direct geometry extraction from 3D Gaussians by identifying the level set of the opacity field, without resorting to Poisson reconstruction or TSDF fusion as in previous work. We approximate the surface normal of a Gaussian as the normal of the ray-Gaussian intersection plane, enabling regularization that significantly enhances the reconstructed geometry. Furthermore, we develop an efficient geometry extraction method based on Marching Tetrahedra, where the tetrahedral grid is induced from the 3D Gaussians and thus adapts to the scene's complexity. Our evaluations reveal that GOF surpasses existing 3DGS-based methods in both surface reconstruction and novel view synthesis. Further, it compares favorably to, or even outperforms, neural implicit methods in both quality and speed.
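To make the ray-Gaussian intersection idea concrete, below is a minimal NumPy sketch (not the paper's implementation) of one standard way to compute the point of maximal Gaussian response along a ray and to use the density gradient at that point as the normal of the intersection plane; the function name and the camera-facing orientation step are illustrative assumptions.

```python
import numpy as np

def ray_gaussian_normal(o, d, mu, cov):
    """Illustrative sketch (not the paper's code): point of maximal Gaussian
    response along a ray and the gradient-based normal of the intersection plane.

    o, d : ray origin and unit direction, shape (3,)
    mu   : Gaussian mean, shape (3,)
    cov  : Gaussian covariance, shape (3, 3)
    """
    cov_inv = np.linalg.inv(cov)
    # Depth maximizing the density exp(-0.5 (o + t d - mu)^T cov_inv (o + t d - mu)).
    t_star = (d @ cov_inv @ (mu - o)) / (d @ cov_inv @ d)
    x_star = o + t_star * d
    # The density gradient is proportional to -cov_inv (x - mu); its direction is
    # taken as the normal of the ray-Gaussian intersection plane.
    g = cov_inv @ (x_star - mu)
    n = g / (np.linalg.norm(g) + 1e-12)
    if n @ d > 0:          # orient the normal toward the camera
        n = -n
    return t_star, x_star, n

# Example: isotropic Gaussian at the origin, ray slightly offset from its center.
t, x, n = ray_gaussian_normal(np.array([0.2, 0.0, 3.0]), np.array([0.0, 0.0, -1.0]),
                              np.zeros(3), 0.1 * np.eye(3))
```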
Gaussian Opacity Fields: Efficient Adaptive Surface Reconstruction in Unbounded Scenes. Zehao Yu, Torsten Sattler, Andreas Geiger. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687937
Large garages are ubiquitous yet intricate scenes that present unique challenges due to their monotonous colors, repetitive patterns, reflective surfaces, and transparent vehicle glass. Conventional Structure from Motion (SfM) methods for camera pose estimation and 3D reconstruction often fail in these environments due to poor correspondence construction. To address these challenges, we introduce LetsGo, a LiDAR-assisted Gaussian splatting framework for large-scale garage modeling and rendering. We develop a handheld scanner, Polar, equipped with IMU, LiDAR, and a fisheye camera, to facilitate accurate data acquisition. Using this Polar device, we present the GarageWorld dataset, consisting of eight expansive garage scenes with diverse geometric structures, which will be made publicly available for further research. Our approach demonstrates that LiDAR point clouds collected by the Polar device significantly enhance a suite of 3D Gaussian splatting algorithms for garage scene modeling and rendering. We introduce a novel depth regularizer that effectively eliminates floating artifacts in rendered images. Additionally, we propose a multi-resolution 3D Gaussian representation designed for Level-of-Detail (LOD) rendering. This includes adapted scaling factors for individual levels and a random-resolution-level training scheme to optimize the Gaussians across different resolutions. This representation enables efficient rendering of large-scale garage scenes on lightweight devices via a web-based renderer. Experimental results on our GarageWorld dataset, as well as on ScanNet++ and KITTI-360, demonstrate the superiority of our method in terms of rendering quality and resource efficiency.
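The abstract does not spell out the form of the depth regularizer; purely as a hypothetical illustration of how projected LiDAR depth could be used to suppress floating artifacts, a masked L1 penalty between rendered and LiDAR depth might look like the following (the paper's actual regularizer may differ).

```python
import numpy as np

def lidar_depth_regularizer(rendered_depth, lidar_depth, valid_mask):
    """Hypothetical L1 depth regularizer (the paper's exact formulation may differ):
    penalize rendered depth only where projected LiDAR measurements are available.

    rendered_depth : (H, W) depth from the Gaussian rasterizer
    lidar_depth    : (H, W) depth from LiDAR points projected into the camera
    valid_mask     : (H, W) boolean mask of pixels hit by at least one LiDAR point
    """
    diff = np.abs(rendered_depth - lidar_depth)
    return diff[valid_mask].mean() if valid_mask.any() else 0.0
```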
LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives. Jiadi Cui, Junming Cao, Fuqiang Zhao, Zhipeng He, Yifan Chen, Yuhui Zhong, Lan Xu, Yujiao Shi, Yingliang Zhang, Jingyi Yu. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687762
This paper introduces ELMO, a real-time upsampling motion capture framework designed for a single LiDAR sensor. Modeled as a conditional autoregressive transformer-based upsampling motion generator, ELMO achieves 60 fps motion capture from a 20 fps LiDAR point cloud sequence. The key feature of ELMO is the coupling of the self-attention mechanism with thoughtfully designed embedding modules for motion and point clouds, significantly elevating the motion quality. To facilitate accurate motion capture, we develop a one-time skeleton calibration model capable of predicting user skeleton offsets from a single-frame point cloud. Additionally, we introduce a novel data augmentation technique utilizing a LiDAR simulator, which enhances global root tracking to improve environmental understanding. To demonstrate the effectiveness of our method, we compare ELMO with state-of-the-art methods in both image-based and point-cloud-based motion capture. We further conduct an ablation study to validate our design principles. ELMO's fast inference time makes it well-suited for real-time applications, exemplified in our demo video featuring live streaming and interactive gaming scenarios. Furthermore, we contribute a high-quality LiDAR-mocap synchronized dataset comprising 20 different subjects performing a range of motions, which can serve as a valuable resource for future research. The dataset and evaluation code are available at https://movin3d.github.io/ELMO_SIGASIA2024/
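As a schematic of the 20 fps to 60 fps upsampling described above, the sketch below shows a generic conditional autoregressive loop in which each incoming LiDAR frame drives the generation of three output poses; the generator interface, pose dimensionality, and context-window size are placeholders, not ELMO's actual architecture.

```python
import numpy as np

UPSAMPLE = 3  # 20 fps LiDAR input -> 60 fps motion output (ratio from the abstract)

def upsample_motion(point_cloud_frames, generator, history):
    """Schematic autoregressive upsampling loop; all interfaces are hypothetical.

    point_cloud_frames : list of (N_i, 3) LiDAR frames captured at 20 fps
    generator          : callable(history, point_cloud) -> list of UPSAMPLE poses
    history            : list of previously generated poses (autoregressive context)
    """
    output_poses = []
    for pc in point_cloud_frames:
        # Each incoming LiDAR frame conditions the generation of three 60 fps poses.
        new_poses = generator(history, pc)
        assert len(new_poses) == UPSAMPLE
        output_poses.extend(new_poses)
        history = (history + new_poses)[-32:]  # keep a bounded context window
    return output_poses

# Dummy generator emitting zero poses (66-D vectors standing in for joint parameters).
dummy = lambda hist, pc: [np.zeros(66) for _ in range(UPSAMPLE)]
poses = upsample_motion([np.random.rand(1024, 3) for _ in range(20)], dummy, [])
assert len(poses) == 20 * UPSAMPLE
```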
ELMO: Enhanced Real-time LiDAR Motion Capture through Upsampling. Deok-Kyeong Jang, Dongseok Yang, Deok-Yun Jang, Byeoli Choi, Sung-Hee Lee, Donghoon Shin. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687991
Continuous collision detection (CCD) between parametric surfaces is typically formulated as a five-dimensional constrained optimization problem. In the field of CAD and computer graphics, common approaches to solving this problem rely on linearization or sampling strategies. Alternatively, inclusion-based techniques detect collisions by employing 5D inclusion functions, which are typically designed to represent the swept volumes of parametric surfaces over a given time span, and narrowing down the earliest collision moment through subdivision in both the spatial and temporal dimensions. However, when high detection accuracy is required, all of these approaches incur significantly higher computational cost due to the high-dimensional search space. In this work, we develop a new time-dependent inclusion-based CCD framework that eliminates the need for temporal subdivision and can speed up conventional methods by a factor ranging from 36 to 138. To achieve this, we propose a novel time-dependent inclusion function that provides a continuous representation of a moving surface, along with a corresponding intersection detection algorithm that quickly identifies the time intervals when collisions are likely to occur. We validate our method across various primitive types, demonstrate its efficacy within the simulation pipeline, and show that it significantly improves CCD efficiency while maintaining accuracy.
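For context, the conventional inclusion-based baseline that the paper accelerates can be sketched as follows: a conservative axis-aligned inclusion of a Bézier patch with linearly moving control points (via the convex-hull property), combined with bisection of the time axis to narrow down the earliest colliding interval. This is a simplified illustration of the baseline only (spatial subdivision is omitted for brevity); the paper's time-dependent inclusion function removes the temporal subdivision entirely.

```python
import numpy as np

def swept_aabb(ctrl_t0, ctrl_t1, ta, tb):
    """Conservative AABB of a Bezier patch with linearly moving control points
    over the time interval [ta, tb] (convex-hull property of Bezier surfaces)."""
    ca = (1 - ta) * ctrl_t0 + ta * ctrl_t1   # control points at time ta
    cb = (1 - tb) * ctrl_t0 + tb * ctrl_t1   # control points at time tb
    pts = np.concatenate([ca.reshape(-1, 3), cb.reshape(-1, 3)])
    return pts.min(axis=0), pts.max(axis=0)

def aabb_overlap(lo_a, hi_a, lo_b, hi_b, eps=1e-8):
    return bool(np.all(lo_a <= hi_b + eps) and np.all(lo_b <= hi_a + eps))

def earliest_collision_interval(a0, a1, b0, b1, ta=0.0, tb=1.0, tol=1e-4):
    """Bisect the time axis until the first interval with overlapping inclusions
    is narrower than tol; returns (ta, tb) or None if no overlap is found."""
    if not aabb_overlap(*swept_aabb(a0, a1, ta, tb), *swept_aabb(b0, b1, ta, tb)):
        return None
    if tb - ta < tol:
        return ta, tb
    tm = 0.5 * (ta + tb)
    return (earliest_collision_interval(a0, a1, b0, b1, ta, tm, tol)
            or earliest_collision_interval(a0, a1, b0, b1, tm, tb, tol))

# Two bilinear patches (2x2 control nets) moving toward each other along z;
# they first touch around t = 1/1.2 ~ 0.833.
a0 = np.array([[[0, 0, 0], [1, 0, 0]], [[0, 1, 0], [1, 1, 0]]], dtype=float)
b0 = a0 + np.array([0.0, 0.0, 1.0])
print(earliest_collision_interval(a0, a0 + [0, 0, 0.6], b0, b0 - [0, 0, 0.6]))
```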
A Time-Dependent Inclusion-Based Method for Continuous Collision Detection between Parametric Surfaces. Xuwen Chen, Cheng Yu, Xingyu Ni, Mengyu Chu, Bin Wang, Baoquan Chen. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687960
Charlie Hewitt, Fatemeh Saleh, Sadegh Aliakbarian, Lohit Petikam, Shideh Rezaeifar, Louis Florentin, Zafiirah Hosenie, Thomas J. Cashman, Julien Valentin, Darren Cosker, Tadas Baltrusaitis
We tackle the problem of highly accurate, holistic performance capture of the face, body, and hands simultaneously. Motion-capture technologies used in film and game production typically focus only on face, body, or hand capture independently, involve complex and expensive hardware, and require a high degree of manual intervention from skilled operators. While machine-learning-based approaches exist to overcome these problems, they usually only support a single camera, often operate on a single part of the body, do not produce precise world-space results, and rarely generalize outside specific contexts. In this work, we introduce the first technique for marker-free, high-quality reconstruction of the complete human body, including eyes and tongue, without requiring any calibration, manual intervention, or custom hardware. Our approach produces stable world-space results from arbitrary camera rigs and supports varied capture environments and clothing. We achieve this through a hybrid approach that leverages machine learning models trained exclusively on synthetic data together with powerful parametric models of human shape and motion. We evaluate our method on a number of body, face, and hand reconstruction benchmarks and demonstrate state-of-the-art results that generalize to diverse datasets.
Look Ma, no markers: holistic performance capture without the hassle. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687772
We introduce a high-fidelity portrait shadow removal model that can effectively enhance a portrait image by predicting its appearance beneath disturbing shadows and highlights. Portrait shadow removal is a highly ill-posed problem where multiple plausible solutions can be found based on a single image. For example, disentangling complex environmental lighting from the original skin color is a non-trivial problem. While existing works have approached this problem by predicting appearance residuals that propagate the local shadow distribution, such methods are often incomplete and lead to unnatural predictions, especially for portraits with hard shadows. We overcome the limitations of existing local propagation methods by formulating the removal problem as a generation task in which a diffusion model learns to globally rebuild the human appearance from scratch, conditioned on an input portrait image. For robust and natural shadow removal, we propose to train the diffusion model with a compositional repurposing framework: a pre-trained text-guided image generation model is first fine-tuned to harmonize the lighting and color of the foreground with a background scene using a background harmonization dataset, and the model is then further fine-tuned to generate a shadow-free portrait image via a shadow-paired dataset. To overcome the loss of fine details in the latent diffusion model, we propose a guided-upsampling network to restore the original high-frequency details (e.g., wrinkles and dots) from the input image. To enable our compositional training framework, we construct a high-fidelity and large-scale dataset using a light-stage capture system and synthetic graphics simulation. Our generative framework effectively removes shadows caused by both self and external occlusions while maintaining the original lighting distribution and high-frequency details. Our method also demonstrates robustness to diverse subjects captured in real environments.
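The guided-upsampling network itself is learned; as a rough, hand-crafted stand-in for the underlying idea of re-injecting the input's high-frequency detail into the diffusion output, one could recombine frequency bands as below. This naive version would also re-introduce hard shadow edges, which is presumably part of why a learned, guided module is used instead; all names here are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def recombine_high_frequency(diffusion_output, input_image, sigma=3.0):
    """Hand-crafted stand-in for guided upsampling (the actual module is learned):
    keep the low-frequency appearance of the shadow-free diffusion output and
    re-inject the high-frequency detail (wrinkles, dots) of the input portrait.

    Both images: float arrays in [0, 1], shape (H, W, 3)."""
    low = gaussian_filter(diffusion_output, sigma=(sigma, sigma, 0))
    high = input_image - gaussian_filter(input_image, sigma=(sigma, sigma, 0))
    return np.clip(low + high, 0.0, 1.0)
```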
Generative Portrait Shadow Removal. Jae Shin Yoon, Zhixin Shu, Mengwei Ren, Cecilia Zhang, Yannick Hold-Geoffroy, Krishna kumar Singh, He Zhang. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687903
Jian Wang, Sizhuo Ma, Karl Bayer, Yi Zhang, Peihao Wang, Bing Zhou, Shree Nayar, Gurunandan Krishnan
Augmented reality (AR) mirrors are novel displays that have great potential for commercial applications such as virtual apparel try-on. Typically the camera is placed beside the display, leading to distorted perspectives during user interaction. In this paper, we present a novel approach to address this problem by placing the camera behind a transparent display, thereby providing users with a perspective-aligned experience. Simply placing the camera behind the display can compromise image quality due to optical effects. We meticulously analyze the image formation process, and present an image restoration algorithm that benefits from physics-based data synthesis and network design. Our method significantly improves image quality and outperforms existing methods especially on the underexplored wire and backscatter artifacts. We then carefully design a full AR mirror system including display and camera selection, real-time processing pipeline, and mechanical design. Our user study demonstrates that the system is exceptionally well-received by users, highlighting its advantages over existing camera configurations not only as an AR mirror, but also for video conferencing. Our work represents a step forward in the development of AR mirrors, with potential applications in retail, cosmetics, fashion, etc. The image restoration dataset and code are available at https://perspective-armirror.github.io/.
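The abstract does not state the image formation model it analyzes; a commonly assumed model for an under-display camera, given here only as an illustration consistent with the wire (diffraction) and backscatter artifacts mentioned above, is

$$ I_{\text{obs}} \;=\; \alpha \,\big(I_{\text{scene}} \ast k_{\text{wire}}\big) \;+\; \beta\, I_{\text{display}} \;+\; n, $$

where $\alpha$ accounts for attenuation of light passing through the transparent display, $k_{\text{wire}}$ is a point spread function induced by diffraction at the display's wiring pattern, $\beta\,I_{\text{display}}$ models backscatter of the displayed content into the lens, and $n$ is sensor noise; restoration then amounts to inverting such a forward model.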
Perspective-Aligned AR Mirror with Under-Display Camera. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687995
Liangwang Ruan, Bin Wang, Tiantian Liu, Baoquan Chen
We propose MiNNIE, a simple yet comprehensive framework for the real-time simulation of nonlinear near-incompressible elastics. To avoid the volumetric locking issues that linear finite element methods (FEM) commonly exhibit at high Poisson's ratios, we build MiNNIE upon a mixed FEM framework and further incorporate a pressure stabilization term to ensure excellent convergence of multigrid solvers. Our pressure stabilization strategy injects a bounded influence on nodal displacements, which can be eliminated using a quasi-Newton method. MiNNIE has a specially tailored GPU multigrid solver that includes a modified skinning-space interpolation scheme, a novel vertex Vanka smoother, and an efficient dense solver based on the Schur complement. MiNNIE supports various elastic material models and simulates them in real time, handling the full range of Poisson's ratios up to 0.5 together with large deformations, element inversions, and self-collisions.
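The abstract leaves the mixed formulation implicit; as a schematic (not necessarily the paper's exact energy), two-field displacement-pressure methods for near-incompressible elasticity typically optimize an energy of the form

$$ E(\mathbf{u}, p) \;=\; \int_{\Omega}\Big[\Psi_{\mathrm{dev}}\big(\mathbf{F}(\mathbf{u})\big) \;+\; p\,\big(J(\mathbf{u}) - 1\big) \;-\; \frac{p^{2}}{2\kappa}\Big]\,\mathrm{d}V \;+\; \frac{\epsilon}{2}\int_{\Omega}\|\nabla p\|^{2}\,\mathrm{d}V, $$

where $\mathbf{F}$ is the deformation gradient, $J=\det\mathbf{F}$, $\Psi_{\mathrm{dev}}$ is the deviatoric part of the strain energy, and $\kappa$ is the bulk modulus (so $\kappa\to\infty$, i.e., Poisson's ratio approaching 0.5, enforces $J\to 1$). The last term is one common form of pressure stabilization with a small coefficient $\epsilon$; the specific stabilization used in MiNNIE may differ.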
MiNNIE: a Mixed Multigrid Method for Real-time Simulation of Nonlinear Near-Incompressible Elastics. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687758
Li Wang, Lianghao Zhang, Fangzhou Gao, Yuzhen Kang, Jiawan Zhang
Recovering spatially-varying bidirectional reflectance distribution functions (SVBRDFs) from a few hand-held captured images has been a challenging task in computer graphics. Benefiting from priors learned from data, single-image methods can obtain plausible SVBRDF estimation results. However, the extremely limited appearance information in a single image does not suffice for high-quality SVBRDF reconstruction. Although increasing the number of inputs can improve reconstruction quality, it also reduces the efficiency of real data capture and adds significant computational burden. Therefore, the key challenge is to minimize the required number of inputs while keeping high-quality results. To address this, we propose maximizing the effective information in each input through a novel co-located capture strategy that combines near-field and far-field point lighting. To further enhance effectiveness, we theoretically investigate the inherent relation between the two images. The extracted relation is strongly correlated with the slope of the specular reflectance, substantially enhancing the precision of roughness map estimation. Additionally, we design registration and denoising modules to meet the practical requirements of hand-held capture. Quantitative assessments and qualitative analysis demonstrate that our method achieves superior SVBRDF estimation compared to previous approaches. All source code will be publicly released.
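The abstract does not give the image formation model; a standard co-located point-light model of the kind such methods build on (assumed here for illustration, with a microfacet specular term) is

$$ I(\mathbf{x}) \;=\; \frac{\Phi}{\|\mathbf{x}_{\ell}-\mathbf{x}\|^{2}}\left(\frac{\boldsymbol{\rho}_{d}}{\pi} \;+\; \rho_{s}\,\frac{D\!\left(\mathbf{n}\!\cdot\!\boldsymbol{\omega};\,r\right)\,G\,F}{4\,(\mathbf{n}\!\cdot\!\boldsymbol{\omega})^{2}}\right)(\mathbf{n}\!\cdot\!\boldsymbol{\omega}), $$

where $\boldsymbol{\omega}$ is the shared light/view direction (a co-located flash and camera make the half vector coincide with $\boldsymbol{\omega}$), $\mathbf{x}_{\ell}$ is the light position, $\Phi$ its intensity, $\boldsymbol{\rho}_{d}$, $\rho_{s}$, $r$, and $\mathbf{n}$ are the per-pixel diffuse albedo, specular albedo, roughness, and normal, and $D$, $G$, $F$ are the microfacet distribution, shadowing, and Fresnel terms. Under near-field lighting, $\boldsymbol{\omega}$ and the $1/\|\mathbf{x}_{\ell}-\mathbf{x}\|^{2}$ falloff vary across the image, whereas in the far-field limit they are approximately constant, which is what makes the two observations complementary.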
NFPLight: Deep SVBRDF Estimation via the Combination of Near and Far Field Point Lighting. ACM Transactions on Graphics, 2024. https://doi.org/10.1145/3687978