Editing anime images via probabilities of attribute tags allows controlling the degree of the manipulation in an intuitive and convenient manner. Existing methods fall short in the progressive modification and preservation of unintended regions in the input image. We propose a controllable anime image editing framework based on adjusting the tag probabilities, in which a probability encoding network (PEN) is developed to encode the probabilities into features that capture continuous characteristic of the probabilities. Thus, the encoded features are able to direct the generative process of a pre-trained diffusion model and facilitate the linear manipulation. We also introduce a local editing module that automatically identifies the intended regions and constrains the edits to be applied to those regions only, which preserves the others unchanged. Comprehensive comparisons with existing methods indicate the effectiveness of our framework in both one-shot and linear editing modes. Results in additional applications further demonstrate the generalization ability of our approach.
{"title":"Controllable Anime Image Editing via Probability of Attribute Tags","authors":"Zhenghao Song, Haoran Mo, Chengying Gao","doi":"10.1111/cgf.15245","DOIUrl":"https://doi.org/10.1111/cgf.15245","url":null,"abstract":"<p>Editing anime images via probabilities of attribute tags allows controlling the degree of the manipulation in an intuitive and convenient manner. Existing methods fall short in the progressive modification and preservation of unintended regions in the input image. We propose a controllable anime image editing framework based on adjusting the tag probabilities, in which a probability encoding network (PEN) is developed to encode the probabilities into features that capture continuous characteristic of the probabilities. Thus, the encoded features are able to direct the generative process of a pre-trained diffusion model and facilitate the linear manipulation. We also introduce a local editing module that automatically identifies the intended regions and constrains the edits to be applied to those regions only, which preserves the others unchanged. Comprehensive comparisons with existing methods indicate the effectiveness of our framework in both one-shot and linear editing modes. Results in additional applications further demonstrate the generalization ability of our approach.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Restoring the appearance of the model is a crucial step for achieving realistic 3D reconstruction. High-fidelity textures can also conceal some geometric defects. Since the estimated camera parameters and reconstructed geometry usually contain errors, subsequent texture mapping often suffers from undesirable visual artifacts such as blurring, ghosting, and visual seams. In particular, significant misalignment between the reconstructed model and the registered images will lead to texturing the mesh with inconsistent image regions. However, eliminating various artifacts to generate high-quality textures remains a challenge. In this paper, we address this issue by designing a texture optimization method to generate seamless and aligned textures for 3D reconstruction. The main idea is to detect misalignment regions between images and geometry and exclude them from texture mapping. To handle the texture holes caused by these excluded regions, a cross-patch texture hole-filling method is proposed, which can also synthesize plausible textures for invisible faces. Moreover, for better stitching of the textures from different views, an improved camera pose optimization is present by introducing color adjustment and boundary point sampling. Experimental results show that the proposed method can eliminate the artifacts caused by inaccurate input data robustly and produce high-quality texture results compared with state-of-the-art methods.
{"title":"Seamless and Aligned Texture Optimization for 3D Reconstruction","authors":"Lei Wang, Linlin Ge, Qitong Zhang, Jieqing Feng","doi":"10.1111/cgf.15205","DOIUrl":"https://doi.org/10.1111/cgf.15205","url":null,"abstract":"<p>Restoring the appearance of the model is a crucial step for achieving realistic 3D reconstruction. High-fidelity textures can also conceal some geometric defects. Since the estimated camera parameters and reconstructed geometry usually contain errors, subsequent texture mapping often suffers from undesirable visual artifacts such as blurring, ghosting, and visual seams. In particular, significant misalignment between the reconstructed model and the registered images will lead to texturing the mesh with inconsistent image regions. However, eliminating various artifacts to generate high-quality textures remains a challenge. In this paper, we address this issue by designing a texture optimization method to generate seamless and aligned textures for 3D reconstruction. The main idea is to detect misalignment regions between images and geometry and exclude them from texture mapping. To handle the texture holes caused by these excluded regions, a cross-patch texture hole-filling method is proposed, which can also synthesize plausible textures for invisible faces. Moreover, for better stitching of the textures from different views, an improved camera pose optimization is present by introducing color adjustment and boundary point sampling. Experimental results show that the proposed method can eliminate the artifacts caused by inaccurate input data robustly and produce high-quality texture results compared with state-of-the-art methods.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural rendering bakes global illumination and other computationally costly effects into the weights of a neural network, allowing to efficiently synthesize photorealistic images without relying on path tracing. In neural rendering approaches, G-buffers obtained from rasterization through direct rendering provide information regarding the scene such as position, normal, and textures to the neural network, achieving accurate and stable rendering quality in real-time. However, due to the use of G-buffers, existing methods struggle to accurately render transparency and refraction effects, as G-buffers do not capture any ray information from multiple light ray bounces. This limitation results in blurriness, distortions, and loss of detail in rendered images that contain transparency and refraction, and is particularly notable in scenes with refracted objects that have high-frequency textures. In this work, we propose a neural network architecture to encode critical rendering information, including texture coordinates from refracted rays, and enable reconstruction of high-frequency textures in areas with refraction. Our approach is able to achieve accurate refraction rendering in challenging scenes with a diversity of overlapping transparent objects. Experimental results demonstrate that our method can interactively render high quality refraction effects with global illumination, unlike existing neural rendering approaches. Our code can be found at https://github.com/ziyangz5/CrystalNet
神经渲染将全局光照和其他计算成本高昂的效果融入神经网络的权重中,从而无需依赖路径追踪就能高效合成逼真的图像。在神经渲染方法中,通过直接渲染光栅化获得的 G 缓冲区为神经网络提供了有关场景的信息,如位置、法线和纹理,从而实现了准确、稳定的实时渲染质量。然而,由于使用 G 缓冲区,现有方法难以准确渲染透明和折射效果,因为 G 缓冲区无法捕捉到多条光线反弹时的任何光线信息。这种局限性导致渲染的包含透明和折射效果的图像模糊、失真和细节缺失,在具有高频纹理的折射物体场景中尤为明显。在这项工作中,我们提出了一种神经网络架构,用于编码关键的渲染信息,包括折射光线的纹理坐标,并在有折射的区域重建高频纹理。我们的方法能够在具有各种重叠透明物体的挑战性场景中实现精确的折射渲染。实验结果表明,与现有的神经渲染方法不同,我们的方法可以交互式地渲染具有全局照明的高质量折射效果。我们的代码见 https://github.com/ziyangz5/CrystalNet
{"title":"CrystalNet: Texture-Aware Neural Refraction Baking for Global Illumination","authors":"Z. Zhang, E. Simo-Serra","doi":"10.1111/cgf.15227","DOIUrl":"https://doi.org/10.1111/cgf.15227","url":null,"abstract":"<p>Neural rendering bakes global illumination and other computationally costly effects into the weights of a neural network, allowing to efficiently synthesize photorealistic images without relying on path tracing. In neural rendering approaches, G-buffers obtained from rasterization through direct rendering provide information regarding the scene such as position, normal, and textures to the neural network, achieving accurate and stable rendering quality in real-time. However, due to the use of G-buffers, existing methods struggle to accurately render transparency and refraction effects, as G-buffers do not capture any ray information from multiple light ray bounces. This limitation results in blurriness, distortions, and loss of detail in rendered images that contain transparency and refraction, and is particularly notable in scenes with refracted objects that have high-frequency textures. In this work, we propose a neural network architecture to encode critical rendering information, including texture coordinates from refracted rays, and enable reconstruction of high-frequency textures in areas with refraction. Our approach is able to achieve accurate refraction rendering in challenging scenes with a diversity of overlapping transparent objects. Experimental results demonstrate that our method can interactively render high quality refraction effects with global illumination, unlike existing neural rendering approaches. Our code can be found at https://github.com/ziyangz5/CrystalNet</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recovering the complete structure from partial point clouds in arbitrary poses is challenging. Recently, many efforts have been made to address this problem by developing SO(3)-equivariant completion networks or aligning the partial point clouds with a predefined canonical space before completion. However, these approaches are limited to random rotations only or demand costly pose annotation for model training. In this paper, we present a novel Network for Point cloud Completion with Learnable Canonical space (PCLC-Net) to reduce the need for pose annotations and extract SE(3)-invariant geometry features to improve the completion quality in arbitrary poses. Without pose annotations, our PCLC-Net utilizes self-supervised pose estimation to align the input partial point clouds to a canonical space that is learnable for an object category and subsequently performs shape completion in the learned canonical space. Our PCLC-Net can complete partial point clouds with arbitrary SE(3) poses without requiring pose annotations for supervision. Our PCLC-Net achieves state-of-the-art results on shape completion with arbitrary SE(3) poses on both synthetic and real scanned data. To the best of our knowledge, our method is the first to achieve shape completion in arbitrary poses without pose annotations during network training.
{"title":"PCLC-Net: Point Cloud Completion in Arbitrary Poses with Learnable Canonical Space","authors":"Hanmo Xu, Qingyao Shuai, Xuejin Chen","doi":"10.1111/cgf.15217","DOIUrl":"https://doi.org/10.1111/cgf.15217","url":null,"abstract":"<p>Recovering the complete structure from partial point clouds in arbitrary poses is challenging. Recently, many efforts have been made to address this problem by developing SO(3)-equivariant completion networks or aligning the partial point clouds with a predefined canonical space before completion. However, these approaches are limited to random rotations only or demand costly pose annotation for model training. In this paper, we present a novel Network for Point cloud Completion with Learnable Canonical space (PCLC-Net) to reduce the need for pose annotations and extract SE(3)-invariant geometry features to improve the completion quality in arbitrary poses. Without pose annotations, our PCLC-Net utilizes self-supervised pose estimation to align the input partial point clouds to a canonical space that is learnable for an object category and subsequently performs shape completion in the learned canonical space. Our PCLC-Net can complete partial point clouds with arbitrary SE(3) poses without requiring pose annotations for supervision. Our PCLC-Net achieves state-of-the-art results on shape completion with arbitrary SE(3) poses on both synthetic and real scanned data. To the best of our knowledge, our method is the first to achieve shape completion in arbitrary poses without pose annotations during network training.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sheng Ye, Zhen-Hui Dong, Yubin Hu, Yu-Hui Wen, Yong-Jin Liu
3D Gaussian Splatting has recently emerged as a powerful representation that can synthesize remarkable novel views using consistent multi-view images as input. However, we notice that images captured in dark environments where the scenes are not fully illuminated can exhibit considerable brightness variations and multi-view inconsistency, which poses great challenges to 3D Gaussian Splatting and severely degrades its performance. To tackle this problem, we propose Gaussian-DK. Observing that inconsistencies are mainly caused by camera imaging, we represent a consistent radiance field of the physical world using a set of anisotropic 3D Gaussians, and design a camera response module to compensate for multi-view inconsistencies. We also introduce a step-based gradient scaling strategy to constrain Gaussians near the camera, which turn out to be floaters, from splitting and cloning. Experiments on our proposed benchmark dataset demonstrate that Gaussian-DK produces high-quality renderings without ghosting and floater artifacts and significantly outperforms existing methods. Furthermore, we can also synthesize light-up images by controlling exposure levels that clearly show details in shadow areas.
{"title":"Gaussian in the Dark: Real-Time View Synthesis From Inconsistent Dark Images Using Gaussian Splatting","authors":"Sheng Ye, Zhen-Hui Dong, Yubin Hu, Yu-Hui Wen, Yong-Jin Liu","doi":"10.1111/cgf.15213","DOIUrl":"https://doi.org/10.1111/cgf.15213","url":null,"abstract":"<p>3D Gaussian Splatting has recently emerged as a powerful representation that can synthesize remarkable novel views using consistent multi-view images as input. However, we notice that images captured in dark environments where the scenes are not fully illuminated can exhibit considerable brightness variations and multi-view inconsistency, which poses great challenges to 3D Gaussian Splatting and severely degrades its performance. To tackle this problem, we propose Gaussian-DK. Observing that inconsistencies are mainly caused by camera imaging, we represent a consistent radiance field of the physical world using a set of anisotropic 3D Gaussians, and design a camera response module to compensate for multi-view inconsistencies. We also introduce a step-based gradient scaling strategy to constrain Gaussians near the camera, which turn out to be floaters, from splitting and cloning. Experiments on our proposed benchmark dataset demonstrate that Gaussian-DK produces high-quality renderings without ghosting and floater artifacts and significantly outperforms existing methods. Furthermore, we can also synthesize light-up images by controlling exposure levels that clearly show details in shadow areas.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Q. Jiang, Q.L. Wang, L.H. Chi, X.H. Chen, Q.Y. Zhang, R. Zhou, Z.Q. Deng, J.S. Deng, B.B. Tang, S.H. Lv, J. Liu
Latent diffusion models (LDMs) have demonstrated remarkable success in generative modeling. It is promising to leverage the potential of diffusion priors to enhance performance in image and video tasks. However, applying LDMs to video super-resolution (VSR) presents significant challenges due to the high demands for realistic details and temporal consistency in generated videos, exacerbated by the inherent stochasticity in the diffusion process. In this work, we propose a novel diffusion-based framework, Temporal-awareness Latent Diffusion Model (TempDiff), specifically designed for real-world video super-resolution, where degradations are diverse and complex. TempDiff harnesses the powerful generative prior of a pre-trained diffusion model and enhances temporal awareness through the following mechanisms: 1) Incorporating temporal layers into the denoising U-Net and VAE-Decoder, and fine-tuning these added modules to maintain temporal coherency; 2) Estimating optical flow guidance using a pre-trained flow net for latent optimization and propagation across video sequences, ensuring overall stability in the generated high-quality video. Extensive experiments demonstrate that TempDiff achieves compelling results, outperforming state-of-the-art methods on both synthetic and real-world VSR benchmark datasets. Code will be available at https://github.com/jiangqin567/TempDiff
{"title":"TempDiff: Enhancing Temporal-awareness in Latent Diffusion for Real-World Video Super-Resolution","authors":"Q. Jiang, Q.L. Wang, L.H. Chi, X.H. Chen, Q.Y. Zhang, R. Zhou, Z.Q. Deng, J.S. Deng, B.B. Tang, S.H. Lv, J. Liu","doi":"10.1111/cgf.15211","DOIUrl":"https://doi.org/10.1111/cgf.15211","url":null,"abstract":"<p>Latent diffusion models (LDMs) have demonstrated remarkable success in generative modeling. It is promising to leverage the potential of diffusion priors to enhance performance in image and video tasks. However, applying LDMs to video super-resolution (VSR) presents significant challenges due to the high demands for realistic details and temporal consistency in generated videos, exacerbated by the inherent stochasticity in the diffusion process. In this work, we propose a novel diffusion-based framework, Temporal-awareness Latent Diffusion Model (TempDiff), specifically designed for real-world video super-resolution, where degradations are diverse and complex. TempDiff harnesses the powerful generative prior of a pre-trained diffusion model and enhances temporal awareness through the following mechanisms: 1) Incorporating temporal layers into the denoising U-Net and VAE-Decoder, and fine-tuning these added modules to maintain temporal coherency; 2) Estimating optical flow guidance using a pre-trained flow net for latent optimization and propagation across video sequences, ensuring overall stability in the generated high-quality video. Extensive experiments demonstrate that TempDiff achieves compelling results, outperforming state-of-the-art methods on both synthetic and real-world VSR benchmark datasets. Code will be available at https://github.com/jiangqin567/TempDiff</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}