Computer Graphics Forum: Latest Articles (Page 6)

TaNSR: Efficient 3D Reconstruction with Tetrahedral Difference and Feature Aggregation

IF 2.7 Zone 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Graphics Forum

Pub Date : 2024-10-24 DOI: 10.1111/cgf.15207

Zhaohan Lv, Xingcan Bao, Yong Tang, Jing Zhao

Neural surface reconstruction methods have demonstrated their ability to recover 3D surfaces from multiple images. However, current approaches struggle to rapidly achieve high-fidelity surface reconstructions. In this work, we propose TaNSR, which inherits the speed advantages of multi-resolution hash encodings and extends their representation capabilities. To reduce training time, we propose an efficient numerical gradient computation method that significantly reduces additional memory access overhead. To further improve reconstruction quality and expedite training, we propose a feature aggregation strategy in volume rendering. Building on this, we introduce an adaptively weighted aggregation function to ensure the network can accurately reconstruct the surface of objects and recover more geometric details. Experiments on multiple datasets indicate that TaNSR significantly reduces training time while achieving better reconstruction accuracy compared to state-of-the-art neural implicit methods.
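
The abstract does not spell out the gradient scheme, but the title points to a tetrahedral difference. As a minimal sketch, assuming the common four-point tetrahedral stencil (an assumption, not necessarily the authors' exact formulation), the gradient of a scalar field such as a signed distance function can be estimated from four samples instead of the six that a central difference needs. The names `tetrahedral_gradient`, `sphere_sdf`, and the step size `h` are illustrative.

```python
import math

# Four offsets at alternating cube corners, i.e. the vertices of a regular
# tetrahedron. Their sum is zero and sum(k * k^T) equals 4 * identity.
TETRA_OFFSETS = [(1, -1, -1), (-1, -1, 1), (-1, 1, -1), (1, 1, 1)]


def tetrahedral_gradient(f, p, h=1e-4):
    """Estimate the gradient of scalar field f at point p from four samples."""
    grad = [0.0, 0.0, 0.0]
    for k in TETRA_OFFSETS:
        q = (p[0] + h * k[0], p[1] + h * k[1], p[2] + h * k[2])
        value = f(q)  # one field evaluation per tetrahedron vertex
        for axis in range(3):
            grad[axis] += k[axis] * value
    return tuple(g / (4.0 * h) for g in grad)


def sphere_sdf(p):
    """Signed distance to a unit sphere centered at the origin (toy field)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - 1.0


if __name__ == "__main__":
    # For a sphere SDF the true gradient is the normalized position.
    print(tetrahedral_gradient(sphere_sdf, (0.3, 0.4, 0.5)))
```

With a hash-grid encoding, each field evaluation implies grid lookups, so using four evaluations rather than six is one plausible source of the reduced memory traffic the abstract mentions.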

Citations: 0

Controllable Anime Image Editing via Probability of Attribute Tags

IF 2.7 Zone 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Graphics Forum

Pub Date : 2024-10-24 DOI: 10.1111/cgf.15245

Zhenghao Song, Haoran Mo, Chengying Gao

Editing anime images via probabilities of attribute tags allows controlling the degree of the manipulation in an intuitive and convenient manner. Existing methods fall short in progressive modification and in preserving regions of the input image that are not meant to change. We propose a controllable anime image editing framework based on adjusting the tag probabilities, in which a probability encoding network (PEN) is developed to encode the probabilities into features that capture the continuous nature of the probabilities. Thus, the encoded features are able to direct the generative process of a pre-trained diffusion model and facilitate linear manipulation. We also introduce a local editing module that automatically identifies the intended regions and constrains the edits to those regions only, leaving the others unchanged. Comprehensive comparisons with existing methods indicate the effectiveness of our framework in both one-shot and linear editing modes. Results in additional applications further demonstrate the generalization ability of our approach.
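
The two ideas of linear control and region-restricted edits can be illustrated with a toy sketch. It assumes that the local editing step amounts to a per-pixel composite with a mask and that the tag probability is moved linearly toward a target; both are simplifications for illustration, not the paper's PEN or diffusion pipeline. The function names are hypothetical.

```python
def interpolate_probability(p_start, p_target, t):
    """Linearly move a tag probability from p_start toward p_target, t in [0, 1]."""
    t = min(max(t, 0.0), 1.0)  # clamp so the result stays between the endpoints
    return (1.0 - t) * p_start + t * p_target


def blend_with_mask(original, edited, mask):
    """Per-pixel composite: mask 1 keeps the edit, mask 0 keeps the original."""
    return [
        [m * e + (1.0 - m) * o for o, e, m in zip(row_o, row_e, row_m)]
        for row_o, row_e, row_m in zip(original, edited, mask)
    ]


if __name__ == "__main__":
    original = [[0.2, 0.2], [0.2, 0.2]]
    edited = [[0.8, 0.8], [0.8, 0.8]]
    mask = [[1.0, 0.0], [0.0, 0.0]]  # only the top-left pixel may change
    print(blend_with_mask(original, edited, mask))
    print(interpolate_probability(0.1, 0.9, 0.5))
```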

Citations: 0

Seamless and Aligned Texture Optimization for 3D Reconstruction

IF 2.7 Zone 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Graphics Forum

Pub Date : 2024-10-24 DOI: 10.1111/cgf.15205

Lei Wang, Linlin Ge, Qitong Zhang, Jieqing Feng

Restoring the appearance of the model is a crucial step for achieving realistic 3D reconstruction. High-fidelity textures can also conceal some geometric defects. Since the estimated camera parameters and reconstructed geometry usually contain errors, subsequent texture mapping often suffers from undesirable visual artifacts such as blurring, ghosting, and visual seams. In particular, significant misalignment between the reconstructed model and the registered images will lead to texturing the mesh with inconsistent image regions. However, eliminating various artifacts to generate high-quality textures remains a challenge. In this paper, we address this issue by designing a texture optimization method to generate seamless and aligned textures for 3D reconstruction. The main idea is to detect misalignment regions between images and geometry and exclude them from texture mapping. To handle the texture holes caused by these excluded regions, a cross-patch texture hole-filling method is proposed, which can also synthesize plausible textures for invisible faces. Moreover, for better stitching of the textures from different views, an improved camera pose optimization is presented by introducing color adjustment and boundary point sampling. Experimental results show that the proposed method can robustly eliminate the artifacts caused by inaccurate input data and produce high-quality texture results compared with state-of-the-art methods.
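
The exclusion idea can be sketched as a per-face view selection that skips flagged views and reports faces left without any usable view. This is a hypothetical simplification: the scoring, the misalignment detector, and the names `select_views`, `scores`, and `misaligned` are assumptions for illustration, and the paper's hole filling and pose optimization are not modeled.

```python
def select_views(scores, misaligned):
    """
    Pick the best-scoring view for each mesh face, ignoring misaligned pairs.

    scores: dict face_id -> dict view_id -> quality score (higher is better).
    misaligned: set of (face_id, view_id) pairs flagged by some detector.
    Returns (assignment, holes): chosen view per face, and faces with no valid view.
    """
    assignment, holes = {}, []
    for face, per_view in scores.items():
        valid = {v: s for v, s in per_view.items() if (face, v) not in misaligned}
        if valid:
            assignment[face] = max(valid, key=valid.get)
        else:
            holes.append(face)  # would be handed to a hole-filling step
    return assignment, holes


if __name__ == "__main__":
    scores = {0: {"a": 0.9, "b": 0.5}, 1: {"a": 0.7}, 2: {"b": 0.4}}
    misaligned = {(0, "a"), (1, "a")}
    print(select_views(scores, misaligned))
```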

Volume 43, Issue 7. Journal Article.

Citations: 0

CrystalNet: Texture-Aware Neural Refraction Baking for Global Illumination

IF 2.7 Zone 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Graphics Forum

Pub Date : 2024-10-24 DOI: 10.1111/cgf.15227

Z. Zhang, E. Simo-Serra

Neural rendering bakes global illumination and other computationally costly effects into the weights of a neural network, allowing photorealistic images to be synthesized efficiently without relying on path tracing. In neural rendering approaches, G-buffers obtained from rasterization through direct rendering provide information regarding the scene such as position, normal, and textures to the neural network, achieving accurate and stable rendering quality in real time. However, due to the use of G-buffers, existing methods struggle to accurately render transparency and refraction effects, as G-buffers do not capture any ray information from multiple light ray bounces. This limitation results in blurriness, distortions, and loss of detail in rendered images that contain transparency and refraction, and is particularly notable in scenes with refracted objects that have high-frequency textures. In this work, we propose a neural network architecture to encode critical rendering information, including texture coordinates from refracted rays, and enable reconstruction of high-frequency textures in areas with refraction. Our approach is able to achieve accurate refraction rendering in challenging scenes with a diversity of overlapping transparent objects. Experimental results demonstrate that our method can interactively render high-quality refraction effects with global illumination, unlike existing neural rendering approaches. Our code can be found at https://github.com/ziyangz5/CrystalNet
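
To make "texture coordinates from refracted rays" concrete, the geometric step a G-buffer alone cannot supply is the bend of the ray at the interface. The sketch below computes a refracted direction with the standard vector form of Snell's law; it illustrates only that geometry, not the CrystalNet network, and the function name `refract` and its argument conventions are choices made here.

```python
import math


def refract(d, n, eta):
    """
    Refract unit direction d at a surface with unit normal n.

    n must face the incoming ray (dot(d, n) < 0).
    eta is the ratio n_incident / n_transmitted of refractive indices.
    Returns the unit refracted direction, or None on total internal reflection.
    """
    cos_i = -(d[0] * n[0] + d[1] * n[1] + d[2] * n[2])
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)  # Snell: sin_t = eta * sin_i
    if sin2_t > 1.0:
        return None  # no transmitted ray exists
    cos_t = math.sqrt(1.0 - sin2_t)
    k = eta * cos_i - cos_t
    return tuple(eta * di + k * ni for di, ni in zip(d, n))


if __name__ == "__main__":
    # Air to glass (index 1.5) at 30 degrees from the normal.
    d = (0.5, 0.0, -math.sqrt(0.75))
    print(refract(d, (0.0, 0.0, 1.0), 1.0 / 1.5))
```

Following the refracted ray to the next surface it hits yields the texture coordinate actually seen through the transparent object, which is the kind of information the abstract says is encoded for the network.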

Volume 43, Issue 7. Journal Article.

Citations: 0

PCLC-Net: Point Cloud Completion in Arbitrary Poses with Learnable Canonical Space

IF 2.7 Zone 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Graphics Forum

Pub Date : 2024-10-24 DOI: 10.1111/cgf.15217

Hanmo Xu, Qingyao Shuai, Xuejin Chen

Recovering the complete structure from partial point clouds in arbitrary poses is challenging. Recently, many efforts have been made to address this problem by developing SO(3)-equivariant completion networks or aligning the partial point clouds with a predefined canonical space before completion. However, these approaches are limited to random rotations only or demand costly pose annotation for model training. In this paper, we present a novel Network for Point cloud Completion with Learnable Canonical space (PCLC-Net) to reduce the need for pose annotations and extract SE(3)-invariant geometry features to improve the completion quality in arbitrary poses. Without pose annotations, our PCLC-Net utilizes self-supervised pose estimation to align the input partial point clouds to a canonical space that is learnable for an object category and subsequently performs shape completion in the learned canonical space. Our PCLC-Net can complete partial point clouds with arbitrary SE(3) poses without requiring pose annotations for supervision. Our PCLC-Net achieves state-of-the-art results on shape completion with arbitrary SE(3) poses on both synthetic and real scanned data. To the best of our knowledge, our method is the first to achieve shape completion in arbitrary poses without pose annotations during network training.
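
What "SE(3)-invariant" means can be shown with a hand-crafted descriptor: the sorted list of pairwise point distances does not change under rotation and translation. This is only an illustration of the invariance property, not the learned features of PCLC-Net; for brevity the rigid motion is restricted to a rotation about the z-axis plus a translation, and the function names are hypothetical.

```python
import math


def rigid_transform(points, angle, translation):
    """Rotate points about the z-axis by `angle` (radians), then translate."""
    c, s = math.cos(angle), math.sin(angle)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz) for x, y, z in points]


def pairwise_distances(points):
    """Sorted distances between all point pairs: unchanged by rigid motions."""
    dists = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dists.append(math.dist(points[i], points[j]))
    return sorted(dists)


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.5, 0.5, 3.0)]
    moved = rigid_transform(cloud, 0.7, (5.0, -2.0, 1.5))
    print(pairwise_distances(cloud))
    print(pairwise_distances(moved))  # same values up to rounding
```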

Volume 43, Issue 7. Journal Article.

Citations: 0

Gaussian in the Dark: Real-Time View Synthesis From Inconsistent Dark Images Using Gaussian Splatting

IF 2.7 Zone 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Graphics Forum

Pub Date : 2024-10-24 DOI: 10.1111/cgf.15213

Sheng Ye, Zhen-Hui Dong, Yubin Hu, Yu-Hui Wen, Yong-Jin Liu

3D Gaussian Splatting has recently emerged as a powerful representation that can synthesize remarkable novel views using consistent multi-view images as input. However, we notice that images captured in dark environments where the scenes are not fully illuminated can exhibit considerable brightness variations and multi-view inconsistency, which poses great challenges to 3D Gaussian Splatting and severely degrades its performance. To tackle this problem, we propose Gaussian-DK. Observing that inconsistencies are mainly caused by camera imaging, we represent a consistent radiance field of the physical world using a set of anisotropic 3D Gaussians, and design a camera response module to compensate for multi-view inconsistencies. We also introduce a step-based gradient scaling strategy to constrain Gaussians near the camera, which turn out to be floaters, from splitting and cloning. Experiments on our proposed benchmark dataset demonstrate that Gaussian-DK produces high-quality renderings without ghosting and floater artifacts and significantly outperforms existing methods. Furthermore, we can also synthesize light-up images by controlling exposure levels that clearly show details in shadow areas.

Volume 43, Issue 7. Journal Article.

Citations: 0

TempDiff: Enhancing Temporal-awareness in Latent Diffusion for Real-World Video Super-Resolution

IF 2.7 Zone 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Graphics Forum

Pub Date : 2024-10-18 DOI: 10.1111/cgf.15211

Q. Jiang, Q.L. Wang, L.H. Chi, X.H. Chen, Q.Y. Zhang, R. Zhou, Z.Q. Deng, J.S. Deng, B.B. Tang, S.H. Lv, J. Liu

Latent diffusion models (LDMs) have demonstrated remarkable success in generative modeling. It is promising to leverage the potential of diffusion priors to enhance performance in image and video tasks. However, applying LDMs to video super-resolution (VSR) presents significant challenges due to the high demands for realistic details and temporal consistency in generated videos, exacerbated by the inherent stochasticity in the diffusion process. In this work, we propose a novel diffusion-based framework, Temporal-awareness Latent Diffusion Model (TempDiff), specifically designed for real-world video super-resolution, where degradations are diverse and complex. TempDiff harnesses the powerful generative prior of a pre-trained diffusion model and enhances temporal awareness through the following mechanisms: 1) Incorporating temporal layers into the denoising U-Net and VAE-Decoder, and fine-tuning these added modules to maintain temporal coherency; 2) Estimating optical flow guidance using a pre-trained flow net for latent optimization and propagation across video sequences, ensuring overall stability in the generated high-quality video. Extensive experiments demonstrate that TempDiff achieves compelling results, outperforming state-of-the-art methods on both synthetic and real-world VSR benchmark datasets. Code will be available at https://github.com/jiangqin567/TempDiff
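
Mechanism 2 relies on propagating information along optical flow. A minimal sketch of that propagation step is a backward warp: each pixel of the current frame fetches its value from the previous frame at the location the flow points to. The version below assumes integer flow and nearest-pixel lookup with border clamping to stay short; practical pipelines typically use sub-pixel flow with bilinear sampling and operate on latent features. The name `warp_with_flow` is illustrative.

```python
def warp_with_flow(prev, flow):
    """
    Backward-warp the 2D grid `prev` using integer flow vectors.

    flow[y][x] = (dx, dy) points from the current pixel to its source in `prev`.
    Source coordinates that fall outside the grid are clamped to the border.
    """
    height, width = len(prev), len(prev[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            sx = min(max(x + dx, 0), width - 1)
            sy = min(max(y + dy, 0), height - 1)
            row.append(prev[sy][sx])
        out.append(row)
    return out


if __name__ == "__main__":
    prev = [[1, 2, 3], [4, 5, 6]]
    # Every pixel looks one step to the left, i.e. the content moved right.
    flow = [[(-1, 0)] * 3 for _ in range(2)]
    print(warp_with_flow(prev, flow))
```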

Volume 43, Issue 7. Journal Article.

Citations: 0

NeuPreSS: Compact Neural Precomputed Subsurface Scattering for Distant Lighting of Heterogeneous Translucent Objects

IF 2.7 Zone 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Graphics Forum

Pub Date : 2024-10-18 DOI: 10.1111/cgf.15234

T. TG, J. R. Frisvad, R. Ramamoorthi, H. W. Jensen

Monte Carlo rendering of translucent objects with heterogeneous scattering properties is often expensive both in terms of memory and computation. If the scattering properties are described by a 3D texture, memory consumption is high. If we do path tracing and use a high dynamic range lighting environment, the computational cost of the rendering can easily become significant. We propose a compact and efficient neural method for representing and rendering the appearance of heterogeneous translucent objects. Instead of assuming only surface variation of optical properties, our method represents the appearance of a full object taking its geometry and volumetric heterogeneities into account. This is similar to a neural radiance field, but our representation works for an arbitrary distant lighting environment. In a sense, we present a version of neural precomputed radiance transfer that captures relighting of heterogeneous translucent objects. We use a multi-layer perceptron (MLP) with skip connections to represent the appearance of an object as a function of spatial position, direction of observation, and direction of incidence. The latter is considered a directional light incident across the entire non-self-shadowed part of the object. We demonstrate the ability of our method to compactly store highly complex materials while having high accuracy when comparing to reference images of the represented object in unseen lighting environments. As compared with path tracing of a heterogeneous light scattering volume behind a refractive interface, our method more easily enables importance sampling of the directions of incidence and can be integrated into existing rendering frameworks while achieving interactive frame rates.
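
The network described above maps position, observation direction, and incidence direction to appearance through an MLP with skip connections. The sketch below shows the shape of such a forward pass with untrained random weights; the layer count, width, ReLU activation, RGB output, and the placement of the single skip connection are all assumptions made for illustration, not the architecture reported in the paper.

```python
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x, relu=True):
    """Apply one dense layer, optionally followed by ReLU."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [max(0.0, o) for o in out] if relu else out


def build_mlp(width=16, seed=0):
    rng = random.Random(seed)
    n_in = 9  # position (3) + observation direction (3) + incidence direction (3)
    return [
        make_layer(n_in, width, rng),
        make_layer(width, width, rng),
        make_layer(width + n_in, width, rng),  # skip connection re-injects the input
        make_layer(width, 3, rng),             # assumed RGB output
    ]


def appearance(mlp, position, view_dir, light_dir):
    x = list(position) + list(view_dir) + list(light_dir)
    h = dense(mlp[0], x)
    h = dense(mlp[1], h)
    h = dense(mlp[2], h + x)  # concatenate hidden features with the raw input
    return dense(mlp[3], h, relu=False)


if __name__ == "__main__":
    print(appearance(build_mlp(), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)))
```

Because the incidence direction is an explicit input, relighting under a distant environment amounts to evaluating the network for sampled light directions and summing the weighted results.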

Volume 43, Issue 7. Journal Article. PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/cgf.15234

Citations: 0

Unerosion: Simulating Terrain Evolution Back in Time

IF 2.7 Zone 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Graphics Forum

Pub Date : 2024-10-17 DOI: 10.1111/cgf.15182

Zhanyu Yang, Guillaume Cordonnier, Marie-Paule Cani, Christian Perrenoud, Bedrich Benes

While the past of terrain cannot be known precisely because an effect can result from many different causes, exploring these possible pasts opens the way to numerous applications ranging from movies and games to paleogeography. We introduce unerosion, an attempt to recover plausible past topographies from an input terrain represented as a height field. Our solution relies on novel algorithms for the backward simulation of different processes: fluvial erosion, sedimentation, and thermal erosion. This is achieved by re-formulating the equations of erosion and sedimentation so that they can be simulated back in time. These algorithms can be combined to account for a succession of climate changes backward in time, while the possible ambiguities provide editing options to the user. Results show that our solution can approximately reverse different types of erosion while enabling users to explore a variety of alternative pasts. Using a chronology of climatic periods to inform us about the main erosion phenomena, we also went back in time using real measured terrain data. We checked the consistency with geological findings, namely the height of river beds hundreds of thousands of years ago.
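
Running an erosion model backward can be illustrated on a toy case. The sketch below assumes simple linear hillslope diffusion on a 1D height profile and undoes one small step by applying the same update with the sign flipped; this is a stand-in chosen for brevity, not the reformulated fluvial, sedimentation, or thermal equations of the paper. Backward diffusion is ill-posed, so the recovery is only approximate and degrades as steps accumulate or as noise is present.

```python
import math


def laplacian(h):
    """Second difference of a 1D height profile; the two endpoints are held fixed."""
    inner = [h[i - 1] - 2.0 * h[i] + h[i + 1] for i in range(1, len(h) - 1)]
    return [0.0] + inner + [0.0]


def erode(h, k):
    """One forward step of linear hillslope diffusion with rate k."""
    return [hi + k * li for hi, li in zip(h, laplacian(h))]


def unerode(h, k):
    """One approximate backward step: the same update with the sign flipped."""
    return [hi - k * li for hi, li in zip(h, laplacian(h))]


if __name__ == "__main__":
    terrain = [10.0 * math.sin(0.3 * i) for i in range(20)]
    eroded = erode(terrain, 0.01)
    recovered = unerode(eroded, 0.01)
    print(max(abs(a - b) for a, b in zip(terrain, recovered)))
```

The residual after one round trip scales with the square of `k`, which is why small steps are used here; the ambiguity that remains is the kind of freedom the abstract describes as editing options for the user.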

Volume 43, Issue 8. Journal Article. Full text (PDF): https://onlinelibrary.wiley.com/doi/epdf/10.1111/cgf.15182

Citations: 0

ADAPT: AI-Driven Artefact Purging Technique for IMU Based Motion Capture

IF 2.7 Zone 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computer Graphics Forum

Pub Date : 2024-10-17 DOI: 10.1111/cgf.15172

P. Schreiner, R. Netterstrøm, H. Yin, S. Darkner, K. Erleben

While IMU-based motion capture offers a cost-effective alternative to premium camera-based systems, it often falls short of matching the latter's realism. Common distortions, such as self-penetrating body parts, foot skating, and floating, limit the usability of these systems, particularly for high-end users. To address this, we employed reinforcement learning to train an AI agent that mimics erroneous sample motion. Because our agent operates within a simulated environment and must adhere to the laws of physics, it inherently avoids generating these distortions. Impressively, the agent manages to mimic the sample motions while preserving their distinctive characteristics. We assessed our method's efficacy across various types of input data, showcasing an ideal blend of artefact-laden IMU-based data with high-grade optical motion capture data. Furthermore, we compared the configuration of observation and action spaces with other implementations, pinpointing the most suitable configuration for our purposes. All our models underwent rigorous evaluation using a spectrum of quantitative metrics complemented by a qualitative review. These evaluations were performed using a benchmark dataset of IMU-based motion data from actors not included in the training data.

Volume 43, Issue 8. Journal Article. Full text (PDF): https://onlinelibrary.wiley.com/doi/epdf/10.1111/cgf.15172

Citations: 0