Computers & Graphics-Uk: latest articles, page 8

Single-image SVBRDF estimation with auto-adaptive high-frequency feature extraction

IF 2.5 Zone 4 Computer Science Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computers & Graphics-Uk

Pub Date : 2024-10-09 DOI: 10.1016/j.cag.2024.104103

Jiamin Cheng, Li Wang, Lianghao Zhang, Fangzhou Gao, Jiawan Zhang

In this paper, we address the task of estimating spatially-varying bi-directional reflectance distribution functions (SVBRDF) of a near-planar surface from a single flash-lit image. Disentangling SVBRDF from the material appearance by deep learning has proven a formidable challenge. This difficulty is particularly pronounced when dealing with images lit by a point light source because the uneven distribution of irradiance in the scene interacts with the surface, leading to significant global luminance variations across the image. These variations may be overemphasized by the network and wrongly baked into the material property space. To tackle this issue, we propose a high-frequency path that contains an auto-adaptive subband “knob”. This path aims to extract crucial image textures and details while eliminating global luminance variations present in the original image. Furthermore, recognizing that color information is ignored in this path, we design a two-path strategy to jointly estimate material reflectance from both the high-frequency path and the original image. Extensive experiments on a substantial dataset have confirmed the effectiveness of our method. Our method outperforms state-of-the-art methods across a wide range of materials.
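
The idea of a high-frequency path can be illustrated with a much simpler stand-in than the one the abstract describes. The sketch below is an assumption-laden toy, not the authors' auto-adaptive subband "knob": it uses a fixed box blur as the low-pass filter, a plain channel mean as luminance, and hypothetical names (`box_blur`, `high_frequency`, `radius`). It only shows why subtracting a low-pass version of the image removes smooth, flash-induced luminance falloff while keeping fine texture, and why colour is lost along the way.

```python
import numpy as np

def box_blur(img, radius):
    """Mean filter with edge padding; a crude stand-in for a low-pass filter."""
    padded = np.pad(img, radius, mode="edge")
    size = 2 * radius + 1
    out = np.zeros_like(img, dtype=np.float64)
    # Sum all shifted copies of the image inside the (size x size) window.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (size * size)

def high_frequency(rgb, radius=8):
    """Return luminance minus its low-pass version (colour is discarded)."""
    luminance = rgb.astype(np.float64).mean(axis=2)  # assumed luminance proxy
    return luminance - box_blur(luminance, radius)

# Toy flash-lit image: a smooth radial falloff multiplied by a fine checker texture.
ys, xs = np.mgrid[0:64, 0:64]
falloff = 1.0 / (1.0 + ((ys - 32) ** 2 + (xs - 32) ** 2) / 400.0)
checker = 0.8 + 0.2 * ((ys // 2 + xs // 2) % 2)
image = np.repeat((falloff * checker)[..., None], 3, axis=2)

detail = high_frequency(image, radius=8)
print("input range:", image.min(), image.max())
print("high-pass mean and std:", detail.mean(), detail.std())
```

Because the output is a single-channel residual, a second path that sees the original image is still needed to recover colour, which is the motivation the abstract gives for its two-path design.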

Citations: 0

An immersive labeling method for large point clouds

IF 2.5 Zone 4 Computer Science Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computers & Graphics-Uk

Pub Date : 2024-10-05 DOI: 10.1016/j.cag.2024.104101

Tianfang Lin, Zhongyuan Yu, Matthew McGinity, Stefan Gumhold

3D point clouds, such as those produced by 3D scanners, often require labeling – the accurate classification of each point into structural or semantic categories – before they can be used in their intended application. However, in the absence of fully automated methods, such labeling must be performed manually, which can prove extremely time- and labor-intensive. To address this, we present a virtual reality tool for accelerating and improving the manual labeling of very large 3D point clouds. The labeling tool provides a variety of 3D interactions for efficient viewing, selection and labeling of points using the controllers of consumer VR kits. The main contribution of our work is a mixed CPU/GPU-based data structure that supports rendering, selection and labeling with immediate visual feedback at the high frame rates necessary for a convenient VR experience. Our mixed CPU/GPU data structure supports fluid interaction with very large point clouds in VR, which is not possible with existing continuous level-of-detail rendering algorithms. We evaluate our method with 25 users on tasks involving point clouds of up to 50 million points and find convincing results that support the case for VR-based point cloud labeling.

Citations: 0

Advances in vision-based deep learning methods for interacting hands reconstruction: A survey

IF 2.5 Zone 4 Computer Science Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computers & Graphics-Uk

Pub Date : 2024-10-05 DOI: 10.1016/j.cag.2024.104102

Yu Miao, Yue Liu

Vision-based hand reconstruction has become a noteworthy tool for enhancing interactive experiences in applications such as virtual reality, augmented reality, and autonomous driving, enabling sophisticated interactions by reconstructing the complex motions of human hands. Despite significant progress driven by deep-learning methodologies, the quest for high-fidelity interacting hands reconstruction faces challenges such as limited dataset diversity, lack of detailed hand representation, occlusions, and the difficulty of differentiating between similar hand structures. This survey thoroughly reviews deep learning-based methods, diverse datasets, loss functions, and evaluation metrics addressing the complexities of interacting hands reconstruction. Mainstream algorithms of the past five years are systematically classified into two main categories: algorithms that employ explicit representations, such as parametric meshes and 3D Gaussian splatting, and those that utilize implicit representations, including signed distance fields and neural radiance fields. Novel deep-learning models such as graph convolutional networks and transformers are applied to address the aforementioned challenges in hand reconstruction. Beyond summarizing these interaction-aware algorithms, this survey also briefly discusses hand tracking in virtual reality and augmented reality. To the best of our knowledge, this is the first survey specifically focusing on the reconstruction of both hands and their interactions with objects. The survey covers the various facets of hand modeling, deep learning approaches, and datasets, broadening the horizon of hand reconstruction research and future innovation in natural user interactions.

Citations: 0

Diverse non-homogeneous texture synthesis from a single exemplar

IF 2.5 Zone 4 Computer Science Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computers & Graphics-Uk

Pub Date : 2024-10-04 DOI: 10.1016/j.cag.2024.104099

A. Phillips, J. Lang, D. Mould

Capturing non-local, long range features present in non-homogeneous textures is difficult to achieve with existing techniques. We introduce a new training method and architecture for single-exemplar texture synthesis that combines a Generative Adversarial Network (GAN) and a Variational Autoencoder (VAE). In the proposed architecture, the combined networks share information during training via structurally identical, independent blocks, facilitating highly diverse texture variations from a single image exemplar. Supporting this training method, we also include a similarity loss term that further encourages diverse output while also improving the overall quality. Using our approach, it is possible to produce diverse results over the entire sample size taken from a single model that can be trained in approximately 15 min. We show that our approach obtains superior performance when compared to SOTA texture synthesis methods and single image GAN methods using standard diversity and quality metrics.

Citations: 0

Geometric implicit neural representations for signed distance functions

IF 2.5 Zone 4 Computer Science Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computers & Graphics-Uk

Pub Date : 2024-10-01 DOI: 10.1016/j.cag.2024.104085

Luiz Schirmer, Tiago Novello, Vinícius da Silva, Guilherme Schardong, Daniel Perazzo, Hélio Lopes, Nuno Gonçalves, Luiz Velho

Implicit neural representations (INRs) have emerged as a promising framework for representing signals in low-dimensional spaces. This survey reviews the existing literature on the specialized INR problem of approximating signed distance functions (SDFs) for surface scenes, using either oriented point clouds or a set of posed images. We refer to neural SDFs that incorporate differential geometry tools, such as normals and curvatures, in their loss functions as geometric INRs. The key idea behind this 3D reconstruction approach is to include additional regularization terms in the loss function, ensuring that the INR satisfies certain global properties that the function should hold — such as having unit gradient in the case of SDFs. We explore key methodological components, including the definition of INR, the construction of geometric loss functions, and sampling schemes from a differential geometry perspective. Our review highlights the significant advancements enabled by geometric INRs in surface reconstruction from oriented point clouds and posed images.
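
The unit-gradient property mentioned above can be written as a regularization term that penalizes the deviation of the gradient norm from 1 at sampled points, often called an Eikonal term. The sketch below is only an illustration under stated assumptions: it uses an analytic sphere SDF in place of a neural network, central finite differences in place of automatic differentiation, and hypothetical names (`sphere_sdf`, `eikonal_loss`, `eps`). It is not taken from the survey.

```python
import numpy as np

def sphere_sdf(points, radius=1.0):
    """Exact signed distance to a sphere centred at the origin."""
    return np.linalg.norm(points, axis=-1) - radius

def eikonal_loss(f, points, eps=1e-4):
    """Mean squared deviation of the gradient norm of f from 1."""
    grads = np.zeros_like(points)
    for axis in range(points.shape[-1]):
        offset = np.zeros(points.shape[-1])
        offset[axis] = eps
        # Central finite difference along one coordinate axis.
        grads[:, axis] = (f(points + offset) - f(points - offset)) / (2 * eps)
    grad_norm = np.linalg.norm(grads, axis=-1)
    return np.mean((grad_norm - 1.0) ** 2)

rng = np.random.default_rng(0)
samples = rng.uniform(-2.0, 2.0, size=(1024, 3))

# A true SDF has unit gradient almost everywhere, so the penalty is close to zero.
print("true SDF:", eikonal_loss(sphere_sdf, samples))
# Scaling the function by 2 keeps the same zero level set but breaks the property.
print("scaled function:", eikonal_loss(lambda p: 2.0 * sphere_sdf(p), samples))
```

The second call shows why such a term is useful: a function can describe the same surface and still fail to be a distance function, and the penalty separates the two cases.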

Citations: 0

Flow style-aware network for arbitrary style transfer

IF 2.5 Zone 4 Computer Science Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computers & Graphics-Uk

Pub Date : 2024-09-29 DOI: 10.1016/j.cag.2024.104098

Zhenshan Hu, Bin Ge, Chenxing Xia, Wenyan Wu, Guangao Zhou, Baotong Wang

Researchers have recently proposed arbitrary style transfer methods based on various model frameworks. Although all of them have achieved good results, they still face the problems of insufficient stylization, artifacts and inadequate retention of content structure. In order to solve these problems, we propose a flow style-aware network (FSANet) for arbitrary style transfer, which combines a VGG network and a flow network. FSANet consists of a flow style transfer module (FSTM), a dynamic regulation attention module (DRAM), and a style feature interaction module (SFIM). The flow style transfer module uses the reversible residue block features of the flow network to create a sample feature containing the target content and style. To adapt the FSTM to VGG networks, we design the dynamic regulation attention module and exploit the sample features both at the channel and pixel levels. The style feature interaction module computes a style tensor that optimizes the fused features. Extensive qualitative and quantitative experiments demonstrate that our proposed FSANet can effectively avoid artifacts and enhance the preservation of content details while migrating style features.
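
The abstract does not spell out the reversible block used in the flow network, so the following is a generic illustration rather than FSANet's design. It sketches an additive coupling layer, one common way to build an exactly invertible block: half of the channels pass through unchanged and the other half are shifted by a function of the first half. The names `coupling_forward`, `coupling_inverse` and `toy_shift` are hypothetical.

```python
import numpy as np

def coupling_forward(x, shift_fn):
    """y1 = x1, y2 = x2 + shift_fn(x1); requires an even number of channels."""
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([x1, x2 + shift_fn(x1)], axis=-1)

def coupling_inverse(y, shift_fn):
    """Exact inverse: x1 = y1, x2 = y2 - shift_fn(y1)."""
    y1, y2 = np.split(y, 2, axis=-1)
    return np.concatenate([y1, y2 - shift_fn(y1)], axis=-1)

def toy_shift(h):
    """Arbitrary nonlinearity; it never needs to be inverted itself."""
    return np.tanh(3.0 * h) + h ** 2

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))

encoded = coupling_forward(features, toy_shift)
decoded = coupling_inverse(encoded, toy_shift)
print("max reconstruction error:", np.abs(decoded - features).max())
```

Since the block can be undone exactly, no content information is discarded on the way in, which is the general appeal of flow-style blocks when content structure has to be preserved.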

Citations: 0

Self-supervised reconstruction of re-renderable facial textures from single image

IF 2.5 Zone 4 Computer Science Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computers & Graphics-Uk

Pub Date : 2024-09-28 DOI: 10.1016/j.cag.2024.104096

Mingxin Yang, Jianwei Guo, Xiaopeng Zhang, Zhanglin Cheng

Reconstructing high-fidelity 3D facial texture from a single image is challenging due to the lack of complete face information and the domain gap between the 3D face and the 2D image. Further, obtaining re-renderable 3D faces has become a strongly desired property in many applications, where the term ‘re-renderable’ requires the facial texture to be spatially complete and disentangled from environmental illumination. In this paper, we propose a new self-supervised deep learning framework for reconstructing high-quality and re-renderable facial albedos from single-view images in the wild. Our main idea is to first utilize a prior generation module based on the 3DMM proxy model to produce an unwrapped texture and a globally parameterized prior albedo. Then we apply a detail refinement module to synthesize the final texture with both high-frequency details and completeness. To further disentangle facial textures from illumination, we propose a novel detailed illumination representation that is reconstructed jointly with the detailed albedo. We also design several novel regularization losses on both the albedo and illumination maps to facilitate the disentanglement of these two factors. Finally, by leveraging a differentiable renderer, each face attribute can be jointly trained in a self-supervised manner without requiring ground-truth facial reflectance. Extensive comparisons and ablation studies on challenging datasets demonstrate that our framework outperforms state-of-the-art approaches.

Citations: 0

Psychophysiology of rhythmic stimuli and time experience in virtual reality

IF 2.5 Zone 4 Computer Science Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computers & Graphics-Uk

Pub Date : 2024-09-27 DOI: 10.1016/j.cag.2024.104097

Stéven Picard, Jean Botev

Time experience is an essential part of one’s perception of any environment, real or virtual. In this article, from a virtual environment design perspective, we explore how rhythmic stimuli can influence an unrelated cognitive task regarding time experience and performance in virtual reality. This study explicitly includes physiological data to investigate how, overall, experience correlates with psychophysiological observations. The task involves sorting 3D objects by shape, with rhythmic stimuli that vary in tempo and sensory channel (auditory and/or visual) across trials, to collect subjective measures of time estimation and judgment. The results indicate different effects on time experience and performance depending on the context, such as user fatigue and trial repetition. Depending on the context, a positive impact of audio stimuli or a negative impact of visual stimuli on task performance can be observed, as well as time being underestimated depending on tempo and task familiarity. However, some effects are consistent regardless of context, such as time being judged to pass faster with additional stimuli or consistent correlations between participants’ performance and time experience, suggesting flow-related aspects. We also observe correlations of time experience with eye-tracking data and body temperature, yet some of these correlations may be due to a confounding effect of fatigue. If confirmed as separate from fatigue, these physiological data could be used as a reference point for evaluating a user’s time experience. This might be of great interest for designing virtual environments, as purposeful stimuli can strongly influence task performance and time experience, both essential components of virtual environment user experience.

Citations: 0

Enhancing MeshNet for 3D shape classification with focal and regularization losses

IF 2.5 Zone 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computers & Graphics-Uk

Pub Date : 2024-09-25 DOI: 10.1016/j.cag.2024.104094

Meng Liu, Feiyu Zhao

With the development of deep learning and computer vision, an increasing amount of research has focused on applying deep learning models to the recognition and classification of three-dimensional shapes. In classification tasks, differences in sample quantity, feature amount, model complexity, and other aspects among different categories of 3D model data cause significant variations in classification difficulty. Simple cross-entropy loss is generally used as the loss function, but it is insufficient to address these differences. In this paper, we use MeshNet as the base model and introduce focal loss into the loss function. Additionally, to prevent the model from developing a preference for specific categories, we incorporate a regularization loss. Optimizing MeshNet with the combined focal and regularization losses yields a classification accuracy of up to 92.46%, a 0.20 percentage-point improvement over the original model's highest accuracy of 92.26%. Furthermore, the average accuracy over the final 50 epochs remains stable at 92.01%, a 0.71 percentage-point improvement over the original MeshNet model's 91.30%. These results indicate that our method performs better on the 3D shape classification task.
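To make the role of focal loss concrete, here is a minimal sketch in plain Python. The abstract does not give the exact formulation used with MeshNet, so this assumes the commonly used multi-class form FL(p_t) = -(1 - p_t)^γ · log(p_t), where p_t is the softmax probability of the true class. The function names and the default γ = 2.0 are illustrative assumptions, and the regularization term is omitted because its form is not stated.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for one sample: -log(p_t)."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one sample (assumed form, gamma is a hypothetical default).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    samples, so hard samples dominate the gradient.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)
```

With `gamma=0.0` the modulating factor is 1 and the function reduces to ordinary cross-entropy; larger values of `gamma` down-weight confidently classified samples more strongly, which is why the loss is a natural fit when categories differ in difficulty.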


Citations: 0

ChatKG: Visualizing time-series patterns aided by intelligent agents and a knowledge graph

IF 2.5 Zone 4 (Computer Science) Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Computers & Graphics-Uk

Pub Date : 2024-09-24 DOI: 10.1016/j.cag.2024.104092

Leonardo Christino, Fernando V. Paulovich

Line-chart visualizations of temporal data enable users to identify interesting patterns to inquire about. Using Intelligent Agents (IA), Visual Analytics tools can automatically uncover explicit knowledge related to said patterns. Yet, visualizing the association of data, patterns, and knowledge is not straightforward. In this paper, we present ChatKG, a novel visual analytics strategy that allows exploratory data analysis of a Knowledge Graph that associates temporal sequences, the patterns found in each sequence, the temporal overlap between patterns, the related knowledge of each given pattern gathered from a multi-agent IA, and the IA's suggestions of related datasets for further analysis, visualized as annotations. We exemplify and informally evaluate ChatKG by analyzing the world's life expectancy. For this, we implement an oracle that automatically extracts relevant or interesting patterns, populates the Knowledge Graph to be visualized, and, during user interaction, queries the multi-agent IA for related information and suggests related datasets to be displayed as visual annotations. Our tests and an interview showed that ChatKG is well suited to analyzing temporal patterns and their related knowledge when applied to history studies.
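The graph structure described above can be illustrated with a small Python sketch. This is not ChatKG's implementation: the `Pattern` fields, the relation names `has_pattern` and `overlaps`, and the triple-based edge representation are all hypothetical choices made for illustration, and the agent-derived knowledge and dataset suggestions are left out.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Pattern:
    """A pattern found in one time series (all fields are hypothetical)."""
    sequence: str  # name of the time series the pattern belongs to
    kind: str      # illustrative label such as "rise" or "drop"
    start: int     # first time step covered, inclusive
    end: int       # last time step covered, inclusive


def overlap(a, b):
    """Return the shared (start, end) interval of two patterns, or None."""
    start, end = max(a.start, b.start), min(a.end, b.end)
    return (start, end) if start <= end else None


def build_graph(patterns):
    """Return graph edges as (source, relation, target) triples."""
    # Link every sequence to the patterns found in it.
    edges = [(p.sequence, "has_pattern", p) for p in patterns]
    # Link every pair of patterns whose time spans intersect.
    for a, b in combinations(patterns, 2):
        if overlap(a, b) is not None:
            edges.append((a, "overlaps", b))
    return edges
```

Storing overlaps as explicit edges is one way to let a visualization jump from a pattern in one series to concurrent patterns in another; the made-up inputs would be replaced by whatever the pattern-extraction step produces.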


Citations: 0