
Latest Publications: IEEE Transactions on Visualization and Computer Graphics

Neural Projection Mapping Using Reflectance Fields
IF 5.2 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2023-06-11 · DOI: 10.48550/arXiv.2306.06595
Yotam Erel, D. Iwai, Amit H. Bermano
We introduce a high-resolution, spatially adaptive light source, or projector, into a neural reflectance field, enabling both projector calibration and photorealistic light editing. The projected texture is fully differentiable with respect to all scene parameters and can be optimized to yield a desired appearance suitable for applications in augmented reality and projection mapping. Our neural field consists of three neural networks estimating geometry, material, and transmittance. Using an analytical BRDF model and carefully selected projection patterns, our acquisition process is simple and intuitive, featuring a fixed, uncalibrated projector and a handheld camera with a co-located light source. As we demonstrate, the virtual projector incorporated into the pipeline improves scene understanding and enables various projection mapping applications, alleviating the need for the time-consuming calibration steps performed per view or projector location in a traditional setting. In addition to enabling novel viewpoint synthesis, we demonstrate state-of-the-art projector compensation for novel viewpoints, improvement over the baselines in material and scene reconstruction, and three simply implemented scenarios where projection image optimization is performed, including the use of a 2D generative model to consistently dictate scene appearance from multiple viewpoints. We believe that neural projection mapping opens the door to novel and exciting downstream tasks through the joint optimization of the scene and projection images.
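The differentiable projection-image optimization described above can be illustrated with a toy single-pixel compensation loop. This is a hedged sketch under a simplified shading model (observed = albedo × (ambient + projector intensity)) with an analytic gradient; the function name and shading model are illustrative assumptions, not the paper's neural reflectance field.

```python
# Toy projector compensation: optimize a single projector intensity p so the
# observed appearance matches a target, via gradient descent on a squared
# error. The model observed = albedo * (ambient + p) is a simplified
# stand-in, not the paper's actual scene representation.

def compensate(albedo, ambient, target, lr=0.5, steps=200):
    p = 0.5  # initial projector intensity in [0, 1]
    for _ in range(steps):
        observed = albedo * (ambient + p)
        # d(loss)/dp for loss = (observed - target)^2
        grad = 2.0 * (observed - target) * albedo
        p = min(1.0, max(0.0, p - lr * grad))
    return p

p = compensate(albedo=0.8, ambient=0.1, target=0.4)
# closed-form optimum is target / albedo - ambient = 0.4
```

In the paper, this role is played by differentiating the full rendering through the neural reflectance field with respect to the projected texture; the gradient-descent structure of the optimization is the same.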
Citations: 0
DeepTree: Modeling Trees with Situated Latents
IF 5.2 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2023-05-09 · DOI: 10.48550/arXiv.2305.05153
Xiaochen Zhou, Bosheng Li, Bedrich Benes, S. Fei, S. Pirk
In this paper, we propose DeepTree, a novel method for modeling trees based on learning developmental rules for branching structures instead of manually defining them. We call our deep neural model "situated latent" because its behavior is determined by the intrinsic state, encoded as a latent space of a deep neural model, and by the extrinsic (environmental) data that is "situated" as the location in 3D space and on the tree structure. We use a neural network pipeline to train a situated latent space that allows us to locally predict branch growth based only on a single node in the branch graph of a tree model. We use this representation to progressively develop new branch nodes, thereby mimicking the growth process of trees. Starting from a root node, a tree is generated by iteratively querying the neural network on the newly added nodes, resulting in the branching structure of the whole tree. Our method enables generating a wide variety of tree shapes without the need to define intricate parameters that control their growth and behavior. Furthermore, we show that the situated latents can also be used to encode the environmental response of tree models, e.g., when trees grow next to obstacles. We validate the effectiveness of our method by measuring the similarity between our tree models and procedurally generated ones, based on a number of established metrics for tree form.
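The growth loop described above — start from a root node and iteratively query the network on newly added nodes — can be sketched structurally with a deterministic stand-in predictor (a hypothetical rule replacing DeepTree's situated-latent network):

```python
from collections import deque

# Structural sketch of the iterative generation loop: start from a root node
# and repeatedly query a growth predictor on newly added nodes to decide how
# many children they spawn. The deterministic rule below is a hypothetical
# stand-in for DeepTree's situated-latent network.

def predict_children(depth):
    return 2 if depth < 3 else 0  # binary branching down to depth 3

def grow_tree():
    root = {"depth": 0, "children": []}
    frontier = deque([root])
    count = 1
    while frontier:
        node = frontier.popleft()
        for _ in range(predict_children(node["depth"])):
            child = {"depth": node["depth"] + 1, "children": []}
            node["children"].append(child)
            frontier.append(child)
            count += 1
    return root, count

tree, n = grow_tree()
# full binary tree of depth 3: 1 + 2 + 4 + 8 = 15 nodes
```

In the paper, the predictor is conditioned on the situated latent (intrinsic state plus environmental data at the node's 3D location), which is what lets the same loop produce varied, environment-aware shapes.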
Citations: 2
Local-to-Global Panorama Inpainting for Locale-Aware Indoor Lighting Prediction
IF 5.2 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2023-03-18 · DOI: 10.48550/arXiv.2303.10344
Jia-Xuan Bai, Zhen He, Shangxue Yang, Jie Guo, Zhenyu Chen, Y. Zhang, Yanwen Guo
Predicting panoramic indoor lighting from a single perspective image is a fundamental but highly ill-posed problem in computer vision and graphics. To achieve locale-aware and robust prediction, this problem can be decomposed into three sub-tasks: depth-based image warping, panorama inpainting and high-dynamic-range (HDR) reconstruction, among which the success of panorama inpainting plays a key role. Recent methods mostly rely on convolutional neural networks (CNNs) to fill the missing contents in the warped panorama. However, they usually achieve suboptimal performance since the missing contents occupy a very large portion in the panoramic space while CNNs are plagued by limited receptive fields. The spatially-varying distortion in the spherical signals further increases the difficulty for conventional CNNs. To address these issues, we propose a local-to-global strategy for large-scale panorama inpainting. In our method, a depth-guided local inpainting is first applied on the warped panorama to fill small but dense holes. Then, a transformer-based network, dubbed PanoTransformer, is designed to hallucinate reasonable global structures in the large holes. To avoid distortion, we further employ cubemap projection in our design of PanoTransformer. The high-quality panorama recovered at any locale helps us to capture spatially-varying indoor illumination with physically-plausible global structures and fine details.
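The cubemap projection mentioned above maps a viewing direction to one of six cube faces plus 2D face coordinates, avoiding the pole distortion of equirectangular panoramas. A minimal sketch of the face-selection step follows; sign and orientation conventions differ between graphics APIs, so the ones here are illustrative, not PanoTransformer's exact layout.

```python
# Dominant-axis cubemap face selection: a direction vector is assigned to the
# face of its largest-magnitude component, and the remaining two components
# (divided by that magnitude) become face coordinates in [-1, 1]. Assumes a
# nonzero direction vector.

def cubemap_face_uv(d):
    x, y, z = d
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return ("+X" if x > 0 else "-X", y / ax, z / ax)
    if ay >= az:
        return ("+Y" if y > 0 else "-Y", x / ay, z / ay)
    return ("+Z" if z > 0 else "-Z", x / az, y / az)

face, u, v = cubemap_face_uv((0.5, -2.0, 0.25))
# dominant axis is -Y, so the ray lands on the "-Y" face
```

Because each face is a perspective image of a 90° frustum, distortion is roughly uniform across the face, which is what makes the representation friendlier to convolution- and attention-based networks than a spherical panorama.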
Citations: 0
A Topological Distance between Multi-fields based on Multi-Dimensional Persistence Diagrams
IF 5.2 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2023-03-06 · DOI: 10.48550/arXiv.2303.03038
Yashwanth Ramamurthi, A. Chattopadhyay
The problem of computing topological distance between two scalar fields based on Reeb graphs or contour trees has been studied and applied successfully to various problems in topological shape matching, data analysis, and visualization. However, generalizing such results to computing distance measures between two multi-fields based on their Reeb spaces is still in its infancy. Towards this, in the current paper we propose a technique to compute an effective distance measure between two multi-fields by computing a novel multi-dimensional persistence diagram (MDPD) corresponding to each of the (quantized) Reeb spaces. First, we construct a multi-dimensional Reeb graph (MDRG), which is a hierarchical decomposition of the Reeb space into a collection of Reeb graphs. The MDPD corresponding to each MDRG is then computed based on the persistence diagrams of the component Reeb graphs of the MDRG. Our distance measure extends the Wasserstein distance between two persistence diagrams of Reeb graphs to MDPDs of MDRGs. We prove that the proposed measure is a pseudo-metric and satisfies a stability property. Effectiveness of the proposed distance measure has been demonstrated on (i) shape retrieval contest data - SHREC 2010 and (ii) Pt-CO bond detection data from computational chemistry. Experimental results show that the proposed distance measure based on the Reeb spaces has more discriminating power in clustering the shapes and detecting the formation of a stable Pt-CO bond, as compared to similar measures between Reeb graphs.
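The Wasserstein distance between persistence diagrams, which the proposed MDPD distance extends, matches points of one diagram to points of the other or to their projections onto the diagonal, minimizing total matching cost. A brute-force sketch for tiny diagrams is below (real implementations use assignment or optimal-transport solvers, not enumeration):

```python
# Brute-force q-Wasserstein distance between two small persistence diagrams.
# Each diagram is a list of (birth, death) points; each is augmented with the
# diagonal projections of the other's points so unmatched points can be
# "destroyed" onto the diagonal. Exponential in the number of points -- for
# intuition only, not a practical implementation.

import math
from itertools import permutations

def _diag(p):
    m = (p[0] + p[1]) / 2.0
    return (m, m)  # nearest diagonal point to p

def wasserstein(d1, d2, q=2):
    a = list(d1) + [_diag(p) for p in d2]
    b = list(d2) + [_diag(p) for p in d1]

    def cost(p, r):
        if p[0] == p[1] and r[0] == r[1]:
            return 0.0  # diagonal-to-diagonal pairings are free
        return math.dist(p, r) ** q

    best = min(sum(cost(p, b[j]) for p, j in zip(a, perm))
               for perm in permutations(range(len(b))))
    return best ** (1.0 / q)

# a lone point (0, 2) with nothing to match must collapse to the diagonal
w = wasserstein([(0, 2)], [])
```

For d1 = [(0, 2)] and an empty d2, the single point is matched to its diagonal projection (1, 1), giving a 2-Wasserstein distance of √2.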
Citations: 0
IEEE VR 2023 Message from the Program Chairs and Guest Editors
IF 5.2 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2023-03-01 · DOI: 10.1109/tvcg.2021.3067835 · Pages: xiv-xv
Bobby Bodenheimer, V. Popescu, J. Quarles, Lili Wang
Citations: 0
IntrinsicNGP: Intrinsic Coordinate based Hash Encoding for Human NeRF
IF 5.2 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2023-02-28 · DOI: 10.48550/arXiv.2302.14683
Bo Peng, Jun Hu, Jingtao Zhou, Xuan Gao, Ju-yong Zhang
Recently, many methods have been proposed that utilize the neural radiance field for novel view synthesis of human performers. However, most of these methods require hours of training, making them difficult to use in practice. To address this challenging problem, we propose IntrinsicNGP, which can train from scratch and achieve high-fidelity results in a few minutes with videos of a human performer. To achieve this, we introduce a continuous and optimizable intrinsic coordinate, rather than the original explicit Euclidean coordinate, in the hash encoding module of instant-NGP. With this novel intrinsic coordinate, IntrinsicNGP can aggregate inter-frame information for dynamic objects with the help of proxy geometry shapes. Moreover, the results trained with the given rough geometry shapes can be further refined with an optimizable offset field based on the intrinsic coordinate. Extensive experimental results on several datasets demonstrate the effectiveness and efficiency of IntrinsicNGP. We also illustrate our approach's ability to edit the shape of reconstructed subjects.
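The hash encoding module mentioned above (from instant-NGP) maps a coordinate — Euclidean in the original, intrinsic in IntrinsicNGP — to trainable features by hashing grid vertices at multiple resolutions into fixed-size tables. A stripped-down sketch follows; random values stand in for learned table entries, interpolation between cell corners is omitted, and all sizes and names are illustrative.

```python
# Sketch of a multiresolution hash encoding lookup in the spirit of
# instant-NGP: a coordinate in [0, 1]^3 is snapped to a grid vertex at each
# resolution level, the vertex is hashed into a fixed-size feature table, and
# the per-level features are concatenated.

import random

PRIMES = (1, 2654435761, 805459861)  # spatial-hashing primes used by instant-NGP

def hash_index(ix, iy, iz, table_size):
    return (ix * PRIMES[0] ^ iy * PRIMES[1] ^ iz * PRIMES[2]) % table_size

def encode(coord, levels=(4, 8, 16), table_size=64, dim=2, seed=0):
    rng = random.Random(seed)  # fixed seed -> reproducible stand-in "parameters"
    tables = [[[rng.uniform(-1e-4, 1e-4) for _ in range(dim)]
               for _ in range(table_size)] for _ in levels]
    feats = []
    for table, res in zip(tables, levels):
        ix, iy, iz = (int(c * (res - 1)) for c in coord)
        feats.extend(table[hash_index(ix, iy, iz, table_size)])
    return feats

f = encode((0.2, 0.7, 0.5))
# dim=2 features per level, 3 levels -> 6 numbers
```

IntrinsicNGP's change is what gets fed into this lookup: because the intrinsic coordinate is tied to the surface of the (deforming) performer rather than to world space, the same table entries are reused across frames, aggregating inter-frame information.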
Citations: 4
MoReVis: A Visual Summary for Spatiotemporal Moving Regions
IF 5.2 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2023-02-26 · DOI: 10.48550/arXiv.2302.13199
Giovani Valdrighi, Nivan Ferreira, Jorge Poco
Spatial and temporal interactions are central and fundamental in many activities in our world. A common problem faced when visualizing this type of data is how to provide an overview that helps users navigate efficiently. Traditional approaches use coordinated views or 3D metaphors like the space-time cube to tackle this problem. However, they suffer from overplotting and often lack spatial context, hindering data exploration. More recent techniques, such as MotionRugs, propose compact temporal summaries based on 1D projection. While powerful, these techniques do not support situations in which the spatial extent of the objects and their intersections is relevant, such as the analysis of surveillance videos or the tracking of weather storms. In this paper, we propose MoReVis, a visual overview of spatiotemporal data that considers the objects' spatial extent and strives to show spatial interactions among these objects by displaying spatial intersections. Like previous techniques, our method involves projecting the spatial coordinates to 1D to produce compact summaries. However, our solution's core consists of a layout optimization step that sets the sizes and positions of the visual marks on the summary to resemble the actual values in the original space. We also provide multiple interactive mechanisms to make interpreting the results more straightforward for the user. We perform an extensive experimental evaluation and present usage scenarios. Moreover, we evaluated the usefulness of MoReVis in a study with 9 participants. The results point out the effectiveness and suitability of our method in representing different datasets compared to traditional techniques.
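The core 1D-projection step described above can be illustrated on a toy frame: each object's spatial extent becomes an interval on a projection axis, and interval overlaps surface candidate spatial intersections. The fixed axis and disk-shaped objects are illustrative assumptions; MoReVis additionally optimizes the layout so the 1D marks faithfully reflect the original 2D extents.

```python
# Toy version of the 1D-projection step: each object's 2D disk
# (center, radius) is projected onto a fixed axis, giving an interval whose
# overlaps flag candidate spatial intersections.

def project_frame(objects, axis=(1.0, 0.0)):
    # objects: list of ((x, y), radius) -> list of (lo, hi) intervals
    intervals = []
    for (x, y), r in objects:
        c = x * axis[0] + y * axis[1]  # scalar projection of the center
        intervals.append((c - r, c + r))
    return intervals

def intersecting_pairs(intervals):
    return [(i, j)
            for i in range(len(intervals))
            for j in range(i + 1, len(intervals))
            if intervals[i][0] <= intervals[j][1]
            and intervals[j][0] <= intervals[i][1]]

frame = [((0.0, 0.0), 1.0), ((1.5, 0.2), 1.0), ((5.0, 0.0), 0.5)]
pairs = intersecting_pairs(project_frame(frame))
# objects 0 and 1 overlap on the axis; object 2 stays isolated
```

Note that overlap in 1D is necessary but not sufficient for intersection in 2D, which is one reason a layout-optimization step over the projected marks matters for faithfulness.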
Citations: 0
LC-NeRF: Local Controllable Face Generation in Neural Radiance Field
IF 5.2 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING · Pub Date: 2023-02-19 · DOI: 10.48550/arXiv.2302.09486
Wen-Yang Zhou, Lu Yuan, Shu-Yu Chen, Lin Gao, Shimin Hu
3D face generation has achieved high visual quality and 3D consistency thanks to the development of neural radiance fields (NeRF). However, these methods model the whole face as a neural radiance field, which limits the controllability of local regions. In other words, previous methods struggle to independently control local regions, such as the mouth, nose, and hair. To improve local controllability in NeRF-based face generation, we propose LC-NeRF, which is composed of a Local Region Generators Module (LRGM) and a Spatial-Aware Fusion Module (SAFM), allowing for geometry and texture control of local facial regions. The LRGM models different facial regions as independent neural radiance fields, and the SAFM is responsible for merging multiple independent neural radiance fields into a complete representation. Finally, LC-NeRF enables the modification of the latent code associated with each individual generator, thereby allowing precise control over the corresponding local region. Qualitative and quantitative evaluations show that our method provides better local controllability than state-of-the-art 3D-aware face generation methods. A perception study reveals that our method outperforms existing state-of-the-art methods in terms of image quality, face consistency, and editing effects. Furthermore, our method exhibits favorable performance in downstream tasks, including real image editing and text-driven facial image editing.
Citations: 0
Audio2Gestures: Generating Diverse Gestures from Audio
IF 5.2 | CAS Zone 1, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2023-01-17 | DOI: 10.48550/arXiv.2301.06690
Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Linchao Bao, Zhenyu He
People may perform diverse gestures, affected by various mental and physical factors, when speaking the same sentences. This inherent one-to-many relationship makes co-speech gesture generation from audio particularly challenging. Conventional CNNs/RNNs assume a one-to-one mapping and thus tend to predict the average of all possible target motions, easily resulting in plain, boring motions at inference time. We therefore propose to model the one-to-many audio-to-motion mapping explicitly by splitting the cross-modal latent code into a shared code and a motion-specific code. The shared code is expected to account for the motion component that is more correlated with the audio, while the motion-specific code captures diverse motion information that is more independent of the audio. However, splitting the latent code into two parts poses extra training difficulties, so several crucial training losses/strategies, including a relaxed motion loss, a bicycle constraint, and a diversity loss, are designed to better train the variational autoencoder (VAE). Experiments on both 3D and 2D motion datasets verify that our method generates more realistic and diverse motions than previous state-of-the-art methods, both quantitatively and qualitatively. Our formulation is also compatible with discrete cosine transform (DCT) modeling and other popular backbones (e.g., RNN, Transformer). For motion losses and quantitative motion evaluation, we find that structured losses/metrics (e.g., STFT) that consider temporal and/or spatial context complement the most commonly used point-wise losses (e.g., PCK), yielding better motion dynamics and more nuanced motion details. Finally, we demonstrate that our method can readily generate motion sequences with user-specified motion clips on the timeline.
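The shared/motion-specific split above can be sketched as follows. The function names, latent dimensions, and Gaussian sampling of the motion-specific code are illustrative assumptions, not the paper's implementation; the point is that one audio-derived shared code combined with several sampled motion-specific codes yields several distinct motion latents, i.e., a one-to-many mapping:

```python
import numpy as np

rng = np.random.default_rng(0)

def split_latent(z, d_shared):
    """Split a cross-modal latent code into (shared, motion-specific) parts."""
    return z[..., :d_shared], z[..., d_shared:]

def sample_motions(z_shared, d_specific, n_samples):
    """Pair one audio-derived shared code with several randomly sampled
    motion-specific codes, producing n_samples distinct full latents
    that a decoder could map to diverse gestures for the same audio."""
    z_specific = rng.standard_normal((n_samples, d_specific))
    shared = np.broadcast_to(z_shared, (n_samples, z_shared.shape[-1]))
    return np.concatenate([shared, z_specific], axis=-1)
```

At inference, decoding each row of the returned latents would give a different but audio-consistent gesture sequence, since only the motion-specific half varies.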
{"title":"Audio2Gestures: Generating Diverse Gestures from Audio","authors":"Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Linchao Bao, Zhenyu He","doi":"10.48550/arXiv.2301.06690","DOIUrl":"https://doi.org/10.48550/arXiv.2301.06690","url":null,"abstract":"People may perform diverse gestures affected by various mental and physical factors when speaking the same sentences. This inherent one-to-many relationship makes co-speech gesture generation from audio particularly challenging. Conventional CNNs/RNNs assume one-to-one mapping, and thus tend to predict the average of all possible target motions, easily resulting in plain/boring motions during inference. So we propose to explicitly model the one-to-many audio-to-motion mapping by splitting the cross-modal latent code into shared code and motion-specific code. The shared code is expected to be responsible for the motion component that is more correlated to the audio while the motion-specific code is expected to capture diverse motion information that is more independent of the audio. However, splitting the latent code into two parts poses extra training difficulties. Several crucial training losses/strategies, including relaxed motion loss, bicycle constraint, and diversity loss, are designed to better train the VAE. Experiments on both 3D and 2D motion datasets verify that our method generates more realistic and diverse motions than previous state-of-the-art methods, quantitatively and qualitatively. Besides, our formulation is compatible with discrete cosine transformation (DCT) modeling and other popular backbones (i.e. RNN, Transformer). As for motion losses and quantitative motion evaluation, we find structured losses/metrics (e.g. STFT) that consider temporal and/or spatial context complement the most commonly used point-wise losses (e.g. PCK), resulting in better motion dynamics and more nuanced motion details. 
Finally, we demonstrate that our method can be readily used to generate motion sequences with user-specified motion clips on the timeline.","PeriodicalId":13376,"journal":{"name":"IEEE Transactions on Visualization and Computer Graphics","volume":" ","pages":""},"PeriodicalIF":5.2,"publicationDate":"2023-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42570159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
NeRF-Art: Text-Driven Neural Radiance Fields Stylization
IF 5.2 | CAS Zone 1, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2022-12-15 | DOI: 10.48550/arXiv.2212.08070
Can Wang, Ruixia Jiang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao
As a powerful representation of 3D scenes, the neural radiance field (NeRF) enables high-quality novel view synthesis from multi-view images. Stylizing NeRF, however, remains challenging, especially in simulating a text-guided style with both the appearance and the geometry altered simultaneously. In this paper, we present NeRF-Art, a text-guided NeRF stylization approach that manipulates the style of a pre-trained NeRF model with a simple text prompt. Unlike previous approaches that either lack sufficient geometry deformations and texture details or require meshes to guide the stylization, our method can shift a 3D scene to the target style characterized by desired geometry and appearance variations without any mesh guidance. This is achieved by introducing a novel global-local contrastive learning strategy, combined with the directional constraint to simultaneously control both the trajectory and the strength of the target style. Moreover, we adopt a weight regularization method to effectively suppress cloudy artifacts and geometry noises which arise easily when the density field is transformed during geometry stylization. Through extensive experiments on various styles, we demonstrate that our method is effective and robust regarding both single-view stylization quality and cross-view consistency. The code and more results can be found on our project page: https://cassiepython.github.io/nerfart/.
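The directional constraint mentioned above is commonly implemented as a cosine loss that aligns the shift in the rendered image's embedding with the shift in the text embedding (the CLIP-direction idea). This sketch uses plain vectors in place of CLIP features and is an illustration of that family of losses, not the paper's exact formulation:

```python
import numpy as np

def directional_loss(e_img, e_img0, e_txt, e_txt0):
    """1 - cosine similarity between the image-embedding shift
    (stylized vs. original render) and the text-embedding shift
    (target style prompt vs. source prompt).

    Minimizing this pushes the render to change *in the direction*
    the text pair specifies, rather than merely toward the target text.
    """
    d_img = np.asarray(e_img) - np.asarray(e_img0)
    d_txt = np.asarray(e_txt) - np.asarray(e_txt0)
    denom = np.linalg.norm(d_img) * np.linalg.norm(d_txt) + 1e-8
    return 1.0 - float(d_img @ d_txt) / denom
```

The loss is 0 when the two shifts are parallel and 2 when they are opposed, so during optimization it steers the trajectory of the stylization, which is what distinguishes a directional constraint from a plain image-text similarity term.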
{"title":"NeRF-Art: Text-Driven Neural Radiance Fields Stylization","authors":"Can Wang, Ruixia Jiang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao","doi":"10.48550/arXiv.2212.08070","DOIUrl":"https://doi.org/10.48550/arXiv.2212.08070","url":null,"abstract":"As a powerful representation of 3D scenes, the neural radiance field (NeRF) enables high-quality novel view synthesis from multi-view images. Stylizing NeRF, however, remains challenging, especially in simulating a text-guided style with both the appearance and the geometry altered simultaneously. In this paper, we present NeRF-Art, a text-guided NeRF stylization approach that manipulates the style of a pre-trained NeRF model with a simple text prompt. Unlike previous approaches that either lack sufficient geometry deformations and texture details or require meshes to guide the stylization, our method can shift a 3D scene to the target style characterized by desired geometry and appearance variations without any mesh guidance. This is achieved by introducing a novel global-local contrastive learning strategy, combined with the directional constraint to simultaneously control both the trajectory and the strength of the target style. Moreover, we adopt a weight regularization method to effectively suppress cloudy artifacts and geometry noises which arise easily when the density field is transformed during geometry stylization. Through extensive experiments on various styles, we demonstrate that our method is effective and robust regarding both single-view stylization quality and cross-view consistency. 
The code and more results can be found on our project page: https://cassiepython.github.io/nerfart/.","PeriodicalId":13376,"journal":{"name":"IEEE Transactions on Visualization and Computer Graphics","volume":" ","pages":""},"PeriodicalIF":5.2,"publicationDate":"2022-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44448914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34