IEEE Transactions on Visualization and Computer Graphics: Latest Articles, Page 4

Local-to-Global Panorama Inpainting for Locale-Aware Indoor Lighting Prediction

IF 5.2; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Visualization and Computer Graphics

Pub Date : 2023-03-18 DOI: 10.48550/arXiv.2303.10344

Jia-Xuan Bai, Zhen He, Shangxue Yang, Jie Guo, Zhenyu Chen, Y. Zhang, Yanwen Guo

Predicting panoramic indoor lighting from a single perspective image is a fundamental but highly ill-posed problem in computer vision and graphics. To achieve locale-aware and robust prediction, this problem can be decomposed into three sub-tasks: depth-based image warping, panorama inpainting and high-dynamic-range (HDR) reconstruction, among which the success of panorama inpainting plays a key role. Recent methods mostly rely on convolutional neural networks (CNNs) to fill the missing contents in the warped panorama. However, they usually achieve suboptimal performance since the missing contents occupy a very large portion in the panoramic space while CNNs are plagued by limited receptive fields. The spatially-varying distortion in the spherical signals further increases the difficulty for conventional CNNs. To address these issues, we propose a local-to-global strategy for large-scale panorama inpainting. In our method, a depth-guided local inpainting is first applied on the warped panorama to fill small but dense holes. Then, a transformer-based network, dubbed PanoTransformer, is designed to hallucinate reasonable global structures in the large holes. To avoid distortion, we further employ cubemap projection in our design of PanoTransformer. The high-quality panorama recovered at any locale helps us to capture spatially-varying indoor illumination with physically-plausible global structures and fine details.
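The abstract motivates cubemap projection by the distortion of equirectangular panoramas. As a minimal sketch, assuming an OpenGL-style face convention (the face names, uv orientation, and the hypothetical helpers `direction_to_cubemap` and `equirect_pixel_to_direction` are illustrative, not taken from the paper), converting between the two layouts reduces to picking the face of the dominant axis and dividing the other two components by it:

```python
import math
from typing import Tuple


def direction_to_cubemap(x: float, y: float, z: float) -> Tuple[str, float, float]:
    """Map a nonzero 3D view direction to (face, u, v) with u, v in [0, 1].

    The face whose axis has the largest absolute component is chosen.
    Face names and uv orientation follow an OpenGL-style convention here;
    this is an illustrative assumption, not PanoTransformer's layout.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:  # dominant x axis
        face = "+x" if x > 0 else "-x"
        sc, tc, ma = (-z if x > 0 else z), -y, ax
    elif ay >= az:  # dominant y axis
        face = "+y" if y > 0 else "-y"
        sc, tc, ma = x, (z if y > 0 else -z), ay
    else:  # dominant z axis
        face = "+z" if z > 0 else "-z"
        sc, tc, ma = (x if z > 0 else -x), -y, az
    # sc / ma and tc / ma lie in [-1, 1]; remap them to [0, 1].
    return face, 0.5 * (sc / ma + 1.0), 0.5 * (tc / ma + 1.0)


def equirect_pixel_to_direction(col: int, row: int, width: int, height: int) -> Tuple[float, float, float]:
    """Unit direction through the center of an equirectangular pixel (y is up)."""
    lon = (col + 0.5) / width * 2.0 * math.pi - math.pi  # longitude in [-pi, pi)
    lat = math.pi / 2.0 - (row + 0.5) / height * math.pi  # latitude, top row near +pi/2
    return (math.cos(lat) * math.sin(lon), math.sin(lat), math.cos(lat) * math.cos(lon))
```

Applying these two functions to every panorama pixel (or the inverse mapping to every cube-face texel) turns a warped panorama into six 90-degree faces without the pole stretching of the equirectangular layout; how PanoTransformer tokenizes those faces is not described in the abstract.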

Citations: 0

A Topological Distance between Multi-fields based on Multi-Dimensional Persistence Diagrams

IF 5.2; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Visualization and Computer Graphics

Pub Date : 2023-03-06 DOI: 10.48550/arXiv.2303.03038

Yashwanth Ramamurthi, A. Chattopadhyay

The problem of computing topological distance between two scalar fields based on Reeb graphs or contour trees has been studied and applied successfully to various problems in topological shape matching, data analysis, and visualization. However, generalizing such results for computing distance measures between two multi-fields based on their Reeb spaces is still in its infancy. Towards this, in the current paper we propose a technique to compute an effective distance measure between two multi-fields by computing a novel multi-dimensional persistence diagram (MDPD) corresponding to each of the (quantized) Reeb spaces. First, we construct a multi-dimensional Reeb graph (MDRG), which is a hierarchical decomposition of the Reeb space into a collection of Reeb graphs. The MDPD corresponding to each MDRG is then computed based on the persistence diagrams of the component Reeb graphs of the MDRG. Our distance measure extends the Wasserstein distance between two persistence diagrams of Reeb graphs to MDPDs of MDRGs. We prove that the proposed measure is a pseudo-metric and satisfies a stability property. Effectiveness of the proposed distance measure has been demonstrated in (i) shape retrieval contest data - SHREC 2010 and (ii) Pt-CO bond detection data from computational chemistry. Experimental results show that the proposed distance measure based on the Reeb spaces has more discriminating power in clustering the shapes and detecting the formation of a stable Pt-CO bond as compared to the similar measures between Reeb graphs.
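The measure builds on the Wasserstein distance between persistence diagrams. The MDPD-level extension is specific to the paper and is not reproduced here; as a hedged sketch of the standard diagram-level building block only, the p-Wasserstein distance with an L-infinity ground metric can be computed for tiny diagrams by letting each point match either a point of the other diagram or the diagonal. The function names are hypothetical, and the brute-force search is exponential; practical code would run an assignment solver such as `scipy.optimize.linear_sum_assignment` on the same padded cost matrix.

```python
import itertools
from typing import List, Tuple

Point = Tuple[float, float]  # (birth, death), death >= birth


def _diag_cost(p: Point) -> float:
    """L-infinity distance from (b, d) to the nearest point on the diagonal."""
    return (p[1] - p[0]) / 2.0


def wasserstein(A: List[Point], B: List[Point], p: float = 2.0) -> float:
    """p-Wasserstein distance between two persistence diagrams.

    Each diagram is padded with diagonal slots so that any point may be
    matched to the diagonal; the optimal matching is found by brute force,
    so this is only practical for a handful of points.
    """
    n, m = len(A), len(B)

    def cost(i: int, j: int) -> float:
        if i < n and j < m:  # real point to real point (L-infinity ground metric)
            return max(abs(A[i][0] - B[j][0]), abs(A[i][1] - B[j][1])) ** p
        if i < n:  # point of A sent to the diagonal
            return _diag_cost(A[i]) ** p
        if j < m:  # point of B sent to the diagonal
            return _diag_cost(B[j]) ** p
        return 0.0  # diagonal slot to diagonal slot is free

    size = n + m
    best = min(
        sum(cost(i, perm[i]) for i in range(size))
        for perm in itertools.permutations(range(size))
    )
    return best ** (1.0 / p)
```

Padding with diagonal slots is what allows diagrams with different numbers of points to be compared: an unmatched feature simply pays its distance to the diagonal.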

Citations: 0

IEEE VR 2023 Message from the Program Chairs and Guest Editors

IF 5.2; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Visualization and Computer Graphics

Pub Date : 2023-03-01 DOI: 10.1109/tvcg.2021.3067835

Bobby Bodenheimer, V. Popescu, J. Quarles, Lili Wang

Citations: 0

IntrinsicNGP: Intrinsic Coordinate based Hash Encoding for Human NeRF

IF 5.2; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Visualization and Computer Graphics

Pub Date : 2023-02-28 DOI: 10.48550/arXiv.2302.14683

Bo Peng, Jun Hu, Jingtao Zhou, Xuan Gao, Ju-yong Zhang

Recently, many works have been proposed that use the neural radiance field for novel view synthesis of human performers. However, most of these methods require hours of training, making them difficult to use in practice. To address this challenging problem, we propose IntrinsicNGP, which can train from scratch and achieve high-fidelity results from videos of a human performer in a few minutes. To achieve this, we introduce a continuous and optimizable intrinsic coordinate in place of the original explicit Euclidean coordinate in the hash encoding module of instant-NGP. With this novel intrinsic coordinate, IntrinsicNGP can aggregate inter-frame information for dynamic objects with the help of proxy geometry shapes. Moreover, the results trained with the given rough geometry shapes can be further refined with an optimizable offset field based on the intrinsic coordinate. Extensive experimental results on several datasets demonstrate the effectiveness and efficiency of IntrinsicNGP. We also illustrate our approach's ability to edit the shape of reconstructed subjects.
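IntrinsicNGP keeps instant-NGP's multiresolution hash encoding but feeds it an intrinsic coordinate instead of the Euclidean position. The abstract does not say how that coordinate is built from the proxy geometry, so this sketch covers only the hashing side, assuming inputs already normalized to [0, 1]^3: the spatial hash (XOR of each integer grid coordinate times a per-dimension prime, modulo the table size), the geometric growth of grid resolutions, and the 8 table slots touched per level. Feature storage and trilinear interpolation are omitted, and the helper names, table size, and resolutions are illustrative.

```python
import math
from typing import List, Sequence

# Per-dimension primes used by the instant-NGP spatial hash (the first one is 1).
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(vertex: Sequence[int], table_size: int) -> int:
    """XOR of coordinate * prime, reduced modulo the hash-table size.

    With a power-of-two table_size this equals the 32-bit wrap-around
    version, because only the low bits survive the modulo.
    """
    h = 0
    for coord, prime in zip(vertex, PRIMES):
        h ^= coord * prime
    return h % table_size


def level_resolutions(n_min: int, n_max: int, levels: int) -> List[int]:
    """Grid resolution per level, growing geometrically from n_min to n_max."""
    if levels == 1:
        return [n_min]
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (levels - 1))
    return [int(math.floor(n_min * growth ** level)) for level in range(levels)]


def corner_slots(point: Sequence[float], resolution: int, table_size: int) -> List[int]:
    """Hash-table slots of the 8 grid corners surrounding a point in [0, 1]^3."""
    base = [int(math.floor(c * resolution)) for c in point]
    return [
        spatial_hash((base[0] + dx, base[1] + dy, base[2] + dz), table_size)
        for dx in (0, 1)
        for dy in (0, 1)
        for dz in (0, 1)
    ]
```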

Citations: 4

MoReVis: A Visual Summary for Spatiotemporal Moving Regions

IF 5.2; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Visualization and Computer Graphics

Pub Date : 2023-02-26 DOI: 10.48550/arXiv.2302.13199

Giovani Valdrighi, Nivan Ferreira, Jorge Poco

Spatial and temporal interactions are central and fundamental in many activities in our world. A common problem faced when visualizing this type of data is how to provide an overview that helps users navigate efficiently. Traditional approaches use coordinated views or 3D metaphors such as the space-time cube to tackle this problem. However, they suffer from overplotting and often lack spatial context, hindering data exploration. More recent techniques, such as MotionRugs, propose compact temporal summaries based on 1D projection. While powerful, these techniques do not support situations in which the spatial extent of the objects and their intersections is relevant, such as the analysis of surveillance videos or tracking weather storms. In this paper, we propose MoReVis, a visual overview of spatiotemporal data that considers the objects' spatial extent and strives to show spatial interactions among these objects by displaying spatial intersections. Like previous techniques, our method involves projecting the spatial coordinates to 1D to produce compact summaries. However, the core of our solution is a layout optimization step that sets the size and positions of the visual marks on the summary to resemble the actual values in the original space. We also provide multiple interactive mechanisms to make interpreting the results more straightforward for the user. We present an extensive experimental evaluation and usage scenarios. Moreover, we evaluated the usefulness of MoReVis in a study with 9 participants. The results indicate the effectiveness and suitability of our method in representing different datasets compared to traditional techniques.
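MoReVis, like MotionRugs, starts by projecting spatial coordinates to 1D. As a hedged sketch (the PCA projection axis, the axis-aligned rectangle representation, and the helper names are assumptions, not the paper's method), each region's bounding box at one time step can be projected onto a single axis as an interval whose overlaps stand in for spatial intersections:

```python
import math
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (center_x, center_y, width, height)
Interval = Tuple[float, float]


def principal_axis(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Unit vector along the direction of largest variance (closed-form 2D PCA)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)


def project_rects(rects: Sequence[Rect], axis: Tuple[float, float]) -> List[Interval]:
    """Project axis-aligned rectangles onto a unit axis as 1D intervals."""
    ux, uy = axis
    out = []
    for cx, cy, w, h in rects:
        center = cx * ux + cy * uy
        half = abs(ux) * w / 2.0 + abs(uy) * h / 2.0  # half-extent along the axis
        out.append((center - half, center + half))
    return out


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]
```

Intersecting rectangles always project to overlapping intervals, but the converse fails: two regions stacked perpendicular to the axis overlap in 1D while being disjoint in 2D. Reducing distortions of this kind is the role the abstract gives to MoReVis's layout optimization, which this sketch does not attempt.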

Citations: 0

LC-NeRF: Local Controllable Face Generation in Neural Radiance Field

IF 5.2; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Visualization and Computer Graphics

Pub Date : 2023-02-19 DOI: 10.48550/arXiv.2302.09486

Wen-Yang Zhou, Lu Yuan, Shu-Yu Chen, Lin Gao, Shimin Hu

3D face generation has achieved high visual quality and 3D consistency thanks to the development of neural radiance fields (NeRF). However, these methods model the whole face as a neural radiance field, which limits the controllability of the local regions. In other words, previous methods struggle to independently control local regions, such as the mouth, nose, and hair. To improve local controllability in NeRF-based face generation, we propose LC-NeRF, which is composed of a Local Region Generators Module (LRGM) and a Spatial-Aware Fusion Module (SAFM), allowing for geometry and texture control of local facial regions. The LRGM models different facial regions as independent neural radiance fields and the SAFM is responsible for merging multiple independent neural radiance fields into a complete representation. Finally, LC-NeRF enables the modification of the latent code associated with each individual generator, thereby allowing precise control over the corresponding local region. Qualitative and quantitative evaluations show that our method provides better local controllability than state-of-the-art 3D-aware face generation methods. A perception study reveals that our method outperforms existing state-of-the-art methods in terms of image quality, face consistency, and editing effects. Furthermore, our method exhibits favorable performance in downstream tasks, including real image editing and text-driven facial image editing.
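The SAFM merges several independent radiance fields into one representation. One common way to merge fields, used for example in compositional generative NeRFs such as GIRAFFE, is to sum the densities and density-weight the colors at each 3D point before ordinary volume rendering. Whether LC-NeRF's spatial-aware fusion uses this operator is not stated in the abstract; the sketch below, with hypothetical helper names and scalar tuples standing in for network outputs, only illustrates the general idea:

```python
import math
from typing import Sequence, Tuple

Color = Tuple[float, float, float]
Sample = Tuple[float, Color]  # (density, rgb) predicted by one local field


def fuse(samples: Sequence[Sample]) -> Sample:
    """Merge per-region predictions at one 3D point: sum densities, density-weight colors."""
    sigma = sum(s for s, _ in samples)
    if sigma <= 0.0:
        return 0.0, (0.0, 0.0, 0.0)
    color = tuple(sum(s * c[k] for s, c in samples) / sigma for k in range(3))
    return sigma, color


def render_ray(per_point: Sequence[Sequence[Sample]], deltas: Sequence[float]) -> Color:
    """Alpha-composite fused samples front to back along one ray."""
    transmittance = 1.0
    rgb = [0.0, 0.0, 0.0]
    for samples, delta in zip(per_point, deltas):
        sigma, color = fuse(samples)
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return rgb[0], rgb[1], rgb[2]
```

Because each region keeps its own generator, editing one region's latent code changes only that region's (density, color) contribution at the fusion step.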

Citations: 0

Audio2Gestures: Generating Diverse Gestures from Audio

IF 5.2; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Visualization and Computer Graphics

Pub Date : 2023-01-17 DOI: 10.48550/arXiv.2301.06690

Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Linchao Bao, Zhenyu He

People may perform diverse gestures affected by various mental and physical factors when speaking the same sentences. This inherent one-to-many relationship makes co-speech gesture generation from audio particularly challenging. Conventional CNNs/RNNs assume a one-to-one mapping, and thus tend to predict the average of all possible target motions, easily resulting in plain/boring motions during inference. We therefore propose to explicitly model the one-to-many audio-to-motion mapping by splitting the cross-modal latent code into a shared code and a motion-specific code. The shared code is expected to be responsible for the motion component that is more correlated to the audio, while the motion-specific code is expected to capture diverse motion information that is more independent of the audio. However, splitting the latent code into two parts poses extra training difficulties. Several crucial training losses/strategies, including relaxed motion loss, bicycle constraint, and diversity loss, are designed to better train the VAE. Experiments on both 3D and 2D motion datasets verify that our method generates more realistic and diverse motions than previous state-of-the-art methods, quantitatively and qualitatively. Besides, our formulation is compatible with discrete cosine transformation (DCT) modeling and other popular backbones (e.g., RNN, Transformer). As for motion losses and quantitative motion evaluation, we find that structured losses/metrics (e.g., STFT) that consider temporal and/or spatial context complement the most commonly used point-wise losses (e.g., PCK), resulting in better motion dynamics and more nuanced motion details. Finally, we demonstrate that our method can be readily used to generate motion sequences with user-specified motion clips on the timeline.
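The abstract notes that the formulation is compatible with discrete cosine transform (DCT) modeling. One common way to use the DCT for motion, assumed here because the abstract gives no details, is to encode each joint channel's trajectory over time by its DCT coefficients; discarding high-frequency coefficients then yields a smoother trajectory. This is a readable O(N^2) pure-Python sketch of the orthonormal DCT-II and its inverse, with hypothetical helper names:

```python
import math
from typing import List, Sequence


def _scale(k: int, n: int) -> float:
    # Orthonormal DCT-II scaling factors.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II of one motion channel sampled over time."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n))
        for k in range(n)
    ]


def idct(c: Sequence[float]) -> List[float]:
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi / n * (t + 0.5) * k) for k in range(n))
        for t in range(n)
    ]


def lowpass_motion(channel: Sequence[float], keep: int) -> List[float]:
    """Keep the first `keep` DCT coefficients and reconstruct a smoother trajectory."""
    coeffs = dct(channel)
    return idct([v if k < keep else 0.0 for k, v in enumerate(coeffs)])
```

Because the transform is orthonormal, it preserves the energy of the trajectory, so a loss computed on coefficients measures the same squared error as one computed on poses when no coefficients are dropped.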

Citations: 1

NeRF-Art: Text-Driven Neural Radiance Fields Stylization

IF 5.2; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Visualization and Computer Graphics

Pub Date : 2022-12-15 DOI: 10.48550/arXiv.2212.08070

Can Wang, Ruixia Jiang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao

As a powerful representation of 3D scenes, the neural radiance field (NeRF) enables high-quality novel view synthesis from multi-view images. Stylizing NeRF, however, remains challenging, especially in simulating a text-guided style with both the appearance and the geometry altered simultaneously. In this paper, we present NeRF-Art, a text-guided NeRF stylization approach that manipulates the style of a pre-trained NeRF model with a simple text prompt. Unlike previous approaches that either lack sufficient geometry deformations and texture details or require meshes to guide the stylization, our method can shift a 3D scene to the target style characterized by desired geometry and appearance variations without any mesh guidance. This is achieved by introducing a novel global-local contrastive learning strategy, combined with a directional constraint, to simultaneously control both the trajectory and the strength of the target style. Moreover, we adopt a weight regularization method to effectively suppress the cloudy artifacts and geometric noise that easily arise when the density field is transformed during geometry stylization. Through extensive experiments on various styles, we demonstrate that our method is effective and robust regarding both single-view stylization quality and cross-view consistency. The code and more results can be found on our project page: https://cassiepython.github.io/nerfart/.
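The abstract pairs a global-local contrastive strategy with a directional constraint. As a hedged sketch: a common form of directional constraint (popularized by StyleGAN-NADA) compares the shift in image embedding caused by stylization with the shift between a source and a target prompt embedding, and a generic InfoNCE term pulls a rendering toward the target prompt and away from negative prompts. Embeddings are assumed to be precomputed by a joint image-text encoder such as CLIP (plain lists stand in for them), the helper names are hypothetical, and NeRF-Art's exact losses, temperature, and global/local patch sampling may differ.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity of two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def directional_loss(img_orig: Vec, img_styl: Vec, txt_src: Vec, txt_tgt: Vec) -> float:
    """1 - cos(change in image embedding, change in text embedding)."""
    d_img = [s - o for s, o in zip(img_styl, img_orig)]
    d_txt = [t - s for t, s in zip(txt_tgt, txt_src)]
    return 1.0 - cosine(d_img, d_txt)


def contrastive_loss(img: Vec, positive: Vec, negatives: Sequence[Vec], tau: float = 0.07) -> float:
    """InfoNCE-style term; tau = 0.07 is an illustrative temperature."""
    pos = math.exp(cosine(img, positive) / tau)
    neg = sum(math.exp(cosine(img, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))
```

The directional term controls which way the style moves, while the contrastive term also depends on how far the rendering has moved relative to the negatives, which matches the abstract's split between controlling the trajectory and the strength of the style.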

Citations: 34

What's the Situation with Intelligent Mesh Generation: A Survey and Perspectives

IF 5.2; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Visualization and Computer Graphics

Pub Date : 2022-11-11 DOI: 10.48550/arXiv.2211.06009

Zezeng Li, Zebin Xu, Ying Li, X. Gu, Na Lei

Intelligent Mesh Generation (IMG) represents a novel and promising field of research, utilizing machine learning techniques to generate meshes. Despite its relative infancy, IMG has significantly broadened the adaptability and practicality of mesh generation techniques, delivering numerous breakthroughs and unveiling potential future pathways. However, a noticeable void exists in the contemporary literature concerning comprehensive surveys of IMG methods. This paper endeavors to fill this gap by providing a systematic and thorough survey of the current IMG landscape. With a focus on 113 preliminary IMG methods, we undertake a meticulous analysis from various angles, encompassing core algorithm techniques and their application scope, agent learning objectives, data types, targeted challenges, as well as advantages and limitations. We have curated and categorized the literature, proposing three unique taxonomies based on key techniques, output mesh unit elements, and relevant input data types. This paper also underscores several promising future research directions and challenges in IMG. To augment reader accessibility, a dedicated IMG project page is available at https://github.com/xzb030/IMG_Survey.



Citations: 1

GPA-Net: No-Reference Point Cloud Quality Assessment with Multi-task Graph Convolutional Network

IF 5.2 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

IEEE Transactions on Visualization and Computer Graphics

Pub Date : 2022-10-29 DOI: 10.48550/arXiv.2210.16478

Ziyu Shan, Qi Yang, Rui Ye, Yujie Zhang, Yi Xu, Xiaozhong Xu, Shan Liu

With the rapid development of 3D vision, point clouds have become an increasingly popular form of 3D visual media. Because of their irregular structure, point clouds pose new challenges for related research on compression, transmission, rendering, and quality assessment. Among these topics, point cloud quality assessment (PCQA) has attracted wide attention because of its significant role in guiding practical applications, especially in the many cases where no reference point cloud is available. However, current no-reference metrics based on prevalent deep neural networks have clear disadvantages. For example, to cope with the irregular structure of point clouds, they require preprocessing such as voxelization and projection, which introduces extra distortions, and the grid-kernel networks they apply, such as Convolutional Neural Networks, fail to extract effective distortion-related features. In addition, they rarely consider the variety of distortion patterns or the principle that PCQA should be invariant to shift, scaling, and rotation. In this paper, we propose a novel no-reference PCQA metric named the Graph convolutional PCQA network (GPA-Net). To extract effective features for PCQA, we propose a new graph convolution kernel, GPAConv, which attentively captures perturbations of structure and texture. We then propose a multi-task framework consisting of one main task (quality regression) and two auxiliary tasks (distortion type and degree prediction). Finally, we propose a coordinate normalization module to stabilize the results of GPAConv under shift, scale, and rotation transformations. Experimental results on two independent databases show that GPA-Net outperforms state-of-the-art no-reference PCQA metrics and, in some cases, even some full-reference metrics. The code is available at: https://github.com/Slowhander/GPA-Net.git.
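The abstract says GPA-Net uses a coordinate normalization module so that results stay stable under shift, scale, and rotation, but it does not spell out how that module works. As a minimal sketch, assuming a common generic canonicalization recipe rather than the authors' actual module, the idea of mapping a point cloud to a pose-invariant frame can be illustrated in Python/NumPy: subtract the centroid (shift), divide by the largest radius (uniform scale), then rotate into the PCA frame and fix each axis sign with a third-moment rule (rotation). The function name `normalize_coordinates` is hypothetical.

```python
import numpy as np


def normalize_coordinates(points: np.ndarray) -> np.ndarray:
    """Map an (N, 3) point cloud to a canonical pose (hypothetical helper).

    Generic recipe, assumed for illustration; not GPA-Net's published module:
    1. subtract the centroid      -> removes translation (shift)
    2. divide by the max radius   -> removes uniform scaling
    3. rotate into the PCA frame  -> removes rotation (up to axis signs)
    4. flip each axis so its third moment is non-negative -> fixes signs
    """
    pts = np.asarray(points, dtype=np.float64)
    centered = pts - pts.mean(axis=0)

    radius = np.linalg.norm(centered, axis=1).max()
    if radius > 0:
        centered = centered / radius

    # Principal axes: eigenvectors of the 3x3 covariance matrix.
    cov = centered.T @ centered / len(centered)
    _, eigvecs = np.linalg.eigh(cov)   # eigenvalues come back in ascending order
    eigvecs = eigvecs[:, ::-1]         # reorder to decreasing variance
    aligned = centered @ eigvecs

    # Eigenvectors are only defined up to sign; pick the sign that makes
    # the summed cube (a skewness proxy) along each axis non-negative.
    skew = (aligned ** 3).sum(axis=0)
    signs = np.where(skew < 0, -1.0, 1.0)
    return aligned * signs
```

This canonicalization assumes the three principal variances are distinct and each axis has non-zero skew; symmetric shapes remain ambiguous, which is one reason a learned or attention-based module might be preferred in practice.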

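The multi-task framework pairs one main task (quality regression) with two auxiliary tasks (distortion type and degree prediction). The abstract does not give the loss, so the following is only a sketch under stated assumptions: mean squared error for the quality score, softmax cross-entropy for both auxiliary heads (treating degree as discrete levels), and hypothetical weights `w_type` and `w_degree` combining them. The function names are illustrative, not taken from the GPA-Net code.

```python
import numpy as np


def softmax_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean cross-entropy for integer class labels; logits have shape (B, C)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def multitask_loss(pred_score, true_score,
                   type_logits, type_labels,
                   degree_logits, degree_labels,
                   w_type=0.1, w_degree=0.1):
    """Main regression loss plus weighted auxiliary classification losses.

    Assumed formulation: MSE + w_type * CE(type) + w_degree * CE(degree).
    The weights are hypothetical hyperparameters.
    """
    main = float(np.mean((np.asarray(pred_score) - np.asarray(true_score)) ** 2))
    aux_type = softmax_cross_entropy(type_logits, type_labels)
    aux_degree = softmax_cross_entropy(degree_logits, degree_labels)
    return main + w_type * aux_type + w_degree * aux_degree
```

Keeping the auxiliary weights small lets the distortion-type and distortion-degree heads shape the shared features without dominating the quality-score objective; how GPA-Net actually balances them would need to be checked against the released code.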


Citations: 5