
Latest publications in Computational Visual Media

Multi-scale hash encoding based neural geometry representation
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-03-22 | DOI: 10.1007/s41095-023-0340-x

Abstract

Recently, neural implicit function-based representation has attracted increasing attention and has been widely used to represent surfaces with differentiable neural networks. However, surface reconstruction from point clouds or multi-view images using existing neural geometry representations still suffers from slow computation and poor accuracy. To alleviate these issues, we propose a multi-scale hash encoding-based neural geometry representation which effectively and efficiently represents the surface as a signed distance field. Our novel neural network structure carefully combines low-frequency Fourier position encoding with multi-scale hash encoding. The initialization of the geometry network and the geometry features of the rendering module are redesigned accordingly. Our experiments demonstrate that the proposed representation is at least 10 times faster for reconstructing point clouds with millions of points. It also significantly improves the speed and accuracy of multi-view reconstruction. Our code and models are available at https://github.com/Dengzhi-USTC/Neural-Geometry-Reconstruction.
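To make the encoding concrete, the following is a minimal sketch, in the spirit of Instant-NGP-style hash grids, of concatenating low-frequency Fourier position features with multi-resolution spatial-hash features before they are fed to a small MLP that predicts the signed distance. It is not the authors' implementation: the level resolutions, feature widths, hash primes, and the nearest-vertex lookup (trilinear interpolation omitted) are illustrative assumptions.

```python
import numpy as np

PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)  # common spatial-hash primes

def fourier_encode(x, num_freqs=4):
    """Low-frequency Fourier features for points x in [0, 1)^3, shape (N, 3)."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi           # (F,)
    ang = x[:, :, None] * freqs[None, None, :]               # (N, 3, F)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1).reshape(len(x), -1)

def hash_encode(x, tables, resolutions):
    """Nearest-vertex multi-resolution hash lookup (trilinear interpolation omitted for brevity)."""
    feats = []
    for table, res in zip(tables, resolutions):
        idx = np.floor(x * res).astype(np.uint64)             # (N, 3) integer cell coordinates
        h = idx * PRIMES[None, :]
        h = (h[:, 0] ^ h[:, 1] ^ h[:, 2]) % np.uint64(len(table))
        feats.append(table[h])                                # (N, C) features at this level
    return np.concatenate(feats, axis=-1)

# Toy usage: 3 levels, 2 features per level, 2^14 entries per table.
rng = np.random.default_rng(0)
resolutions = [16, 64, 256]
tables = [rng.normal(scale=1e-4, size=(2**14, 2)) for _ in resolutions]
pts = rng.random((5, 3))
encoding = np.concatenate([fourier_encode(pts), hash_encode(pts, tables, resolutions)], axis=-1)
print(encoding.shape)  # (5, 30): 24 Fourier + 6 hash features, fed to a small SDF MLP
```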

Citations: 0
Erratum to: Dynamic ocean inverse modeling based on differentiable rendering
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-03-22 | DOI: 10.1007/s41095-024-0398-z
Xueguang Xie, Yang Gao, Fei Hou, Aimin Hao, Hong Qin

The authors apologize for an error in the article: the images in Figs. 14(a) and 14(d) were mistakenly presented as left–right mirror images. The authors have flipped them so that these figures now correspond correctly with the other subfigures (b, c, e, f). The corrected version of Fig. 14 is provided below.

Citations: 0
Delving into high-quality SVBRDF acquisition: A new setup and method
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-09 | DOI: 10.1007/s41095-023-0352-6
Chuhua Xian, Jiaxin Li, Hao Wu, Zisen Lin, Guiqing Li

In this study, we present a new framework for acquiring high-quality SVBRDF maps that addresses the limitations of current methods. The core of our method is a simple hardware setup, consisting of a consumer-level camera and LED lights, together with a carefully designed network that can accurately obtain the high-quality SVBRDF properties of a nearly planar object. From a flexible number of captured images of an object, our network trains each property map with a dedicated subnetwork and an appropriate loss function. To further enhance the quality of the maps, we improve the network structure by adding a novel skip connection that links the encoder and decoder with global features. Extensive experiments on both synthetic and real-world materials demonstrate that our method outperforms previous methods and produces superior results. Furthermore, our proposed setup can also be used to acquire physically based rendering maps of special materials.
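As a rough illustration of the described skip connection, the sketch below broadcasts a global feature vector over the spatial grid and fuses it with the corresponding encoder and decoder features. The channel sizes and the single fusion convolution are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class GlobalSkipBlock(nn.Module):
    """Skip connection that injects a global descriptor alongside the encoder feature map."""
    def __init__(self, enc_ch=64, dec_ch=64, global_ch=128):
        super().__init__()
        self.fuse = nn.Conv2d(enc_ch + dec_ch + global_ch, dec_ch, kernel_size=3, padding=1)

    def forward(self, enc_feat, dec_feat, global_feat):
        # Broadcast the global descriptor over the spatial grid, then fuse all three sources.
        b, _, h, w = dec_feat.shape
        g = global_feat.view(b, -1, 1, 1).expand(-1, -1, h, w)
        return torch.relu(self.fuse(torch.cat([enc_feat, dec_feat, g], dim=1)))

block = GlobalSkipBlock()
out = block(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32), torch.randn(2, 128))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```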

Citations: 0
CF-DAN: Facial-expression recognition based on cross-fusion dual-attention network
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-08 | DOI: 10.1007/s41095-023-0369-x

Abstract

Recently, facial-expression recognition (FER) has focused primarily on images in the wild, which involve factors such as face occlusion and image blurring, rather than laboratory images. Complex field environments have introduced new challenges to FER. To address these challenges, this study proposes a cross-fusion dual-attention network. The network comprises three parts: (1) a cross-fusion grouped dual-attention mechanism to refine local features and obtain global information; (2) a C² activation function construction method, a piecewise cubic polynomial with three degrees of freedom that requires less computation, offers improved flexibility and recognition ability, and better addresses slow running speeds and neuron inactivation problems; and (3) a closed-loop operation between the self-attention distillation process and residual connections to suppress redundant information and improve the generalization ability of the model. The recognition accuracies on the RAF-DB, FERPlus, and AffectNet datasets were 92.78%, 92.02%, and 63.58%, respectively. Experiments show that this model can provide more effective solutions for FER tasks.
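The abstract does not give the exact construction of the C² activation, but the sketch below shows one way a piecewise cubic with a few free parameters can be made C²-continuous: a natural cubic spline through a handful of control values is C² at every knot by construction. The knot positions and control values are assumptions for illustration.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def make_c2_activation(p1, p2, p3, knots=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Piecewise cubic through three free interior values (p1, p2, p3);
    a natural cubic spline is C^2-continuous at every knot by construction."""
    x = np.asarray(knots)
    y = np.array([x[0], p1, p2, p3, x[-1]])   # end values chosen to behave roughly like identity
    return CubicSpline(x, y, bc_type="natural")

act = make_c2_activation(-0.5, 0.0, 0.8)
eps = 1e-6
for nu in (0, 1, 2):   # value, first, and second derivative all agree across the knot at x = -1
    print(np.isclose(act(-1.0 - eps, nu), act(-1.0 + eps, nu), atol=1e-3))
```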

Citations: 0
Multi-task learning and joint refinement between camera localization and object detection
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-08 | DOI: 10.1007/s41095-022-0319-z
Junyi Wang, Yue Qi

Visual localization and object detection both play important roles in various tasks. In many indoor application scenarios where some detected objects have fixed positions, the two techniques work closely together. However, few researchers consider these two tasks simultaneously, because of a lack of datasets and the limited attention paid to such environments. In this paper, we explore multi-task network design and joint refinement of detection and localization. To address the dataset problem, we construct a medium-scale indoor scene of an aviation exhibition hall through a semi-automatic process. The dataset provides localization and detection information, and is publicly available at https://drive.google.com/drive/folders/1U28zkuN4_I0dbzkqyIAKlAl5k9oUK0jI?usp=sharing for benchmarking localization and object detection tasks. Targeting this dataset, we have designed a multi-task network, JLDNet, based on YOLO v3, that outputs a target point cloud and object bounding boxes. For dynamic environments, the detection branch also promotes the perception of dynamics. JLDNet includes image feature learning, point feature learning, feature fusion, detection construction, and point cloud regression. Moreover, object-level bundle adjustment is used to further improve localization and detection accuracy. To test JLDNet and compare it to other methods, we have conducted experiments on 7 static scenes, our constructed dataset, and the dynamic TUM RGB-D and Bonn datasets. Our results show state-of-the-art accuracy for both tasks, and demonstrate the benefit of addressing them jointly.
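The object-level refinement idea can be pictured in isolation as follows: when some detected objects have known, fixed 3D positions, the camera pose can be refined by minimising the reprojection error between those positions and the detected 2D box centres. The pinhole intrinsics, the landmark layout, and the use of scipy's least_squares below are assumptions for illustration, not JLDNet's actual bundle-adjustment implementation.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])   # assumed pinhole intrinsics

def project(pose6, pts3d):
    """pose6 = (rotation vector, translation); returns pixel coordinates of world points."""
    R = Rotation.from_rotvec(pose6[:3]).as_matrix()
    cam = pts3d @ R.T + pose6[3:]
    uvw = cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def residuals(pose6, pts3d, detections):
    return (project(pose6, pts3d) - detections).ravel()

# Fixed object positions (e.g., exhibits in the hall) and their detected 2D box centres.
objects3d = np.array([[0, 0, 5.0], [1, 0, 6.0], [-1, 1, 7.0], [0.5, -1, 5.5]])
true_pose = np.array([0.02, -0.03, 0.01, 0.1, 0.05, -0.2])
detections = project(true_pose, objects3d) + np.random.default_rng(0).normal(scale=0.3, size=(4, 2))

refined = least_squares(residuals, x0=np.zeros(6), args=(objects3d, detections))
print(refined.x.round(3))   # roughly recovers true_pose
```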

Citations: 0
DualSmoke: Sketch-based smoke illustration design with two-stage generative model
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-08 | DOI: 10.1007/s41095-022-0318-0
Haoran Xie, Keisuke Arihara, Syuhei Sato, Kazunori Miyata

The dynamic effects of smoke are impressive in illustration design, but designing smoke effects without domain knowledge of fluid simulation is troublesome and challenging for inexpert users. In this work, we propose DualSmoke, a two-stage global-to-local generation framework for interactive smoke illustration design. In the global stage, the proposed approach utilizes fluid patterns to generate Lagrangian coherent structures from the user's hand-drawn sketches. In the local stage, detailed flow patterns are obtained from the generated coherent structure. Finally, we apply a guiding force field to the smoke simulator to produce the desired smoke illustration. To construct the training dataset, DualSmoke generates flow patterns using finite-time Lyapunov exponents of the velocity fields. The synthetic sketch data are generated from the flow patterns by skeleton extraction. Our user study verifies that the proposed design interface can provide various smoke illustration designs with good usability. Our code is available at https://github.com/shasph/DualSmoke.
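As background on the Lagrangian-coherent-structure step, the sketch below computes a finite-time Lyapunov exponent (FTLE) field for an analytic 2D flow: a grid of particles is advected, the flow-map gradient is taken by finite differences, and the largest eigenvalue of the Cauchy-Green tensor gives the exponent, whose ridges indicate coherent structures. The double-gyre velocity field, grid resolution, and forward-Euler integration are illustrative assumptions, not DualSmoke's setup.

```python
import numpy as np

def velocity(x, y, t, A=0.1, eps=0.25, omega=2*np.pi/10):
    """Classic double-gyre flow on [0, 2] x [0, 1]."""
    a, b = eps*np.sin(omega*t), 1 - 2*eps*np.sin(omega*t)
    f = a*x**2 + b*x
    dfdx = 2*a*x + b
    u = -np.pi*A*np.sin(np.pi*f)*np.cos(np.pi*y)
    v = np.pi*A*np.cos(np.pi*f)*np.sin(np.pi*y)*dfdx
    return u, v

def ftle(nx=200, ny=100, t0=0.0, T=15.0, dt=0.1):
    xs, ys = np.linspace(0, 2, nx), np.linspace(0, 1, ny)
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    Px, Py = X.copy(), Y.copy()
    # Advect a grid of tracer particles with simple forward-Euler steps.
    for t in np.arange(t0, t0 + T, dt):
        u, v = velocity(Px, Py, t)
        Px, Py = Px + dt*u, Py + dt*v
    # Flow-map gradient via finite differences on the particle grid.
    dPx_dx, dPx_dy = np.gradient(Px, xs, ys, edge_order=2)
    dPy_dx, dPy_dy = np.gradient(Py, xs, ys, edge_order=2)
    # Largest eigenvalue of the Cauchy-Green tensor C = F^T F at each grid point.
    C11 = dPx_dx**2 + dPy_dx**2
    C22 = dPx_dy**2 + dPy_dy**2
    C12 = dPx_dx*dPx_dy + dPy_dx*dPy_dy
    lam_max = 0.5*(C11 + C22) + np.sqrt(0.25*(C11 - C22)**2 + C12**2)
    return np.log(np.sqrt(lam_max)) / abs(T)   # ridges of this field mark coherent structures

field = ftle()
print(field.shape, field.max())
```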

Citations: 0
Deep panoramic depth prediction and completion for indoor scenes
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-08 | DOI: 10.1007/s41095-023-0358-0
Giovanni Pintore, Eva Almansa, Armando Sanchez, Giorgio Vassena, Enrico Gobbetti

We introduce a novel end-to-end deep-learning solution for rapidly estimating a dense spherical depth map of an indoor environment. Our input is a single equirectangular image registered with a sparse depth map, as provided by a variety of common capture setups. Depth is inferred by an efficient and lightweight single-branch network, which employs a dynamic gating system to process together dense visual data and sparse geometric data. We exploit the characteristics of typical man-made environments to efficiently compress multi-resolution features and find short- and long-range relations among scene parts. Furthermore, we introduce a new augmentation strategy to make the model robust to different types of sparsity, including those generated by various structured light sensors and LiDAR setups. The experimental results demonstrate that our method provides interactive performance and outperforms state-of-the-art solutions in computational efficiency, adaptivity to variable depth sparsity patterns, and prediction accuracy for challenging indoor data, even when trained solely on synthetic data without any fine tuning.
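One common form such a dynamic gate can take is sketched below: a learned per-pixel gate decides how much to trust the sparse geometric features versus the dense visual features before merging them. The channel sizes and the single gating convolution are assumptions, not the authors' module.

```python
import torch
import torch.nn as nn

class DynamicGate(nn.Module):
    """Per-pixel soft selection between dense visual and sparse geometric features."""
    def __init__(self, ch=64):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 3, padding=1), nn.Sigmoid())

    def forward(self, visual_feat, sparse_feat):
        g = self.gate(torch.cat([visual_feat, sparse_feat], dim=1))
        return g * sparse_feat + (1 - g) * visual_feat   # trust sparse data where the gate is high

gate = DynamicGate()
fused = gate(torch.randn(1, 64, 64, 128), torch.randn(1, 64, 64, 128))
print(fused.shape)  # torch.Size([1, 64, 64, 128])
```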

Citations: 0
Shape embedding and retrieval in multi-flow deformation
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-02-08 | DOI: 10.1007/s41095-022-0315-3
Baiqiang Leng, Jingwei Huang, Guanlin Shen, Bin Wang

We propose a unified 3D flow framework for joint learning of shape embedding and deformation for different categories. Our goal is to recover shapes from imperfect point clouds by fitting the best shape template in a shape repository after deformation. Accordingly, we learn a shape embedding for template retrieval and a flow-based network for robust deformation. We note that the deformation flow can be quite different for different shape categories. Therefore, we introduce a novel multi-hub module to learn multiple modes of deformation to incorporate such variation, providing a network which can handle a wide range of objects from different categories. The shape embedding is designed to retrieve the best-fit template as the nearest neighbor in a latent space. We replace the standard fully connected layer with a tiny structure in the embedding that significantly reduces network complexity and further improves deformation quality. Experiments show the superiority of our method to existing state-of-the-art methods via qualitative and quantitative comparisons. Finally, our method provides efficient and flexible deformation that can further be used for novel shape design.
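The retrieval step can be pictured as a nearest-neighbour search in the learned latent space, as in the minimal sketch below; the embedding dimension and the cosine metric are assumptions for illustration.

```python
import numpy as np

def retrieve_template(query_embedding, template_embeddings):
    """Return the index of the best-fit template by cosine similarity in latent space."""
    q = query_embedding / np.linalg.norm(query_embedding)
    T = template_embeddings / np.linalg.norm(template_embeddings, axis=1, keepdims=True)
    return int(np.argmax(T @ q))

rng = np.random.default_rng(0)
templates = rng.normal(size=(1000, 256))              # one embedding per shape template
query = templates[42] + 0.05 * rng.normal(size=256)   # noisy observation of template 42
print(retrieve_template(query, templates))            # 42
```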

Citations: 0
Dynamic ocean inverse modeling based on differentiable rendering
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-01-03 | DOI: 10.1007/s41095-023-0338-4
Xueguang Xie, Yang Gao, Fei Hou, Aimin Hao, Hong Qin

Learning and inferring underlying motion patterns of captured 2D scenes and then re-creating dynamic evolution consistent with the real-world natural phenomena have high appeal for graphics and animation. To bridge the technical gap between virtual and real environments, we focus on the inverse modeling and reconstruction of visually consistent and property-verifiable oceans, taking advantage of deep learning and differentiable physics to learn geometry and constitute waves in a self-supervised manner. First, we infer hierarchical geometry using two networks, which are optimized via the differentiable renderer. We extract wave components from the sequence of inferred geometry through a network equipped with a differentiable ocean model. Then, ocean dynamics can be evolved using the reconstructed wave components. Through extensive experiments, we verify that our new method yields satisfactory results for both geometry reconstruction and wave estimation. Moreover, the new framework has the inverse modeling potential to facilitate a host of graphics applications, such as the rapid production of physically accurate scene animation and editing guided by real ocean scenes.
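As a toy picture of the last step, once per-component amplitudes, wave vectors, frequencies, and phases have been estimated, the surface can be re-evolved at any time by summing the components. The simple sinusoidal wave model below is a simplifying assumption, not the paper's differentiable ocean model.

```python
import numpy as np

def height_field(X, Y, t, components):
    """components: list of (amplitude, kx, ky, omega, phase) tuples."""
    h = np.zeros_like(X)
    for a, kx, ky, omega, phase in components:
        h += a * np.sin(kx * X + ky * Y - omega * t + phase)
    return h

x = np.linspace(0, 50, 128)
X, Y = np.meshgrid(x, x)
comps = [(0.6, 0.3, 0.1, 1.2, 0.0), (0.2, 0.9, 0.4, 2.1, 1.0), (0.1, 1.8, 1.1, 3.0, 0.5)]
print(height_field(X, Y, t=2.5, components=comps).shape)  # (128, 128) surface at time t
```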

Citations: 0
Benchmarking visual SLAM methods in mirror environments
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-01-03 | DOI: 10.1007/s41095-022-0329-x
Peter Herbert, Jing Wu, Ze Ji, Yu-Kun Lai

Visual simultaneous localisation and mapping (vSLAM) finds applications in indoor and outdoor navigation that routinely subject it to visual complexities, particularly mirror reflections. Mirror presence (the time a mirror is visible and its average size in the frame) was hypothesised to affect localisation and mapping performance, with systems using direct techniques expected to perform worse. Thus, a dataset of image sequences recorded in mirror environments, MirrEnv, was collected and used to evaluate the performance of existing representative methods. RGBD ORB-SLAM3 and BundleFusion appear to show moderate degradation of absolute trajectory error with increasing mirror duration, whilst the remaining results did not show significantly degraded localisation performance. The generated mesh maps proved to be very inaccurate, with real and virtual reflections colliding in the reconstructions. A discussion is given of the likely sources of error and robustness in mirror environments, outlining future directions for validating and improving vSLAM performance in the presence of planar mirrors. The MirrEnv dataset is available at https://doi.org/10.17035/d.2023.0292477898.
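For reference, the absolute trajectory error used above is typically computed by rigidly aligning the estimated trajectory to the ground truth and reporting the RMSE of the remaining position differences, as in the sketch below. The one-to-one timestamp association and the rigid (no-scale) alignment are assumptions about the evaluation protocol.

```python
import numpy as np

def ate_rmse(est, gt):
    """est, gt: (N, 3) arrays of associated camera positions; Horn/Kabsch alignment, then RMSE."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    U, _, Vt = np.linalg.svd(E.T @ G)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ S @ U.T                                           # rotation mapping est -> gt
    t = mu_g - R @ mu_e
    aligned = est @ R.T + t
    return np.sqrt(np.mean(np.sum((aligned - gt)**2, axis=1)))

# Toy check: a rotated and translated copy of a trajectory has (near-)zero ATE.
rng = np.random.default_rng(1)
gt = np.cumsum(rng.normal(size=(100, 3)) * 0.05, axis=0)
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0], [np.sin(theta), np.cos(theta), 0], [0, 0, 1]])
est = gt @ Rz.T + np.array([1.0, -2.0, 0.5])
print(round(ate_rmse(est, gt), 6))  # ~0
```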

Citations: 0