
Latest publications in Computational Visual Media

Dance2MIDI: Dance-driven multi-instrument music generation
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-07-24 | DOI: 10.1007/s41095-024-0417-1
Bo Han, Yuheng Li, Yixuan Shen, Yi Ren, Feilin Han

Dance-driven music generation aims to generate musical pieces conditioned on dance videos. Previous works focus on monophonic or raw audio generation, while the multi-instrument scenario is under-explored. The challenges associated with dance-driven multi-instrument music (MIDI) generation are twofold: (i) lack of a publicly available multi-instrument MIDI and video paired dataset and (ii) the weak correlation between music and video. To tackle these challenges, we have built the first multi-instrument MIDI and dance paired dataset (D2MIDI). Based on this dataset, we introduce a multi-instrument MIDI generation framework (Dance2MIDI) conditioned on dance video. Firstly, to capture the relationship between dance and music, we employ a graph convolutional network to encode the dance motion. This allows us to extract features related to dance movement and dance style. Secondly, to generate a harmonious rhythm, we utilize a transformer model to decode the drum track sequence, leveraging a cross-attention mechanism. Thirdly, we model the task of generating the remaining tracks based on the drum track as a sequence understanding and completion task. A BERT-like model is employed to comprehend the context of the entire music piece through self-supervised learning. We evaluate the music generated by our framework trained on the D2MIDI dataset and demonstrate that our method achieves state-of-the-art performance.
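
As a rough illustration of the second stage described above, the sketch below wires a transformer decoder that predicts drum-track MIDI tokens while cross-attending to dance-motion features (such as those produced by a graph-convolutional encoder). All module sizes, the token vocabulary, and the tensor shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DrumTrackDecoder(nn.Module):
    """Transformer decoder that predicts drum-MIDI tokens while cross-attending
    to dance-motion features (e.g., the output of a graph-convolutional encoder)."""
    def __init__(self, vocab_size=512, d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, drum_tokens, motion_feats):
        # drum_tokens: (B, T) token ids; motion_feats: (B, S, d_model) motion features
        x = self.token_emb(drum_tokens)
        causal = nn.Transformer.generate_square_subsequent_mask(drum_tokens.size(1))
        # Cross-attention to the motion features happens inside each decoder layer
        # via the `memory` argument; the causal mask keeps drum generation autoregressive.
        h = self.decoder(tgt=x, memory=motion_feats, tgt_mask=causal)
        return self.head(h)  # (B, T, vocab_size) next-token logits

# Toy usage: 2 clips, 16 drum tokens each, 32 motion frames of 256-d features.
logits = DrumTrackDecoder()(torch.randint(0, 512, (2, 16)), torch.randn(2, 32, 256))
```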

Citations: 0
Continual few-shot patch-based learning for anime-style colorization
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-07-09 | DOI: 10.1007/s41095-024-0414-4
Akinobu Maejima, Seitaro Shinagawa, Hiroyuki Kubo, Takuya Funatomi, Tatsuo Yotsukura, Satoshi Nakamura, Yasuhiro Mukaigawa

The automatic colorization of anime line drawings is a challenging problem in production pipelines. Recent advances in deep neural networks have addressed this problem; however, collecting many images of colorization targets in a novel anime work before the colorization process starts leads to chicken-and-egg problems and has become an obstacle to using such networks in production pipelines. To overcome this obstacle, we propose a new patch-based learning method for few-shot anime-style colorization. The learning method adopts an efficient patch sampling technique with position embedding according to the characteristics of anime line drawings. We also present a continual learning strategy that continuously updates our colorization model using new samples colorized by human artists. The advantage of our method is that it can learn our colorization model from scratch or from pre-trained weights using only a few pre- and post-colorized line drawings that artists create in their usual colorization work. Therefore, our method can be easily incorporated within existing production pipelines. We quantitatively demonstrate that our colorization method outperforms state-of-the-art methods.
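
The patch sampling with position embedding mentioned above might, in a stripped-down form, look like the following; the patch size, stride, and the way position is encoded as extra channels are assumptions made for illustration, not the paper's exact scheme.

```python
import numpy as np

def sample_patches_with_position(line_art, patch=32, stride=16):
    """Cut a grayscale line drawing into patches and append the patch location,
    normalized to [0, 1], as two extra channels so a model can learn
    position-dependent colorization rules."""
    h, w = line_art.shape
    patches = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            crop = line_art[y:y + patch, x:x + patch].astype(np.float32) / 255.0
            pos_y = np.full((patch, patch), y / h, dtype=np.float32)
            pos_x = np.full((patch, patch), x / w, dtype=np.float32)
            patches.append(np.stack([crop, pos_y, pos_x], axis=0))  # (3, patch, patch)
    return np.stack(patches)  # (N, 3, patch, patch)

demo = sample_patches_with_position(np.random.randint(0, 256, (128, 128), dtype=np.uint8))
print(demo.shape)  # (49, 3, 32, 32) for a 128x128 drawing with these settings
```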

Citations: 0
Recent advances in 3D Gaussian splatting
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-07-08 | DOI: 10.1007/s41095-024-0436-y
Tong Wu, Yu-Jie Yuan, Ling-Xiao Zhang, Jie Yang, Yan-Pei Cao, Ling-Qi Yan, Lin Gao

The emergence of 3D Gaussian splatting (3DGS) has greatly accelerated rendering in novel view synthesis. Unlike neural implicit representations such as neural radiance fields (NeRFs), which represent a 3D scene with position- and viewpoint-conditioned neural networks, 3D Gaussian splatting utilizes a set of Gaussian ellipsoids to model the scene, so that efficient rendering can be accomplished by rasterizing the Gaussian ellipsoids into images. Apart from fast rendering, the explicit representation of 3D Gaussian splatting also facilitates downstream tasks such as dynamic reconstruction, geometry editing, and physical simulation. Considering the rapid changes and growing number of works in this field, we present a literature review of recent 3D Gaussian splatting methods, which can be roughly classified by functionality into 3D reconstruction, 3D editing, and other downstream applications. Traditional point-based rendering methods and the rendering formulation of 3D Gaussian splatting are also covered to aid understanding of this technique. This survey aims to help beginners get started in this field quickly and to give experienced researchers a comprehensive overview, with the goal of stimulating future development of the 3D Gaussian splatting representation.
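
For readers new to the rendering formulation the survey covers, the per-pixel front-to-back alpha compositing used by 3D Gaussian splatting, C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j), can be sketched as below. This scalar, single-pixel version is a simplified illustration of what the tile-based rasterizer does in parallel, not an implementation of it.

```python
import numpy as np

def composite_pixel(colors, alphas):
    """Front-to-back alpha compositing at one pixel, as used in 3D Gaussian
    splatting: C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j).
    Splats are assumed pre-sorted by depth, and each alpha_i already includes
    the projected 2D Gaussian falloff evaluated at this pixel."""
    colors = np.asarray(colors, dtype=np.float64)   # (N, 3), nearest splat first
    alphas = np.asarray(alphas, dtype=np.float64)   # (N,)
    out = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(colors, alphas):
        out += transmittance * a * c
        transmittance *= 1.0 - a
        if transmittance < 1e-4:  # early termination, mirroring tile-based rasterizers
            break
    return out

print(composite_pixel([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.6, 0.5]))  # [0.6, 0.2, 0.0]
```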

Citations: 0
Illuminator: Image-based illumination editing for indoor scene harmonization
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-07-05 | DOI: 10.1007/s41095-023-0397-6
Zhongyun Bao, Gang Fu, Zipei Chen, Chunxia Xiao

Illumination harmonization is an important but challenging task that aims to achieve illumination compatibility between the foreground and background under different illumination conditions. Most current studies mainly focus on achieving seamless integration between the appearance (illumination or visual style) of the foreground object itself and the background scene, or on producing the foreground shadow. They rarely consider global illumination consistency (i.e., the illumination and shadow of the foreground object). In our work, we introduce “Illuminator”, an image-based illumination editing technique. This method aims to achieve more realistic global illumination harmonization, ensuring consistent illumination and plausible shadows in complex indoor environments. The Illuminator contains a shadow residual generation branch and an object illumination transfer branch. The shadow residual generation branch introduces a novel attention-aware graph convolutional mechanism to achieve reasonable foreground shadow generation. The object illumination transfer branch primarily transfers background illumination to the foreground region. In addition, we construct a real-world indoor illumination harmonization dataset called RIH, which consists of various foreground objects and background scenes captured under diverse illumination conditions, for training and evaluating our Illuminator. Our comprehensive experiments, conducted on the RIH dataset and a collection of real-world everyday photos, validate the effectiveness of our method.
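
To make the two-branch idea concrete, here is a minimal skeleton in which a shadow-residual branch and an illumination-transfer branch are composited with the foreground mask. The plain convolutional blocks, channel counts, and compositing rule are placeholders; they do not reproduce the paper's attention-aware graph convolutional architecture.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True))

class TwoBranchHarmonizer(nn.Module):
    """Skeleton of a two-branch layout: one branch predicts a shadow residual for
    the background region, the other re-lights the foreground object; the two
    results are composited with the foreground mask."""
    def __init__(self, feat=32):
        super().__init__()
        self.shadow_branch = nn.Sequential(conv_block(4, feat), nn.Conv2d(feat, 3, 3, padding=1))
        self.relight_branch = nn.Sequential(conv_block(4, feat), nn.Conv2d(feat, 3, 3, padding=1))

    def forward(self, composite, mask):
        x = torch.cat([composite, mask], dim=1)      # RGB composite + foreground mask
        shadow_residual = self.shadow_branch(x)      # shadow cast onto the background
        relit_fg = self.relight_branch(x)            # foreground under background illumination
        return composite + (1 - mask) * shadow_residual + mask * (relit_fg - composite)

out = TwoBranchHarmonizer()(torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))
```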

Citations: 0
Shell stand: Stable thin shell models for 3D fabrication
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-06-24 | DOI: 10.1007/s41095-024-0402-8
Yu Xing, Xiaoxuan Wang, Lin Lu, Andrei Sharf, Daniel Cohen-Or, Changhe Tu

A thin shell model refers to a surface or structure whose thickness is considered negligible. In the context of 3D printing, thin shell models are characterized by lightweight, hollow structures and reduced material usage. Their versatility and visual appeal make them popular in various applications, such as cloth simulation, character skinning, and thin-walled structures like leaves, paper, or metal sheets. Nevertheless, optimization of thin shell models without external support remains a challenge due to their minimal interior operational space. For the same reasons, hollowing methods are also unsuitable for this task. In fact, thin shell modulation methods are required to preserve the visual appearance of a two-sided surface, which further constrains the problem space. In this paper, we introduce a new visual disparity metric tailored for shell models, integrating local details and global shape attributes in terms of visual perception. Our method modulates thin shell models using global deformations and local thickening while accounting for visual saliency, stability, and structural integrity. As a result, thin shell models such as bas-reliefs, hollow shapes, and cloth can be stabilized to stand in arbitrary orientations, making them ideal for 3D printing.
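
Standing stability of this kind is usually phrased as a static-equilibrium condition: the shell's center of mass must project, along gravity, into the convex hull of its ground contacts. The sketch below implements that generic check for a triangulated shell of uniform density; it illustrates the constraint only and is not the paper's optimization. The contact tolerance and thickness handling are assumptions.

```python
import numpy as np
from scipy.spatial import ConvexHull

def stands_upright(vertices, faces, thickness=1.0, contact_eps=1e-3, margin=1e-6):
    """Static stability check for a triangulated thin shell of uniform density:
    the (area x thickness weighted) center of mass must project along -z into
    the convex hull of the near-ground contact vertices."""
    v, f = np.asarray(vertices, float), np.asarray(faces, int)
    tri = v[f]                                                     # (F, 3, 3)
    areas = 0.5 * np.linalg.norm(np.cross(tri[:, 1] - tri[:, 0],
                                          tri[:, 2] - tri[:, 0]), axis=1)
    mass = areas * thickness                                       # proportional to per-face shell mass
    com = (tri.mean(axis=1) * mass[:, None]).sum(axis=0) / mass.sum()
    contacts = v[v[:, 2] < v[:, 2].min() + contact_eps][:, :2]     # ground contacts, projected to xy
    if len(contacts) < 3:
        return False                                               # no support polygon
    hull = ConvexHull(contacts)
    # Point-in-convex-polygon test via the hull's half-space inequalities (Ax + b <= 0 inside).
    return bool(np.all(hull.equations[:, :2] @ com[:2] + hull.equations[:, 2] <= margin))

# Toy example: a pyramid-like shell standing on a square base in the z = 0 plane.
verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0], [0.5, 0.5, 1.0]])
tris = np.array([[0, 1, 4], [1, 2, 4], [2, 3, 4], [3, 0, 4]])
print(stands_upright(verts, tris))  # True: the COM projects inside the base
```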

Citations: 0
Noise4Denoise: Leveraging noise for unsupervised point cloud denoising
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-06-14 | DOI: 10.1007/s41095-024-0423-3
Weijia Wang, Xiao Liu, Hailing Zhou, Lei Wei, Zhigang Deng, Manzur Murshed, Xuequan Lu
{"title":"Noise4Denoise: Leveraging noise for unsupervised point cloud denoising","authors":"Weijia Wang, Xiao Liu, Hailing Zhou, Lei Wei, Zhigang Deng, Manzur Murshed, Xuequan Lu","doi":"10.1007/s41095-024-0423-3","DOIUrl":"https://doi.org/10.1007/s41095-024-0423-3","url":null,"abstract":"","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":null,"pages":null},"PeriodicalIF":6.9,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141340529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Cross-modal learning using privileged information for long-tailed image classification
IF 6.9 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-06-10 | DOI: 10.1007/s41095-023-0382-0
Xiangxian Li, Yuze Zheng, Haokai Ma, Zhuang Qi, Xiangxu Meng, Lei Meng
{"title":"Cross-modal learning using privileged information for long-tailed image classification","authors":"Xiangxian Li, Yuze Zheng, Haokai Ma, Zhuang Qi, Xiangxu Meng, Lei Meng","doi":"10.1007/s41095-023-0382-0","DOIUrl":"https://doi.org/10.1007/s41095-023-0382-0","url":null,"abstract":"","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":null,"pages":null},"PeriodicalIF":6.9,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141360974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Temporal vectorized visibility for direct illumination of animated models
IF 6.9 | CAS Tier 3, Computer Science | Q1 Computer Science | Pub Date: 2024-05-29 | DOI: 10.1007/s41095-023-0339-3
Zhenni Wang, Tze Yui Ho, Yi Xiao, Chi Sing Leung

Direct illumination rendering is an important technique in computer graphics. Precomputed radiance transfer algorithms can provide high-quality rendering results in real time, but they can only support rigid models. On the other hand, ray tracing algorithms are flexible and can gracefully handle animated models. With NVIDIA RTX and the AI denoiser, we can use ray tracing algorithms to render visually appealing results in real time. Although visually appealing, these results can deviate considerably from the actual illumination. We propose a visibility-boundary edge oriented infinite triangle bounding volume hierarchy (BVH) traversal algorithm to dynamically generate visibility in vector form. Our algorithm utilizes the properties of visibility-boundary edges and infinite triangle BVH traversal to maximize the efficiency of vector-form visibility generation. A novel data structure, temporal vectorized visibility, is proposed, which allows visibility in vector form to be shared across time and further increases the generation efficiency. Our algorithm can efficiently render close-to-reference direct illumination results. With similar processing time, it provides a visual quality improvement of around 10 dB in terms of peak signal-to-noise ratio (PSNR) relative to the ray tracing algorithm reservoir-based spatiotemporal importance resampling (ReSTIR).
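
The phrase "visibility in vector form" can be read as keeping one visibility entry per light sample rather than a single averaged scalar, which is what makes it cacheable and reusable across frames. A generic direct-illumination estimator written that way is sketched below; the BVH-based generation of the visibility vector itself, which is the paper's contribution, is not shown, and the simplified geometric term is an assumption for illustration.

```python
import numpy as np

def shade_direct(point, normal, light_samples, light_radiance, visibility):
    """Direct illumination at a shading point with visibility kept in vector
    form: one 0/1 entry per area-light sample. The unshadowed per-sample terms
    are computed once; shading reduces to a dot product with the visibility
    vector, a quantity that lends itself to caching across frames."""
    to_light = light_samples - point                    # (N, 3)
    dist = np.linalg.norm(to_light, axis=1)
    wi = to_light / dist[:, None]
    cos_term = np.clip(wi @ normal, 0.0, None)
    unshadowed = light_radiance * cos_term / dist**2    # simplified geometric term
    return float(visibility.astype(float) @ unshadowed) / len(light_samples)

light = np.random.rand(64, 3) + np.array([0.0, 5.0, 0.0])   # samples on an area light
v = np.ones(64)                                              # fully visible light
print(shade_direct(np.zeros(3), np.array([0.0, 1.0, 0.0]), light, 10.0, v))
```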

Citations: 0
Super-resolution reconstruction of single image for latent features
IF 6.9 | CAS Tier 3, Computer Science | Q1 Computer Science | Pub Date: 2024-05-24 | DOI: 10.1007/s41095-023-0387-8
Xin Wang, Jing-Ke Yan, Jing-Ye Cai, Jian-Hua Deng, Qin Qin, Yao Cheng

Single-image super-resolution (SISR) typically focuses on restoring various degraded low-resolution (LR) images to a single high-resolution (HR) image. However, during SISR tasks, it is often challenging for models to simultaneously maintain high quality and rapid sampling while preserving diversity in details and texture features. This challenge can lead to issues such as model collapse, lack of rich details and texture features in the reconstructed HR images, and excessive time consumption for model sampling. To address these problems, this paper proposes a Latent Feature-oriented Diffusion Probability Model (LDDPM). First, we designed a conditional encoder capable of effectively encoding LR images, reducing the solution space for model image reconstruction and thereby improving the quality of the reconstructed images. We then employed a normalized flow and multimodal adversarial training, learning from complex multimodal distributions, to model the denoising distribution. Doing so boosts the generative modeling capabilities within a minimal number of sampling steps. Experimental comparisons of our proposed model with existing SISR methods on mainstream datasets demonstrate that our model reconstructs more realistic HR images and achieves better performance on multiple evaluation metrics, providing a fresh perspective for tackling SISR tasks.
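
The overall flow of conditioning a diffusion sampler on an encoded LR image can be sketched generically as below. The noise schedule, the stand-in denoiser and encoder, and every interface are placeholders rather than LDDPM's actual components.

```python
import torch

@torch.no_grad()
def conditional_sample(denoiser, lr_encoder, lr_image, steps, betas):
    """Generic conditional DDPM-style sampling: encode the LR image once and
    feed the condition to the denoiser at every reverse step."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    cond = lr_encoder(lr_image)                        # condition from the LR encoder
    x = torch.randn_like(lr_image)                     # start from noise (HR-sized in practice)
    for t in reversed(range(steps)):
        eps = denoiser(x, torch.tensor([t]), cond)     # predicted noise at step t
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        x = mean + torch.sqrt(betas[t]) * torch.randn_like(x) if t > 0 else mean
    return x

# Toy run with stand-in networks, just to show the call pattern.
betas = torch.linspace(1e-4, 0.02, 50)
dummy_denoiser = lambda x, t, c: torch.zeros_like(x)
dummy_encoder = lambda lr: lr.mean(dim=(2, 3))
hr = conditional_sample(dummy_denoiser, dummy_encoder, torch.rand(1, 3, 32, 32), 50, betas)
```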

Citations: 0
THP: Tensor-field-driven hierarchical path planning for autonomous scene exploration with depth sensors
IF 6.9 | CAS Tier 3, Computer Science | Q1 Computer Science | Pub Date: 2024-05-18 | DOI: 10.1007/s41095-022-0312-6
Yuefeng Xi, Chenyang Zhu, Yao Duan, Renjiao Yi, Lintao Zheng, Hongjun He, Kai Xu
{"title":"THP: Tensor-field-driven hierarchical path planning for autonomous scene exploration with depth sensors","authors":"Yuefeng Xi, Chenyang Zhu, Yao Duan, Renjiao Yi, Lintao Zheng, Hongjun He, Kai Xu","doi":"10.1007/s41095-022-0312-6","DOIUrl":"https://doi.org/10.1007/s41095-022-0312-6","url":null,"abstract":"","PeriodicalId":37301,"journal":{"name":"Computational Visual Media","volume":null,"pages":null},"PeriodicalIF":6.9,"publicationDate":"2024-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141125029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0