
Latest Publications in Computer Graphics Forum

DSGI-Net: Density-based Selective Grouping Point Cloud Learning Network for Indoor Scene
IF 2.7 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-10-24 | DOI: 10.1111/cgf.15218
Xin Wen, Yao Duan, Kai Xu, Chenyang Zhu

Indoor scene point clouds exhibit diverse distributions and varying levels of sparsity, characterized by more intricate geometry and occlusion than outdoor scenes or individual objects. Although recent advances in 3D point cloud analysis have introduced various network architectures, frameworks tailored to the unique attributes of indoor scenarios remain lacking. To address this, we propose DSGI-Net, a novel indoor scene point cloud learning network that can be integrated into existing models. The key innovation of this work is selectively grouping more informative neighbor points in sparse regions and promoting semantic consistency in local areas where different instances are in proximity but belong to distinct categories. Furthermore, our method encodes both semantic and spatial relationships between points in local regions to reduce the loss of local geometric details. Extensive experiments on the ScanNetv2, SUN RGB-D, and S3DIS indoor scene benchmarks demonstrate that our method is straightforward yet effective.
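
As a rough illustration of the density-based selective grouping idea in this abstract, the sketch below enlarges the neighborhood of points that fall in sparse regions. The k values, density proxy, and quantile threshold are all illustrative assumptions, not the paper's actual design.

```python
# A minimal sketch of density-based selective grouping, assuming a simple
# k-NN formulation; the actual DSGI-Net grouping rule is not specified here.
import numpy as np

def density_selective_group(points, k_base=16, k_sparse=32, density_quantile=0.25):
    """Group neighbors per point, taking more neighbors in sparse regions.

    points: (N, 3) array of xyz coordinates.
    Returns a list of neighbor-index arrays (k_base or k_sparse long).
    """
    n = points.shape[0]
    # Pairwise squared distances (fine for small N; use a KD-tree for real scenes).
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    order = np.argsort(d2, axis=1)  # column 0 is the point itself
    # Density proxy: mean distance to the k_base nearest neighbors (smaller = denser).
    knn_d = np.sqrt(np.take_along_axis(d2, order[:, 1:k_base + 1], axis=1))
    density = -knn_d.mean(axis=1)
    # Points whose density falls in the lowest quartile count as "sparse".
    thresh = np.quantile(density, density_quantile)
    groups = []
    for i in range(n):
        k = k_sparse if density[i] <= thresh else k_base
        groups.append(order[i, 1:k + 1])
    return groups

pts = np.random.rand(256, 3).astype(np.float32)
groups = density_selective_group(pts)
print(len(groups), groups[0].shape)
```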

Citations: 0
Evolutive 3D Urban Data Representation through Timeline Design Space
IF 2.7 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-10-24 | DOI: 10.1111/cgf.15237
C. Le Bihan Gautier, J. Delanoy, G. Gesquière

Cities are constantly changing to adapt to new societal and environmental challenges. Understanding their evolution is thus essential to make informed decisions about their future. To capture these changes, cities are increasingly offering digital 3D snapshots of their territory over time. However, existing tools to visualise these data typically represent the city at a specific point in time, limiting a comprehensive analysis of its evolution. In this paper, we propose a new method for simultaneously visualising different versions of the city in a 3D space. We integrate the different versions of the city along a new kind of 3D timeline that can take different shapes depending on the needs of the user and the dataset being visualised. We propose four different shapes of timelines and three ways to place the versions along them. By varying the parameters of the timelines, our method places the versions so that they do not visually overlap, and it offers options that ease the understanding of the scene by changing the orientation or scale of the versions. We evaluate our method on different datasets to demonstrate the advantages and limitations of the different timeline shapes and provide recommendations as to which shape to choose.
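
To make the timeline idea concrete, here is a small sketch that places m city snapshots along parametric 3D timeline shapes. The shapes chosen here (line, circle, helix) and the spacing parameters are assumptions for illustration and do not reproduce the paper's four designs.

```python
# A minimal sketch of placing m city versions along parametric 3D timelines,
# assuming simple line / circle / helix shapes; the paper's shapes and
# placement strategies are richer than this illustration.
import numpy as np

def timeline_positions(m, shape="line", spacing=100.0, radius=200.0, pitch=50.0):
    """Return (m, 3) anchor positions for m snapshots of the city."""
    t = np.linspace(0.0, 1.0, m)
    if shape == "line":
        return np.stack([t * spacing * (m - 1), np.zeros(m), np.zeros(m)], axis=1)
    if shape == "circle":
        a = 2.0 * np.pi * t
        return np.stack([radius * np.cos(a), radius * np.sin(a), np.zeros(m)], axis=1)
    if shape == "helix":
        a = 2.0 * np.pi * t
        return np.stack([radius * np.cos(a), radius * np.sin(a), pitch * np.arange(m)], axis=1)
    raise ValueError(shape)

for shape in ("line", "circle", "helix"):
    print(shape, timeline_positions(4, shape).round(1))
```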

Citations: 0
Frequency-Aware Facial Image Shadow Removal through Skin Color and Texture Learning
IF 2.7 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-10-24 | DOI: 10.1111/cgf.15220
Ling Zhang, Wenyang Xie, Chunxia Xiao

Existing facial image shadow removal methods predominantly rely on pre-extracted facial features. However, these methods often fail to capitalize on the full potential of these features, resorting to simplified utilization. Furthermore, they tend to overlook the importance of low-frequency information during the extraction of prior features, which can be easily compromised by noise. In our work, we propose a frequency-aware shadow removal network (FSRNet) for facial image shadow removal, which utilizes the skin color and texture information in the face to help recover illumination in shadow regions. Our FSRNet uses a frequency-domain image decomposition network to extract the low-frequency skin color map and high-frequency texture map from the face images, and applies a color-texture guided shadow removal network to produce the final shadow removal result. Concretely, the designed Fourier sparse attention block (FSABlock) can transform images from the spatial domain to the frequency domain and help the network focus on the key information. We also introduce a skin color fusion module (CFModule) and a texture fusion module (TFModule) to enhance the understanding and utilization of color and texture features, promoting high-quality results without color distortion or detail blurring. Extensive experiments demonstrate the superiority of the proposed method. The code is available at https://github.com/laoxie521/FSRNet.
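
The following sketch gives one plausible reading of a frequency-domain attention step in the spirit of the FSABlock: features are moved to the frequency domain with an FFT, gated per frequency, and transformed back. The gating scheme is an assumption; the paper's actual block is more elaborate.

```python
# A minimal sketch of frequency-domain gating: transform features with an
# FFT, modulate them with a learned per-frequency gate, and return to the
# spatial domain. Everything here is illustrative, not FSRNet's design.
import torch
import torch.nn as nn

class FourierGate(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # 1x1 conv on stacked (real, imag) parts -> per-frequency gate.
        self.gate = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):                      # x: (B, C, H, W)
        f = torch.fft.rfft2(x, norm="ortho")   # (B, C, H, W//2+1) complex
        z = torch.cat([f.real, f.imag], dim=1)
        g = torch.sigmoid(self.gate(z))        # attention over frequencies
        f = f * g                              # emphasize key frequencies
        return torch.fft.irfft2(f, s=x.shape[-2:], norm="ortho")

x = torch.randn(2, 16, 32, 32)
print(FourierGate(16)(x).shape)  # torch.Size([2, 16, 32, 32])
```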

Citations: 0
Spatially and Temporally Optimized Audio-Driven Talking Face Generation
IF 2.7 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-10-24 | DOI: 10.1111/cgf.15228
Biao Dong, Bo-Yao Ma, Lei Zhang

Audio-driven talking face generation is essentially a cross-modal mapping from audio to video frames. The main challenge lies in the intricate one-to-many mapping, which affects lip sync accuracy. Moreover, the loss of facial details during image reconstruction often results in visual artifacts in the generated video. To overcome these challenges, this paper proposes to enhance the quality of generated talking faces with a new spatio-temporal consistency. Specifically, the temporal consistency is achieved through consecutive frames of each phoneme, which form temporal modules that exhibit similar lip appearance changes. This allows for adaptive adjustment of the lip movement for accurate sync. The spatial consistency pertains to the uniform distribution of textures within local regions, which form spatial modules and regulate the texture distribution in the generator. This yields fine details in the reconstructed facial images. Extensive experiments show that our method can generate more natural talking faces than previous state-of-the-art methods in both accurate lip sync and realistic facial details.
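
As one concrete reading of the per-phoneme temporal modules, the sketch below penalizes lip-feature variation within each phoneme's consecutive frames. The feature representation and the variance-based loss are illustrative assumptions, not the paper's formulation.

```python
# A minimal sketch of a per-phoneme temporal-consistency loss, assuming we
# already have lip-region features for the consecutive frames covered by one
# phoneme; the paper's actual spatio-temporal modules are more elaborate.
import torch

def phoneme_temporal_loss(lip_feats, phoneme_ids):
    """Penalize lip-feature variation inside each phoneme segment.

    lip_feats:   (T, D) features of the lip region, one row per frame.
    phoneme_ids: (T,) int id of the phoneme active at each frame.
    """
    loss = lip_feats.new_zeros(())
    for p in phoneme_ids.unique():
        seg = lip_feats[phoneme_ids == p]          # frames of one phoneme
        if seg.shape[0] < 2:
            continue
        # Frame-to-frame changes within the phoneme should be similar.
        deltas = seg[1:] - seg[:-1]
        loss = loss + deltas.var(dim=0, unbiased=False).mean()
    return loss

feats = torch.randn(20, 8)
ids = torch.tensor([0] * 5 + [1] * 7 + [2] * 8)
print(phoneme_temporal_loss(feats, ids))
```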

Citations: 0
FastFlow: GPU Acceleration of Flow and Depression Routing for Landscape Simulation
IF 2.7 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-10-24 | DOI: 10.1111/cgf.15243
Aryamaan Jain, Bernhard Kerbl, James Gain, Brandon Finley, Guillaume Cordonnier

Terrain analysis plays an important role in computer graphics, hydrology and geomorphology. In particular, analyzing the path of material flow over a terrain with consideration of local depressions is a precursor to many further tasks in erosion, river formation, and plant ecosystem simulation. For example, fluvial erosion simulation used in terrain modeling computes water discharge to repeatedly locate erosion channels for soil removal and transport. Despite its significance, traditional methods face performance constraints, limiting their broader applicability.

In this paper, we propose a novel GPU flow routing algorithm that computes the water discharge in 𝒪(log n) iterations for a terrain with n vertices (assuming n processors). We also provide a depression routing algorithm to route the water out of local minima formed by depressions in the terrain, which converges in 𝒪(log² n) iterations. Our implementation of these algorithms leads to a 5× speedup for flow routing and a 34× to 52× speedup for depression routing compared to previous work on a 1024² terrain, enabling interactive control of terrain simulation.
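
The 𝒪(log n) bound rests on pointer jumping, where every cell replaces its downstream pointer with its pointer's pointer each round, so after k rounds it knows the cell 2^k steps downstream. The sketch below uses that primitive to find every cell's final outlet; the paper's discharge accumulation and depression routing build further machinery on top of it, so this is only an illustration of the core idea.

```python
# A minimal sketch of pointer jumping, the parallel primitive behind
# O(log n)-iteration flow routing. All cells learn their final outlet in
# about log2(n) rounds; a GPU would run each round over all cells at once.
import numpy as np

def find_outlets(receiver):
    """receiver[i] = index of the cell that cell i drains into
    (outlets/local minima point to themselves). Returns, for every cell,
    the index of its final outlet."""
    ptr = receiver.copy()
    rounds = int(np.ceil(np.log2(max(len(ptr), 2))))
    for _ in range(rounds):
        ptr = ptr[ptr]          # jump: follow two hops at once
    return ptr

# Tiny 1D example: cells 0..5, water flows leftwards, cell 0 is the outlet,
# cell 5 is a separate local minimum.
rcv = np.array([0, 0, 1, 2, 3, 5])
print(find_outlets(rcv))  # [0 0 0 0 0 5]
```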

Citations: 0
Point-AGM: Attention Guided Masked Auto-Encoder for Joint Self-supervised Learning on Point Clouds
IF 2.7 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-10-24 | DOI: 10.1111/cgf.15219
Jie Liu, Mengna Yang, Yu Tian, Yancui Li, Da Song, Kang Li, Xin Cao

Masked point modeling (MPM) has gained considerable attention in self-supervised learning for 3D point clouds. While existing self-supervised methods have progressed in learning from point clouds, we aim to address their limited ability to capture high-level semantics through our novel attention-guided masking framework, Point-AGM. Our approach introduces an attention-guided masking mechanism that selectively masks low-attended regions, enabling the model to concentrate on reconstructing more critical areas and addressing the limitations of random and block masking strategies. Furthermore, we exploit the inherent advantages of the teacher-student network to enable cross-view contrastive learning on augmented dual-view point clouds, enforcing consistency between complete and partially masked views of the same 3D shape in the feature space. This unified framework leverages the complementary strengths of masked point modeling, attention-guided masking, and contrastive learning for robust representation learning. Extensive experiments have shown the effectiveness of our approach and its strong transferability across various downstream tasks. Specifically, our model achieves an accuracy of 94.12% on ModelNet40 and 87.16% on the PB-T50-RS setting of ScanObjectNN, outperforming other self-supervised learning methods.
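
A minimal reading of the attention-guided masking rule: given per-patch attention scores, mask the least-attended fraction so those patches become reconstruction targets. The source of the scores and the mask ratio below are assumptions for illustration.

```python
# A minimal sketch of attention-guided masking: mask the least-attended
# point patches (e.g. scored by a teacher encoder). The exact scoring and
# ratio in Point-AGM may differ; this only illustrates the selection rule.
import torch

def attention_guided_mask(attn_scores, mask_ratio=0.6):
    """attn_scores: (B, N) attention per point patch.
    Returns a (B, N) bool mask, True = masked (reconstruction target)."""
    B, N = attn_scores.shape
    n_mask = int(N * mask_ratio)
    # Indices of the n_mask lowest-attended patches per batch element.
    idx = attn_scores.argsort(dim=1)[:, :n_mask]
    mask = torch.zeros(B, N, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return mask

scores = torch.rand(2, 10)
m = attention_guided_mask(scores)
print(m.sum(dim=1))  # tensor([6, 6])
```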

Citations: 0
SOD-diffusion: Salient Object Detection via Diffusion-Based Image Generators
IF 2.7 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-10-24 | DOI: 10.1111/cgf.15251
Shuo Zhang, Jiaming Huang, Shizhe Chen, Yan Wu, Tao Hu, Jing Liu

Salient Object Detection (SOD) is a challenging task that aims to precisely identify and segment the salient objects. However, existing SOD methods still face challenges in making explicit predictions near the edges and often lack end-to-end training capabilities. To alleviate these problems, we propose SOD-diffusion, a novel framework that formulates salient object detection as a denoising diffusion process from noisy masks to object masks. Specifically, object masks diffuse from ground-truth masks to a random distribution in latent space, and the model learns to reverse this noising process to reconstruct object masks. To enhance the denoising learning process, we design an attention feature interaction module (AFIM) and a specific fine-tuning protocol to integrate conditional semantic features from the input image with diffusion noise embedding. Extensive experiments on five widely used SOD benchmark datasets demonstrate that our proposed SOD-diffusion achieves favorable performance compared to previous well-established methods. Furthermore, leveraging the outstanding generalization capability of SOD-diffusion, we applied it to publicly available images, generating high-quality masks that serve as an additional SOD benchmark test set.
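
For readers unfamiliar with diffusion on masks, the sketch below shows the standard DDPM forward (noising) step that such a formulation learns to reverse. The linear noise schedule is the common default, and the paper's image conditioning (AFIM) is omitted; this is background, not the paper's exact pipeline.

```python
# A minimal sketch of the forward (noising) step of a diffusion process on
# saliency masks: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # standard linear schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retention

def noisy_mask(x0, t):
    """x0: (B, 1, H, W) mask scaled to [-1, 1]; t: (B,) timestep indices."""
    a = alpha_bar[t].view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * eps, eps

x0 = torch.rand(2, 1, 8, 8) * 2 - 1
xt, eps = noisy_mask(x0, torch.tensor([10, 900]))
print(xt.shape)  # torch.Size([2, 1, 8, 8])
```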

Citations: 0
A TransISP Based Image Enhancement Method for Visual Disbalance in Low-light Images
IF 2.7 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-10-24 | DOI: 10.1111/cgf.15209
Jiaqi Wu, Jing Guo, Rui Jing, Shihao Zhang, Zijian Tian, Wei Chen, Zehua Wang

Existing image enhancement algorithms often fail to effectively address issues of visual disbalance, such as brightness unevenness and color distortion, in low-light images. To overcome these challenges, we propose a TransISP-based image enhancement method specifically designed for low-light images. To mitigate color distortion, we design dual encoders based on decoupled representation learning, which enable complete decoupling of the reflection and illumination components, thereby preventing mutual interference during the image enhancement process. To address brightness unevenness, we introduce CNNformer, a hybrid model combining CNN and Transformer. This model efficiently captures local details and long-distance dependencies between pixels, contributing to the enhancement of brightness features across various local regions. Additionally, we integrate traditional image signal processing algorithms to achieve efficient color correction and denoising of the reflection component. Furthermore, we employ a generative adversarial network (GAN) as the overarching framework to facilitate unsupervised learning. The experimental results show that, compared with six SOTA image enhancement algorithms, our method achieves significant improvements in evaluation metrics (e.g., on LOL, PSNR: 15.59%, SSIM: 9.77%, VIF: 9.65%), and it can mitigate visual disbalance defects in low-light images captured in real-world underground coal mine scenarios.
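
One way to picture the decoupled dual encoders is a Retinex-style split, where reflectance times illumination reconstructs the input so each component can be processed without disturbing the other. The sketch below is that reading, with arbitrary layer sizes; it is an assumption, not the paper's architecture.

```python
# A minimal sketch of decoupled dual encoders in the Retinex spirit the
# abstract describes: one branch predicts reflectance, one illumination,
# and their product should reproduce the input. Layer sizes are arbitrary.
import torch
import torch.nn as nn

def small_encoder(out_ch):
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, out_ch, 3, padding=1), nn.Sigmoid(),
    )

class DecoupledEncoders(nn.Module):
    def __init__(self):
        super().__init__()
        self.reflect = small_encoder(3)  # color/texture component
        self.illum = small_encoder(1)    # brightness component

    def forward(self, img):
        R = self.reflect(img)
        L = self.illum(img)
        recon = R * L                    # Retinex assumption: I = R * L
        return R, L, recon

img = torch.rand(1, 3, 32, 32)
R, L, recon = DecoupledEncoders()(img)
# Training would add a reconstruction loss plus decoupling constraints:
loss = nn.functional.mse_loss(recon, img)
print(R.shape, L.shape, loss.item() >= 0)
```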

Citations: 0
Surface Cutting and Flattening to Target Shapes
IF 2.7 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-10-24 | DOI: 10.1111/cgf.15223
Yuanhao Li, Wenzheng Wu, Ligang Liu

We introduce a novel framework for surface cutting and flattening, aiming to align the boundary of the planar parameterization with a target shape. Diverging from traditional methods focused on minimizing distortion, we also aim to achieve shape similarity between the parameterized mesh and a specific planar target, which is important in some applications of art design and texture mapping. However, since existing methods are commonly limited to ellipsoidal surfaces, solving this problem on general surfaces remains a challenge. Our framework models the general case as a joint optimization of cuts and parameterization, guided by a novel metric assessing shape similarity. To circumvent the common issue of local minima, we introduce an extra global seam updating strategy guided by the target shape. Experimental results show that our framework not only matches previous approaches on ellipsoidal surfaces but also achieves satisfactory results on more complex ones.
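
A plausible stand-in for a boundary shape-similarity term is a symmetric chamfer distance between points sampled on the parameterization boundary and on the target planar contour, as sketched below. The paper's metric is its own novel construction; this only illustrates the idea of scoring boundary-vs-target agreement.

```python
# A minimal sketch of a boundary shape-similarity score via symmetric
# chamfer distance between two 2D point samplings; illustrative only.
import numpy as np

def chamfer_2d(a, b):
    """a: (n, 2), b: (m, 2) point samples. Lower = more similar shapes."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # target contour
square = np.stack([np.clip(np.cos(theta) * 1.5, -1, 1),     # current boundary
                   np.clip(np.sin(theta) * 1.5, -1, 1)], axis=1)
print(chamfer_2d(circle, circle))  # ~0: identical shapes
print(chamfer_2d(square, circle))  # > 0: boundary deviates from target
```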

Citations: 0
Adversarial Unsupervised Domain Adaptation for 3D Semantic Segmentation with 2D Image Fusion of Dense Depth
IF 2.7 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-10-24 | DOI: 10.1111/cgf.15250
Xindan Zhang, Ying Li, Huankun Sheng, Xinnian Zhang

Unsupervised domain adaptation (UDA) is increasingly used for 3D point cloud semantic segmentation tasks due to its ability to address the issue of missing labels for new domains. However, most existing unsupervised domain adaptation methods focus only on uni-modal data and are rarely applied to multi-modal data. Therefore, we propose a cross-modal UDA framework for 3D semantic segmentation on multi-modal datasets that contain 3D point clouds and 2D images. Specifically, we first propose a Dual discriminator-based Domain Adaptation (Dd-bDA) module to enhance the adaptability of different domains. Second, given that the robustness of depth information to domain shifts can provide more details for semantic segmentation, we further employ a Dense depth Feature Fusion (DdFF) module to extract image features with rich depth cues. We evaluate our model in four unsupervised domain adaptation scenarios, i.e., dataset-to-dataset (A2D2 → SemanticKITTI), day-to-night, country-to-country (USA → Singapore), and synthetic-to-real (VirtualKITTI → SemanticKITTI). In all settings, the experimental results achieve significant improvements and surpass state-of-the-art models.
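
To make the dual-discriminator idea concrete, the sketch below trains one discriminator on 2D-image features and one on 3D-point features, while the feature extractor is trained to fool both on target-domain data. The feature dimensions, network sizes, and loss wiring are illustrative assumptions, not a reproduction of the Dd-bDA module.

```python
# A minimal sketch of dual-discriminator adversarial domain alignment.
import torch
import torch.nn as nn

def make_disc(dim):
    return nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

disc_2d, disc_3d = make_disc(128), make_disc(128)
bce = nn.BCEWithLogitsLoss()

def adversarial_losses(f2d_src, f3d_src, f2d_tgt, f3d_tgt):
    # Discriminator objective: label source features 1, target features 0.
    d_loss = sum(
        bce(d(f.detach()), torch.full((f.shape[0], 1), float(lbl)))
        for d, f, lbl in [(disc_2d, f2d_src, 1), (disc_2d, f2d_tgt, 0),
                          (disc_3d, f3d_src, 1), (disc_3d, f3d_tgt, 0)]
    )
    # Feature-extractor objective: make target features look like source.
    g_loss = bce(disc_2d(f2d_tgt), torch.ones(f2d_tgt.shape[0], 1)) + \
             bce(disc_3d(f3d_tgt), torch.ones(f3d_tgt.shape[0], 1))
    return d_loss, g_loss

feats = [torch.randn(4, 128) for _ in range(4)]
d_loss, g_loss = adversarial_losses(*feats)
print(d_loss.item(), g_loss.item())
```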

Citations: 0