Indoor scene point clouds exhibit diverse distributions and varying levels of sparsity, with more intricate geometry and occlusion than outdoor scenes or individual objects. Although recent advances in 3D point cloud analysis have introduced various network architectures, frameworks tailored to the unique attributes of indoor scenarios are still lacking. To address this, we propose DSGI-Net, a novel indoor scene point cloud learning network that can be integrated into existing models. The key innovation of this work is to selectively group more informative neighbor points in sparse regions and to promote semantic consistency in local areas where different instances are in proximity but belong to distinct categories. Furthermore, our method encodes both semantic and spatial relationships between points in local regions to reduce the loss of local geometric detail. Extensive experiments on the ScanNetv2, SUN RGB-D, and S3DIS indoor scene benchmarks demonstrate that our method is straightforward yet effective.
{"title":"DSGI-Net: Density-based Selective Grouping Point Cloud Learning Network for Indoor Scene","authors":"Xin Wen, Yao Duan, Kai Xu, Chenyang Zhu","doi":"10.1111/cgf.15218","DOIUrl":"https://doi.org/10.1111/cgf.15218","url":null,"abstract":"<p>Indoor scene point clouds exhibit diverse distributions and varying levels of sparsity, characterized by more intricate geometry and occlusion compared to outdoor scenes or individual objects. Despite recent advancements in 3D point cloud analysis introducing various network architectures, there remains a lack of frameworks tailored to the unique attributes of indoor scenarios. To address this, we propose DSGI-Net, a novel indoor scene point cloud learning network that can be integrated into existing models. The key innovation of this work is selectively grouping more informative neighbor points in sparse regions and promoting semantic consistency of the local area where different instances are in proximity but belong to distinct categories. Furthermore, our method encodes both semantic and spatial relationships between points in local regions to reduce the loss of local geometric details. Extensive experiments on the ScanNetv2, SUN RGB-D, and S3DIS indoor scene benchmarks demonstrate that our method is straightforward yet effective.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cities are constantly changing to adapt to new societal and environmental challenges. Understanding their evolution is thus essential to make informed decisions about their future. To capture these changes, cities are increasingly offering digital 3D snapshots of their territory over time. However, existing tools for visualising these data typically represent the city at a single point in time, limiting a comprehensive analysis of its evolution. In this paper, we propose a new method for simultaneously visualising different versions of the city in a 3D space. We integrate the different versions of the city along a 3D timeline that can take different shapes depending on the needs of the user and the dataset being visualised. We propose four different timeline shapes and three ways to place the versions along them. Our method places the versions so that there is no visual overlap by varying the parameters of the timelines, and offers options to ease the understanding of the scene by changing the orientation or scale of the versions. We evaluate our method on different datasets to demonstrate the advantages and limitations of the different timeline shapes and provide recommendations as to which shape to choose.
{"title":"Evolutive 3D Urban Data Representation through Timeline Design Space","authors":"C. Le Bihan Gautier, J. Delanoy, G. Gesquière","doi":"10.1111/cgf.15237","DOIUrl":"https://doi.org/10.1111/cgf.15237","url":null,"abstract":"<p>Cities are constantly changing to adapt to new societal and environmental challenges. Understanding their evolution is thus essential to make informed decisions about their future. To capture these changes, cities are increasingly offering digital 3D snapshots of their territory over time. However, existing tools to visualise these data typically represent the city at a specific point in time, limiting a comprehensive analysis of its evolution. In this paper, we propose a new method for simultaneously visualising different versions of the city in a 3D space. We integrate the different versions of the city along a new way of 3D timeline that can take different shapes depending on the needs of the user and the dataset being visualised. We propose four different shapes of timelines and three ways to place the versions along it. Our method places the versions such that there is no visual overlap for the user by varying the parameters of the timelines, and offer options to ease the understanding of the scene by changing the orientation or scale of the versions. We evaluate our method on different datasets to demonstrate the advantages and limitations of the different shapes of timeline and provide recommendations so as to which shape to chose.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Existing facial image shadow removal methods predominantly rely on pre-extracted facial features. However, these methods often fail to exploit the full potential of these features, resorting to simplified utilization. Furthermore, they tend to overlook the importance of low-frequency information during the extraction of prior features, which can be easily compromised by noise. In our work, we propose a frequency-aware shadow removal network (FSRNet) for facial image shadow removal, which utilizes the skin color and texture information in the face to help recover illumination in shadow regions. Our FSRNet uses a frequency-domain image decomposition network to extract a low-frequency skin color map and a high-frequency texture map from the face image, and applies a color-texture guided shadow removal network to produce the final shadow removal result. Concretely, the designed Fourier sparse attention block (FSABlock) transforms images from the spatial domain to the frequency domain and helps the network focus on key information. We also introduce a skin color fusion module (CFModule) and a texture fusion module (TFModule) to enhance the understanding and utilization of color and texture features, promoting high-quality results without color distortion or detail blurring. Extensive experiments demonstrate the superiority of the proposed method. The code is available at https://github.com/laoxie521/FSRNet.
{"title":"Frequency-Aware Facial Image Shadow Removal through Skin Color and Texture Learning","authors":"Ling Zhang, Wenyang Xie, Chunxia Xiao","doi":"10.1111/cgf.15220","DOIUrl":"https://doi.org/10.1111/cgf.15220","url":null,"abstract":"<p>Existing facial image shadow removal methods predominantly rely on pre-extracted facial features. However, these methods often fail to capitalize on the full potential of these features, resorting to simplified utilization. Furthermore, they tend to overlook the importance of low-frequency information during the extraction of prior features, which can be easily compromised by noises. In our work, we propose a frequency-aware shadow removal network (FSRNet) for facial image shadow removal, which utilizes the skin color and texture information in the face to help recover illumination in shadow regions. Our FSRNet uses a frequency-domain image decomposition network to extract the low-frequency skin color map and high-frequency texture map from the face images, and applies a color-texture guided shadow removal network to produce final shadow removal result. Concretely, the designed fourier sparse attention block (FSABlock) can transform images from the spatial domain to the frequency domain and help the network focus on the key information. We also introduce a skin color fusion module (CFModule) and a texture fusion module (TFModule) to enhance the understanding and utilization of color and texture features, promoting high-quality result without color distortion and detail blurring. Extensive experiments demonstrate the superiority of the proposed method. The code is available at https://github.com/laoxie521/FSRNet.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Audio-driven talking face generation is essentially a cross-modal mapping from audio to video frames. The main challenge lies in the intricate one-to-many mapping, which affects lip-sync accuracy, while the loss of facial details during image reconstruction often results in visual artifacts in the generated video. To overcome these challenges, this paper proposes to enhance the quality of generated talking faces with a new spatio-temporal consistency. Specifically, temporal consistency is achieved through the consecutive frames of each phoneme, which form temporal modules that exhibit similar lip appearance changes. This allows the lip movement to be adjusted adaptively for accurate sync. Spatial consistency pertains to the uniform distribution of textures within local regions, which form spatial modules and regulate the texture distribution in the generator. This yields fine details in the reconstructed facial images. Extensive experiments show that our method generates more natural talking faces than previous state-of-the-art methods in terms of both accurate lip sync and realistic facial details.
{"title":"Spatially and Temporally Optimized Audio-Driven Talking Face Generation","authors":"Biao Dong, Bo-Yao Ma, Lei Zhang","doi":"10.1111/cgf.15228","DOIUrl":"https://doi.org/10.1111/cgf.15228","url":null,"abstract":"<p>Audio-driven talking face generation is essentially a cross-modal mapping from audio to video frames. The main challenge lies in the intricate one-to-many mapping, which affects lip sync accuracy. And the loss of facial details during image reconstruction often results in visual artifacts in the generated video. To overcome these challenges, this paper proposes to enhance the quality of generated talking faces with a new spatio-temporal consistency. Specifically, the temporal consistency is achieved through consecutive frames of the each phoneme, which form temporal modules that exhibit similar lip appearance changes. This allows for adaptive adjustment in the lip movement for accurate sync. The spatial consistency pertains to the uniform distribution of textures within local regions, which form spatial modules and regulate the texture distribution in the generator. This yields fine details in the reconstructed facial images. Extensive experiments show that our method can generate more natural talking faces than previous state-of-the-art methods in both accurate lip sync and realistic facial details.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aryamaan Jain, Bernhard Kerbl, James Gain, Brandon Finley, Guillaume Cordonnier
Terrain analysis plays an important role in computer graphics, hydrology and geomorphology. In particular, analyzing the path of material flow over a terrain with consideration of local depressions is a precursor to many further tasks in erosion, river formation, and plant ecosystem simulation. For example, fluvial erosion simulation used in terrain modeling computes water discharge to repeatedly locate erosion channels for soil removal and transport. Despite its significance, traditional methods face performance constraints, limiting their broader applicability.
In this paper, we propose a novel GPU flow routing algorithm that computes the water discharge in 𝒪(log n) iterations for a terrain with n vertices (assuming n processors). We also provide a depression routing algorithm to route the water out of local minima formed by depressions in the terrain, which converges in 𝒪(log² n) iterations. Our implementation of these algorithms leads to a 5× speedup for flow routing and a 34× to 52× speedup for depression routing compared to previous work on a 1024² terrain, enabling interactive control of terrain simulation.
{"title":"FastFlow: GPU Acceleration of Flow and Depression Routing for Landscape Simulation","authors":"Aryamaan Jain, Bernhard Kerbl, James Gain, Brandon Finley, Guillaume Cordonnier","doi":"10.1111/cgf.15243","DOIUrl":"https://doi.org/10.1111/cgf.15243","url":null,"abstract":"<p>Terrain analysis plays an important role in computer graphics, hydrology and geomorphology. In particular, analyzing the path of material flow over a terrain with consideration of local depressions is a precursor to many further tasks in erosion, river formation, and plant ecosystem simulation. For example, fluvial erosion simulation used in terrain modeling computes water discharge to repeatedly locate erosion channels for soil removal and transport. Despite its significance, traditional methods face performance constraints, limiting their broader applicability.</p><p>In this paper, we propose a novel GPU flow routing algorithm that computes the water discharge in 𝒪(<i>log</i> n) iterations for a terrain with n vertices (assuming n processors). We also provide a depression routing algorithm to route the water out of local minima formed by depressions in the terrain, which converges in 𝒪(<i>log</i><sup>2</sup> n) iterations. Our implementation of these algorithms leads to a 5× speedup for flow routing and 34 × to 52 × speedup for depression routing compared to previous work on a 1024<sup>2</sup> terrain, enabling interactive control of terrain simulation.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jie Liu, Mengna Yang, Yu Tian, Yancui Li, Da Song, Kang Li, Xin Cao
Masked point modeling (MPM) has gained considerable attention in self-supervised learning for 3D point clouds. While existing self-supervised methods have progressed in learning from point clouds, we aim to address their limited ability to capture high-level semantics through our novel attention-guided masking framework, Point-AGM. Our approach introduces an attention-guided masking mechanism that selectively masks low-attended regions, enabling the model to concentrate on reconstructing more critical areas and addressing the limitations of random and block masking strategies. Furthermore, we exploit the inherent advantages of the teacher-student network to enable cross-view contrastive learning on augmented dual-view point clouds, enforcing consistency between complete and partially masked views of the same 3D shape in the feature space. This unified framework leverages the complementary strengths of masked point modeling, attention-guided masking, and contrastive learning for robust representation learning. Extensive experiments show the effectiveness of our approach and its strong transferability across various downstream tasks. Specifically, our model achieves an accuracy of 94.12% on ModelNet40 and 87.16% on the PB-T50-RS setting of ScanObjectNN, outperforming other self-supervised learning methods.
{"title":"Point-AGM : Attention Guided Masked Auto-Encoder for Joint Self-supervised Learning on Point Clouds","authors":"Jie Liu, Mengna Yang, Yu Tian, Yancui Li, Da Song, Kang Li, Xin Cao","doi":"10.1111/cgf.15219","DOIUrl":"https://doi.org/10.1111/cgf.15219","url":null,"abstract":"<p>Masked point modeling (MPM) has gained considerable attention in self-supervised learning for 3D point clouds. While existing self-supervised methods have progressed in learning from point clouds, we aim to address their limitation of capturing high-level semantics through our novel attention-guided masking framework, Point-AGM. Our approach introduces an attention-guided masking mechanism that selectively masks low-attended regions, enabling the model to concentrate on reconstructing more critical areas and addressing the limitations of random and block masking strategies. Furthermore, we exploit the inherent advantages of the teacher-student network to enable cross-view contrastive learning on augmented dual-view point clouds, enforcing consistency between complete and partially masked views of the same 3D shape in the feature space. This unified framework leverages the complementary strengths of masked point modeling, attention-guided masking, and contrastive learning for robust representation learning. Extensive experiments have shown the effectiveness of our approach and its well-transferable performance across various downstream tasks. Specifically, our model achieves an accuracy of 94.12% on ModelNet40 and 87.16% on the PB-T50-RS setting of ScanObjectNN, outperforming other self-supervised learning methods.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shuo Zhang, Jiaming Huang, Shizhe Chen, Yan Wu, Tao Hu, Jing Liu
Salient Object Detection (SOD) is a challenging task that aims to precisely identify and segment salient objects. However, existing SOD methods still face challenges in making explicit predictions near the edges and often lack end-to-end training capabilities. To alleviate these problems, we propose SOD-diffusion, a novel framework that formulates salient object detection as a denoising diffusion process from noisy masks to object masks. Specifically, object masks diffuse from ground-truth masks to a random distribution in latent space, and the model learns to reverse this noising process to reconstruct object masks. To enhance the denoising learning process, we design an attention feature interaction module (AFIM) and a specific fine-tuning protocol to integrate conditional semantic features from the input image with the diffusion noise embedding. Extensive experiments on five widely used SOD benchmark datasets demonstrate that our proposed SOD-diffusion achieves favorable performance compared to previous well-established methods. Furthermore, leveraging the outstanding generalization capability of SOD-diffusion, we applied it to publicly available images, generating high-quality masks that serve as an additional SOD benchmark test set.
{"title":"SOD-diffusion: Salient Object Detection via Diffusion-Based Image Generators","authors":"Shuo Zhang, Jiaming Huang, Shizhe Chen, Yan Wu, Tao Hu, Jing Liu","doi":"10.1111/cgf.15251","DOIUrl":"https://doi.org/10.1111/cgf.15251","url":null,"abstract":"<p>Salient Object Detection (SOD) is a challenging task that aims to precisely identify and segment the salient objects. However, existing SOD methods still face challenges in making explicit predictions near the edges and often lack end-to-end training capabilities. To alleviate these problems, we propose SOD-diffusion, a novel framework that formulates salient object detection as a denoising diffusion process from noisy masks to object masks. Specifically, object masks diffuse from ground-truth masks to random distribution in latent space, and the model learns to reverse this noising process to reconstruct object masks. To enhance the denoising learning process, we design an attention feature interaction module (AFIM) and a specific fine-tuning protocol to integrate conditional semantic features from the input image with diffusion noise embedding. Extensive experiments on five widely used SOD benchmark datasets demonstrate that our proposed SOD-diffusion achieves favorable performance compared to previous well-established methods. Furthermore, leveraging the outstanding generalization capability of SOD-diffusion, we applied it to publicly available images, generating high-quality masks that serve as an additional SOD benchmark testset.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Existing image enhancement algorithms often fail to effectively address issues of visual disbalance, such as brightness unevenness and color distortion, in low-light images. To overcome these challenges, we propose a TransISP-based image enhancement method specifically designed for low-light images. To mitigate color distortion, we design dual encoders based on decoupled representation learning, which enable complete decoupling of the reflection and illumination components, thereby preventing mutual interference during the image enhancement process. To address brightness unevenness, we introduce CNNformer, a hybrid model combining a CNN and a Transformer. This model efficiently captures local details and long-distance dependencies between pixels, contributing to the enhancement of brightness features across various local regions. Additionally, we integrate traditional image signal processing algorithms to achieve efficient color correction and denoising of the reflection component. Furthermore, we employ a generative adversarial network (GAN) as the overarching framework to facilitate unsupervised learning. The experimental results show that, compared with six SOTA image enhancement algorithms, our method obtains significant improvements in evaluation metrics (e.g., on LOL, PSNR: 15.59%, SSIM: 9.77%, VIF: 9.65%) and alleviates visual disbalance defects in low-light images captured from real-world underground coal mine scenarios.
{"title":"A TransISP Based Image Enhancement Method for Visual Disbalance in Low-light Images","authors":"Jiaqi Wu, Jing Guo, Rui Jing, Shihao Zhang, Zijian Tian, Wei Chen, Zehua Wang","doi":"10.1111/cgf.15209","DOIUrl":"https://doi.org/10.1111/cgf.15209","url":null,"abstract":"<p>Existing image enhancement algorithms often fail to effectively address issues of visual disbalance, such as brightness unevenness and color distortion, in low-light images. To overcome these challenges, we propose a TransISP-based image enhancement method specifically designed for low-light images. To mitigate color distortion, we design dual encoders based on decoupled representation learning, which enable complete decoupling of the reflection and illumination components, thereby preventing mutual interference during the image enhancement process. To address brightness unevenness, we introduce CNNformer, a hybrid model combining CNN and Transformer. This model efficiently captures local details and long-distance dependencies between pixels, contributing to the enhancement of brightness features across various local regions. Additionally, we integrate traditional image signal processing algorithms to achieve efficient color correction and denoising of the reflection component. Furthermore, we employ a generative adversarial network (GAN) as the overarching framework to facilitate unsupervised learning. The experimental results show that, compared with six SOTA image enhancement algorithms, our method obtains significant improvement in evaluation indexes (e.g., on LOL, PSNR: 15.59%, SSIM: 9.77%, VIF: 9.65%), and it can improve visual disbalance defects in low-light images captured from real-world coal mine underground scenarios.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce a novel framework for surface cutting and flattening that aims to align the boundary of the planar parameterization with a target shape. Diverging from traditional methods focused on minimizing distortion, we also seek shape similarity between the parameterized mesh and a specific planar target, which is important in applications such as art design and texture mapping. However, existing methods are commonly limited to ellipsoidal surfaces, and solving this problem on general surfaces remains a challenge. Our framework models the general case as a joint optimization of cuts and parameterization, guided by a novel metric assessing shape similarity. To circumvent the common issue of local minima, we introduce an additional global seam-updating strategy guided by the target shape. Experimental results show that our framework not only matches previous approaches on ellipsoidal surfaces but also achieves satisfactory results on more complex ones.
{"title":"Surface Cutting and Flattening to Target Shapes","authors":"Yuanhao Li, Wenzheng Wu, Ligang Liu","doi":"10.1111/cgf.15223","DOIUrl":"https://doi.org/10.1111/cgf.15223","url":null,"abstract":"<p>We introduce a novel framework for surface cutting and flattening, aiming to align the boundary of planar parameterization with a target shape. Diverging from traditional methods focused on minimizing distortion, we intend to also achieve shape similarity between the parameterized mesh and a specific planar target, which is important in some applications of art design and texture mapping. However, with existing methods commonly limited to ellipsoidal surfaces, it still remains a challenge to solve this problem on general surfaces. Our framework models the general case as a joint optimization of cuts and parameterization, guided by a novel metric assessing shape similarity. To circumvent the common issue of local minima, we introduce an extra global seam updating strategy which is guided by the target shape. Experimental results show that our framework not only aligns with previous approaches on ellipsoidal surfaces but also achieves satisfactory results on more complex ones.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unsupervised domain adaptation (UDA) is increasingly used for 3D point cloud semantic segmentation tasks due to its ability to address the issue of missing labels for new domains. However, most existing unsupervised domain adaptation methods focus only on uni-modal data and are rarely applied to multi-modal data. Therefore, we propose a cross-modal UDA method for 3D semantic segmentation on multi-modal datasets that contain 3D point clouds and 2D images. Specifically, we first propose a Dual discriminator-based Domain Adaptation (Dd-bDA) module to enhance the adaptability to different domains. Second, given that the robustness of depth information to domain shifts can provide more details for semantic segmentation, we further employ a Dense depth Feature Fusion (DdFF) module to extract image features with rich depth cues. We evaluate our model in four unsupervised domain adaptation scenarios: dataset-to-dataset (A2D2 → SemanticKITTI), day-to-night, country-to-country (USA → Singapore), and synthetic-to-real (VirtualKITTI → SemanticKITTI). In all settings, the experimental results achieve significant improvements and surpass state-of-the-art models.
{"title":"Adversarial Unsupervised Domain Adaptation for 3D Semantic Segmentation with 2D Image Fusion of Dense Depth","authors":"Xindan Zhang, Ying Li, Huankun Sheng, Xinnian Zhang","doi":"10.1111/cgf.15250","DOIUrl":"https://doi.org/10.1111/cgf.15250","url":null,"abstract":"<p>Unsupervised domain adaptation (UDA) is increasingly used for 3D point cloud semantic segmentation tasks due to its ability to address the issue of missing labels for new domains. However, most existing unsupervised domain adaptation methods focus only on uni-modal data and are rarely applied to multi-modal data. Therefore, we propose a cross-modal UDA on multi-modal datasets that contain 3D point clouds and 2D images for 3D Semantic Segmentation. Specifically, we first propose a Dual discriminator-based Domain Adaptation (Dd-bDA) module to enhance the adaptability of different domains. Second, given that the robustness of depth information to domain shifts can provide more details for semantic segmentation, we further employ a Dense depth Feature Fusion (DdFF) module to extract image features with rich depth cues. We evaluate our model in four unsupervised domain adaptation scenarios, i.e., dataset-to-dataset (A2D2 → SemanticKITTI), Day-to-Night, country-to-country (USA → Singapore), and synthetic-to-real (VirtualKITTI → SemanticKITTI). In all settings, the experimental results achieve significant improvements and surpass state-of-the-art models.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 7","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}