Pub Date : 2024-10-09DOI: 10.1016/j.cag.2024.104103
Jiamin Cheng, Li Wang, Lianghao Zhang, Fangzhou Gao, Jiawan Zhang
In this paper, we address the task of estimating spatially-varying bi-directional reflectance distribution functions (SVBRDF) of a near-planar surface from a single flash-lit image. Disentangling SVBRDF from the material appearance by deep learning has proven a formidable challenge. This difficulty is particularly pronounced when dealing with images lit by a point light source because the uneven distribution of irradiance in the scene interacts with the surface, leading to significant global luminance variations across the image. These variations may be overemphasized by the network and wrongly baked into the material property space. To tackle this issue, we propose a high-frequency path that contains an auto-adaptive subband “knob”. This path aims to extract crucial image textures and details while eliminating global luminance variations present in the original image. Furthermore, recognizing that color information is ignored in this path, we design a two-path strategy to jointly estimate material reflectance from both the high-frequency path and the original image. Extensive experiments on a substantial dataset have confirmed the effectiveness of our method. Our method outperforms state-of-the-art methods across a wide range of materials.
{"title":"Single-image SVBRDF estimation with auto-adaptive high-frequency feature extraction","authors":"Jiamin Cheng, Li Wang, Lianghao Zhang, Fangzhou Gao, Jiawan Zhang","doi":"10.1016/j.cag.2024.104103","DOIUrl":"10.1016/j.cag.2024.104103","url":null,"abstract":"<div><div>In this paper, we address the task of estimating spatially-varying bi-directional reflectance distribution functions (SVBRDF) of a near-planar surface from a single flash-lit image. Disentangling SVBRDF from the material appearance by deep learning has proven a formidable challenge. This difficulty is particularly pronounced when dealing with images lit by a point light source because the uneven distribution of irradiance in the scene interacts with the surface, leading to significant global luminance variations across the image. These variations may be overemphasized by the network and wrongly baked into the material property space. To tackle this issue, we propose a high-frequency path that contains an auto-adaptive subband “knob”. This path aims to extract crucial image textures and details while eliminating global luminance variations present in the original image. Furthermore, recognizing that color information is ignored in this path, we design a two-path strategy to jointly estimate material reflectance from both the high-frequency path and the original image. Extensive experiments on a substantial dataset have confirmed the effectiveness of our method. Our method outperforms state-of-the-art methods across a wide range of materials.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"124 ","pages":"Article 104103"},"PeriodicalIF":2.5,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142533416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-10-05DOI: 10.1016/j.cag.2024.104101
Tianfang Lin , Zhongyuan Yu , Matthew McGinity , Stefan Gumhold
3D point clouds, such as those produced by 3D scanners, often require labeling – the accurate classification of each point into structural or semantic categories – before they can be used in their intended application. However, in the absence of fully automated methods, such labeling must be performed manually, which can prove extremely time and labor intensive. To address this we present a virtual reality tool for accelerating and improving the manual labeling of very large 3D point clouds. The labeling tool provides a variety of 3D interactions for efficient viewing, selection and labeling of points using the controllers of consumer VR-kits. The main contribution of our work is a mixed CPU/GPU-based data structure that supports rendering, selection and labeling with immediate visual feedback at high frame rates necessary for a convenient VR experience. Our mixed CPU/GPU data structure supports fluid interaction with very large point clouds in VR, what is not possible with existing continuous level-of-detail rendering algorithms. We evaluate our method with 25 users on tasks involving point clouds of up to 50 million points and find convincing results that support the case for VR-based point cloud labeling.
{"title":"An immersive labeling method for large point clouds","authors":"Tianfang Lin , Zhongyuan Yu , Matthew McGinity , Stefan Gumhold","doi":"10.1016/j.cag.2024.104101","DOIUrl":"10.1016/j.cag.2024.104101","url":null,"abstract":"<div><div>3D point clouds, such as those produced by 3D scanners, often require labeling – the accurate classification of each point into structural or semantic categories – before they can be used in their intended application. However, in the absence of fully automated methods, such labeling must be performed manually, which can prove extremely time and labor intensive. To address this we present a virtual reality tool for accelerating and improving the manual labeling of very large 3D point clouds. The labeling tool provides a variety of 3D interactions for efficient viewing, selection and labeling of points using the controllers of consumer VR-kits. The main contribution of our work is a mixed CPU/GPU-based data structure that supports rendering, selection and labeling with immediate visual feedback at high frame rates necessary for a convenient VR experience. Our mixed CPU/GPU data structure supports fluid interaction with very large point clouds in VR, what is not possible with existing continuous level-of-detail rendering algorithms. We evaluate our method with 25 users on tasks involving point clouds of up to 50 million points and find convincing results that support the case for VR-based point cloud labeling.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"124 ","pages":"Article 104101"},"PeriodicalIF":2.5,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142417484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-10-05DOI: 10.1016/j.cag.2024.104102
Yu Miao, Yue Liu
Vision-based hand reconstructions have become noteworthy tools in enhancing interactive experiences in various applications such as virtual reality, augmented reality, and autonomous driving, which enable sophisticated interactions by reconstructing complex motions of human hands. Despite significant progress driven by deep-learning methodologies, the quest for high-fidelity interacting hands reconstruction faces challenges such as limited dataset diversity, lack of detailed hand representation, occlusions, and differentiation between similar hand structures. This survey thoroughly reviews deep learning-based methods, diverse datasets, loss functions, and evaluation metrics addressing the complexities of interacting hands reconstruction. Mainstream algorithms of the past five years are systematically classified into two main categories: algorithms that employ explicit representations, such as parametric meshes and 3D Gaussian splatting, and those that utilize implicit representations, including signed distance fields and neural radiance fields. Novel deep-learning models like graph convolutional networks and transformers are applied to solve the aforementioned challenges in hand reconstruction effectively. Beyond summarizing these interaction-aware algorithms, this survey also briefly discusses hand tracking in virtual reality and augmented reality. To the best of our knowledge, this is the first survey specifically focusing on the reconstruction of both hands and their interactions with objects. The survey contains the various facets of hand modeling, deep learning approaches, and datasets, broadening the horizon of hand reconstruction research and future innovation in natural user interactions.
{"title":"Advances in vision-based deep learning methods for interacting hands reconstruction: A survey","authors":"Yu Miao, Yue Liu","doi":"10.1016/j.cag.2024.104102","DOIUrl":"10.1016/j.cag.2024.104102","url":null,"abstract":"<div><div>Vision-based hand reconstructions have become noteworthy tools in enhancing interactive experiences in various applications such as virtual reality, augmented reality, and autonomous driving, which enable sophisticated interactions by reconstructing complex motions of human hands. Despite significant progress driven by deep-learning methodologies, the quest for high-fidelity interacting hands reconstruction faces challenges such as limited dataset diversity, lack of detailed hand representation, occlusions, and differentiation between similar hand structures. This survey thoroughly reviews deep learning-based methods, diverse datasets, loss functions, and evaluation metrics addressing the complexities of interacting hands reconstruction. Mainstream algorithms of the past five years are systematically classified into two main categories: algorithms that employ explicit representations, such as parametric meshes and 3D Gaussian splatting, and those that utilize implicit representations, including signed distance fields and neural radiance fields. Novel deep-learning models like graph convolutional networks and transformers are applied to solve the aforementioned challenges in hand reconstruction effectively. Beyond summarizing these interaction-aware algorithms, this survey also briefly discusses hand tracking in virtual reality and augmented reality. To the best of our knowledge, this is the first survey specifically focusing on the reconstruction of both hands and their interactions with objects. The survey contains the various facets of hand modeling, deep learning approaches, and datasets, broadening the horizon of hand reconstruction research and future innovation in natural user interactions.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"124 ","pages":"Article 104102"},"PeriodicalIF":2.5,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142417394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-10-04DOI: 10.1016/j.cag.2024.104099
A. Phillips , J. Lang , D. Mould
Capturing non-local, long range features present in non-homogeneous textures is difficult to achieve with existing techniques. We introduce a new training method and architecture for single-exemplar texture synthesis that combines a Generative Adversarial Network (GAN) and a Variational Autoencoder (VAE). In the proposed architecture, the combined networks share information during training via structurally identical, independent blocks, facilitating highly diverse texture variations from a single image exemplar. Supporting this training method, we also include a similarity loss term that further encourages diverse output while also improving the overall quality. Using our approach, it is possible to produce diverse results over the entire sample size taken from a single model that can be trained in approximately 15 min. We show that our approach obtains superior performance when compared to SOTA texture synthesis methods and single image GAN methods using standard diversity and quality metrics.
现有技术难以捕捉非同质纹理中的非局部、长距离特征。我们为单例纹理合成引入了一种新的训练方法和架构,它结合了生成对抗网络(GAN)和变异自动编码器(VAE)。在所提出的架构中,组合网络在训练过程中通过结构相同的独立块共享信息,从而促进单个图像示例的纹理变化高度多样化。为了支持这种训练方法,我们还加入了一个相似性损失项,在提高整体质量的同时,进一步鼓励多样化的输出。使用我们的方法,可以在大约 15 分钟的时间内,通过一个单一模型的训练,在整个样本大小上产生多样化的结果。我们的研究表明,与 SOTA 纹理合成方法和使用标准多样性和质量指标的单图像 GAN 方法相比,我们的方法具有更优越的性能。
{"title":"Diverse non-homogeneous texture synthesis from a single exemplar","authors":"A. Phillips , J. Lang , D. Mould","doi":"10.1016/j.cag.2024.104099","DOIUrl":"10.1016/j.cag.2024.104099","url":null,"abstract":"<div><div>Capturing non-local, long range features present in non-homogeneous textures is difficult to achieve with existing techniques. We introduce a new training method and architecture for single-exemplar texture synthesis that combines a Generative Adversarial Network (GAN) and a Variational Autoencoder (VAE). In the proposed architecture, the combined networks share information during training via structurally identical, independent blocks, facilitating highly diverse texture variations from a single image exemplar. Supporting this training method, we also include a similarity loss term that further encourages diverse output while also improving the overall quality. Using our approach, it is possible to produce diverse results over the entire sample size taken from a single model that can be trained in approximately 15 min. We show that our approach obtains superior performance when compared to SOTA texture synthesis methods and single image GAN methods using standard diversity and quality metrics.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"124 ","pages":"Article 104099"},"PeriodicalIF":2.5,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142417393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Implicit neural representations (INRs) have emerged as a promising framework for representing signals in low-dimensional spaces. This survey reviews the existing literature on the specialized INR problem of approximating signed distance functions (SDFs) for surface scenes, using either oriented point clouds or a set of posed images. We refer to neural SDFs that incorporate differential geometry tools, such as normals and curvatures, in their loss functions as geometric INRs. The key idea behind this 3D reconstruction approach is to include additional regularization terms in the loss function, ensuring that the INR satisfies certain global properties that the function should hold — such as having unit gradient in the case of SDFs. We explore key methodological components, including the definition of INR, the construction of geometric loss functions, and sampling schemes from a differential geometry perspective. Our review highlights the significant advancements enabled by geometric INRs in surface reconstruction from oriented point clouds and posed images.
{"title":"Geometric implicit neural representations for signed distance functions","authors":"Luiz Schirmer , Tiago Novello , Vinícius da Silva , Guilherme Schardong , Daniel Perazzo , Hélio Lopes , Nuno Gonçalves , Luiz Velho","doi":"10.1016/j.cag.2024.104085","DOIUrl":"10.1016/j.cag.2024.104085","url":null,"abstract":"<div><div><em>Implicit neural representations</em> (INRs) have emerged as a promising framework for representing signals in low-dimensional spaces. This survey reviews the existing literature on the specialized INR problem of approximating <em>signed distance functions</em> (SDFs) for surface scenes, using either oriented point clouds or a set of posed images. We refer to neural SDFs that incorporate differential geometry tools, such as normals and curvatures, in their loss functions as <em>geometric</em> INRs. The key idea behind this 3D reconstruction approach is to include additional <em>regularization</em> terms in the loss function, ensuring that the INR satisfies certain global properties that the function should hold — such as having unit gradient in the case of SDFs. We explore key methodological components, including the definition of INR, the construction of geometric loss functions, and sampling schemes from a differential geometry perspective. Our review highlights the significant advancements enabled by geometric INRs in surface reconstruction from oriented point clouds and posed images.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"125 ","pages":"Article 104085"},"PeriodicalIF":2.5,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142525944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-29DOI: 10.1016/j.cag.2024.104098
Zhenshan Hu, Bin Ge, Chenxing Xia, Wenyan Wu, Guangao Zhou, Baotong Wang
Researchers have recently proposed arbitrary style transfer methods based on various model frameworks. Although all of them have achieved good results, they still face the problems of insufficient stylization, artifacts and inadequate retention of content structure. In order to solve these problems, we propose a flow style-aware network (FSANet) for arbitrary style transfer, which combines a VGG network and a flow network. FSANet consists of a flow style transfer module (FSTM), a dynamic regulation attention module (DRAM), and a style feature interaction module (SFIM). The flow style transfer module uses the reversible residue block features of the flow network to create a sample feature containing the target content and style. To adapt the FSTM to VGG networks, we design the dynamic regulation attention module and exploit the sample features both at the channel and pixel levels. The style feature interaction module computes a style tensor that optimizes the fused features. Extensive qualitative and quantitative experiments demonstrate that our proposed FSANet can effectively avoid artifacts and enhance the preservation of content details while migrating style features.
{"title":"Flow style-aware network for arbitrary style transfer","authors":"Zhenshan Hu, Bin Ge, Chenxing Xia, Wenyan Wu, Guangao Zhou, Baotong Wang","doi":"10.1016/j.cag.2024.104098","DOIUrl":"10.1016/j.cag.2024.104098","url":null,"abstract":"<div><div>Researchers have recently proposed arbitrary style transfer methods based on various model frameworks. Although all of them have achieved good results, they still face the problems of insufficient stylization, artifacts and inadequate retention of content structure. In order to solve these problems, we propose a flow style-aware network (FSANet) for arbitrary style transfer, which combines a VGG network and a flow network. FSANet consists of a flow style transfer module (FSTM), a dynamic regulation attention module (DRAM), and a style feature interaction module (SFIM). The flow style transfer module uses the reversible residue block features of the flow network to create a sample feature containing the target content and style. To adapt the FSTM to VGG networks, we design the dynamic regulation attention module and exploit the sample features both at the channel and pixel levels. The style feature interaction module computes a style tensor that optimizes the fused features. Extensive qualitative and quantitative experiments demonstrate that our proposed FSANet can effectively avoid artifacts and enhance the preservation of content details while migrating style features.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"124 ","pages":"Article 104098"},"PeriodicalIF":2.5,"publicationDate":"2024-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142417549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reconstructing high-fidelity 3D facial texture from a single image is a quite challenging task due to the lack of complete face information and the domain gap between the 3D face and 2D image. Further, obtaining re-renderable 3D faces has become a strongly desired property in many applications, where the term ’re-renderable’ demands the facial texture to be spatially complete and disentangled with environmental illumination. In this paper, we propose a new self-supervised deep learning framework for reconstructing high-quality and re-renderable facial albedos from single-view images in the wild. Our main idea is to first utilize a prior generation module based on the 3DMM proxy model to produce an unwrapped texture and a globally parameterized prior albedo. Then we apply a detail refinement module to synthesize the final texture with both high-frequency details and completeness. To further make facial textures disentangled with illumination, we propose a novel detailed illumination representation that is reconstructed with the detailed albedo together. We also design several novel regularization losses on both the albedo and illumination maps to facilitate the disentanglement of these two factors. Finally, by leveraging a differentiable renderer, each face attribute can be jointly trained in a self-supervised manner without requiring ground-truth facial reflectance. Extensive comparisons and ablation studies on challenging datasets demonstrate that our framework outperforms state-of-the-art approaches.
{"title":"Self-supervised reconstruction of re-renderable facial textures from single image","authors":"Mingxin Yang , Jianwei Guo , Xiaopeng Zhang , Zhanglin Cheng","doi":"10.1016/j.cag.2024.104096","DOIUrl":"10.1016/j.cag.2024.104096","url":null,"abstract":"<div><div>Reconstructing high-fidelity 3D facial texture from a single image is a quite challenging task due to the lack of complete face information and the domain gap between the 3D face and 2D image. Further, obtaining re-renderable 3D faces has become a strongly desired property in many applications, where the term ’re-renderable’ demands the facial texture to be spatially complete and disentangled with environmental illumination. In this paper, we propose a new self-supervised deep learning framework for reconstructing high-quality and re-renderable facial albedos from single-view images in the wild. Our main idea is to first utilize a <em>prior generation module</em> based on the 3DMM proxy model to produce an unwrapped texture and a globally parameterized prior albedo. Then we apply a <em>detail refinement module</em> to synthesize the final texture with both high-frequency details and completeness. To further make facial textures disentangled with illumination, we propose a novel detailed illumination representation that is reconstructed with the detailed albedo together. We also design several novel regularization losses on both the albedo and illumination maps to facilitate the disentanglement of these two factors. Finally, by leveraging a differentiable renderer, each face attribute can be jointly trained in a self-supervised manner without requiring ground-truth facial reflectance. Extensive comparisons and ablation studies on challenging datasets demonstrate that our framework outperforms state-of-the-art approaches.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"124 ","pages":"Article 104096"},"PeriodicalIF":2.5,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142446516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-27DOI: 10.1016/j.cag.2024.104097
Stéven Picard, Jean Botev
Time experience is an essential part of one’s perception of any environment, real or virtual. In this article, from a virtual environment design perspective, we explore how rhythmic stimuli can influence an unrelated cognitive task regarding time experience and performance in virtual reality. This study explicitly includes physiological data to investigate how, overall, experience correlates with psychophysiological observations. The task involves sorting 3D objects by shape, with varying rhythmic stimuli in terms of their tempo and sensory channel (auditory and/or visual) in different trials, to collect subjective measures of time estimation and judgment. The results indicate different effects on time experience and performance depending on the context, such as user fatigue and trial repetition. Depending on the context, a positive impact of audio stimuli or a negative impact of visual stimuli on task performance can be observed, as well as time being underestimated concerning tempo in relation to task familiarity. However, some effects are consistent regardless of context, such as time being judged to pass faster with additional stimuli or consistent correlations between participants’ performance and time experience, suggesting flow-related aspects. We also observe correlations between time experience with eye-tracking data and body temperature, yet some of these correlations may be due to a confounding effect of fatigue. If confirmed as separate from fatigue, these physiological data could be used as reference point for evaluating a user’s time experience. This might be of great interest for designing virtual environments, as purposeful stimuli can strongly influence task performance and time experience, both essential components of virtual environment user experience.
{"title":"Psychophysiology of rhythmic stimuli and time experience in virtual reality","authors":"Stéven Picard, Jean Botev","doi":"10.1016/j.cag.2024.104097","DOIUrl":"10.1016/j.cag.2024.104097","url":null,"abstract":"<div><div>Time experience is an essential part of one’s perception of any environment, real or virtual. In this article, from a virtual environment design perspective, we explore how rhythmic stimuli can influence an unrelated cognitive task regarding time experience and performance in virtual reality. This study explicitly includes physiological data to investigate how, overall, experience correlates with psychophysiological observations. The task involves sorting 3D objects by shape, with varying rhythmic stimuli in terms of their tempo and sensory channel (auditory and/or visual) in different trials, to collect subjective measures of time estimation and judgment. The results indicate different effects on time experience and performance depending on the context, such as user fatigue and trial repetition. Depending on the context, a positive impact of audio stimuli or a negative impact of visual stimuli on task performance can be observed, as well as time being underestimated concerning tempo in relation to task familiarity. However, some effects are consistent regardless of context, such as time being judged to pass faster with additional stimuli or consistent correlations between participants’ performance and time experience, suggesting flow-related aspects. We also observe correlations between time experience with eye-tracking data and body temperature, yet some of these correlations may be due to a confounding effect of fatigue. If confirmed as separate from fatigue, these physiological data could be used as reference point for evaluating a user’s time experience. This might be of great interest for designing virtual environments, as purposeful stimuli can strongly influence task performance and time experience, both essential components of virtual environment user experience.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"124 ","pages":"Article 104097"},"PeriodicalIF":2.5,"publicationDate":"2024-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142417389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-25DOI: 10.1016/j.cag.2024.104094
Meng Liu, Feiyu Zhao
With the development of deep learning and computer vision, an increasing amount of research has focused on applying deep learning models to the recognition and classification of three-dimensional shapes. In classification tasks, differences in sample quantity, feature amount, model complexity, and other aspects among different categories of 3D model data cause significant variations in classification difficulty. However, simple cross-entropy loss is generally used as the loss function, but it is insufficient to address these differences. In this paper, we used MeshNet as the base model and introduced focal loss as a metric for the loss function. Additionally, to prevent deep learning models from developing a preference for specific categories, we incorporated regularization loss. The combined use of focal loss and regularization loss in optimizing the MeshNet model’s loss function resulted in a classification accuracy of up to 92.46%, representing a 0.20% improvement over the original model’s highest accuracy of 92.26%. Furthermore, the average accuracy over the final 50 epochs remained stable at a higher level of 92.01%, reflecting a 0.71% improvement compared to the original MeshNet model’s 91.30%. These results indicate that our method performs better in 3D shape classification task.
{"title":"Enhancing MeshNet for 3D shape classification with focal and regularization losses","authors":"Meng Liu, Feiyu Zhao","doi":"10.1016/j.cag.2024.104094","DOIUrl":"10.1016/j.cag.2024.104094","url":null,"abstract":"<div><div>With the development of deep learning and computer vision, an increasing amount of research has focused on applying deep learning models to the recognition and classification of three-dimensional shapes. In classification tasks, differences in sample quantity, feature amount, model complexity, and other aspects among different categories of 3D model data cause significant variations in classification difficulty. However, simple cross-entropy loss is generally used as the loss function, but it is insufficient to address these differences. In this paper, we used MeshNet as the base model and introduced focal loss as a metric for the loss function. Additionally, to prevent deep learning models from developing a preference for specific categories, we incorporated regularization loss. The combined use of focal loss and regularization loss in optimizing the MeshNet model’s loss function resulted in a classification accuracy of up to 92.46%, representing a 0.20% improvement over the original model’s highest accuracy of 92.26%. Furthermore, the average accuracy over the final 50 epochs remained stable at a higher level of 92.01%, reflecting a 0.71% improvement compared to the original MeshNet model’s 91.30%. These results indicate that our method performs better in 3D shape classification task.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"124 ","pages":"Article 104094"},"PeriodicalIF":2.5,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142356927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-24DOI: 10.1016/j.cag.2024.104092
Leonardo Christino , Fernando V. Paulovich
Line-chart visualizations of temporal data enable users to identify interesting patterns for the user to inquire about. Using Intelligent Agents (IA), Visual Analytic tools can automatically uncover explicit knowledge related information to said patterns. Yet, visualizing the association of data, patterns, and knowledge is not straightforward. In this paper, we present ChatKG, a novel visual analytics strategy that allows exploratory data analysis of a Knowledge Graph that associates temporal sequences, the patterns found in each sequence, the temporal overlap between patterns, the related knowledge of each given pattern gathered from a multi-agent IA, and the IA’s suggestions of related datasets for further analysis visualized as annotations. We exemplify and informally evaluate ChatKG by analyzing the world’s life expectancy. For this, we implement an oracle that automatically extracts relevant or interesting patterns, populates the Knowledge Graph to be visualized, and, during user interaction, inquires the multi-agent IA for related information and suggests related datasets to be displayed as visual annotations. Our tests and an interview conducted showed that ChatKG is well suited for temporal analysis of temporal patterns and their related knowledge when applied to history studies.
{"title":"ChatKG: Visualizing time-series patterns aided by intelligent agents and a knowledge graph","authors":"Leonardo Christino , Fernando V. Paulovich","doi":"10.1016/j.cag.2024.104092","DOIUrl":"10.1016/j.cag.2024.104092","url":null,"abstract":"<div><div>Line-chart visualizations of temporal data enable users to identify interesting patterns for the user to inquire about. Using Intelligent Agents (IA), Visual Analytic tools can automatically uncover <em>explicit knowledge</em> related information to said patterns. Yet, visualizing the association of data, patterns, and knowledge is not straightforward. In this paper, we present <em>ChatKG</em>, a novel visual analytics strategy that allows exploratory data analysis of a Knowledge Graph that associates temporal sequences, the patterns found in each sequence, the temporal overlap between patterns, the related knowledge of each given pattern gathered from a multi-agent IA, and the IA’s suggestions of related datasets for further analysis visualized as annotations. We exemplify and informally evaluate ChatKG by analyzing the world’s life expectancy. For this, we implement an oracle that automatically extracts relevant or interesting patterns, populates the Knowledge Graph to be visualized, and, during user interaction, inquires the multi-agent IA for related information and suggests related datasets to be displayed as visual annotations. Our tests and an interview conducted showed that ChatKG is well suited for temporal analysis of temporal patterns and their related knowledge when applied to history studies.</div></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"124 ","pages":"Article 104092"},"PeriodicalIF":2.5,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142357034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}