Pub Date: 2024-11-07 | DOI: 10.1016/j.cag.2024.104120
Markus Höhn , Sarah Schwindt-Drews , Sara Hahn , Sammy Patyna , Stefan Büttner , Jörn Kohlhammer
Chronic Kidney Disease (CKD) is a prominent health problem. Progressive CKD impairs kidney function and the ability to filter the patient's blood, culminating in multiple complications, such as heart disease, and ultimately death from the disease. In previous work, we developed a prototype to support nephrologists in gaining an overview of their CKD patients. The prototype visualizes the patients in cohorts according to their pairwise similarity. The user can interactively modify the similarity by changing the underlying weights of the included features. The work in this paper expands upon this previous work by enlarging the data set and reworking the user interface of the application. With a focus on the distinction between individual CKD classes, we introduce a color scheme used throughout all visualizations. Furthermore, the visualizations were adapted to display the data of several patients at once. This also involved the option to align the visualizations to sentinel points, such as the onset of a particular CKD stage, in order to quantify the progression of all selected patients in relation to this event. The prototype was developed in response to the identified potential for improvement of the earlier application. An additional user study concerning intuitiveness and usability confirms good results for the prototype and supports the assessment of an easy-to-use approach.
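The interactively weighted pairwise similarity described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; the function and parameter names are hypothetical.

```python
import numpy as np

def pairwise_similarity(X, weights):
    """Weighted pairwise similarity between patients (illustrative sketch).

    X: (n_patients, n_features) array with each feature scaled to [0, 1].
    weights: non-negative per-feature weights the user adjusts interactively.
    Returns an (n, n) matrix in [0, 1], where 1 means identical patients.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                               # normalize so the max distance is 1
    diff = X[:, None, :] - X[None, :, :]          # (n, n, f) pairwise feature differences
    dist = np.sqrt((w * diff ** 2).sum(axis=-1))  # weighted Euclidean distance in [0, 1]
    return 1.0 - dist                             # convert distance to similarity
```

Raising the weight of one feature makes the cohorts separate primarily along that feature, which is the interaction the prototype exposes.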
Title: RenalViz: Visual analysis of cohorts with chronic kidney disease. Computers & Graphics, Volume 125, Article 104120.
Pub Date: 2024-11-04 | DOI: 10.1016/j.cag.2024.104108
Mengmeng Yu, Chongke Bi
Timeline control is a crucial interaction during video viewing, aiding users in quickly locating or jumping to specific points in the video playback, especially when dealing with lengthy content. 360° videos, with their ability to offer an all-encompassing view, have gradually gained popularity, providing a more immersive experience compared to videos with a single perspective. While most 360° videos are currently displayed on two-dimensional screens, the timeline design has largely remained similar to that of conventional videos. However, virtual reality (VR) headsets provide a more immersive viewing experience for 360° videos and offer additional dimensions for timeline design. In this paper, we initially explored six timeline design styles by varying the shape and interaction distance of the timeline, aiming to discover designs more suitable for the VR environment of 360° videos. Subsequently, we introduced an adaptive timeline display mechanism based on eye-gaze sequences to optimize the timeline, addressing issues such as obstructing the view and causing distractions when the timeline is consistently visible. Through two studies, we first demonstrated that in 360° space, the three-dimensional timeline offers better usability than the two-dimensional one, and the reachable timeline has advantages in performance and experience over the distant one. Secondly, we verified that, without compromising interaction efficiency and system usability, the adaptive display timeline gained more user preference due to its accurate prediction of user timeline needs.
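A minimal sketch of one possible gaze-driven display rule, assuming gaze samples normalized to [0, 1] screen coordinates. The paper's actual predictor is not reproduced here; the function name, region, and thresholds are hypothetical.

```python
def timeline_needed(gaze_ys, region=0.8, dwell=30):
    """Show the timeline once gaze has dwelled in the bottom screen region
    (normalized y > region) for `dwell` consecutive samples; otherwise keep
    it hidden so it does not obstruct the 360-degree view."""
    run = 0  # length of the current dwell streak
    for y in gaze_ys:
        run = run + 1 if y > region else 0
        if run >= dwell:
            return True
    return False
```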
Title: Adaptive 360° video timeline exploration in VR environment. Computers & Graphics, Volume 125, Article 104108.
Pub Date: 2024-11-04 | DOI: 10.1016/j.cag.2024.104122
Yang Wang , Shijia Song , Lijun Zhao , Huijuan Xia , Zhenyu Yuan , Ying Zhang
Illumination consistency is a key factor for seamlessly integrating virtual objects with real scenes in augmented reality (AR) systems. High dynamic range (HDR) panoramic images are widely used to estimate scene lighting accurately. However, generating environment maps requires complex deep network architectures that cannot run on devices with limited memory. To address this issue, this paper proposes CGLight, an effective illumination estimation method that predicts HDR panoramic environment maps from a single limited field-of-view (LFOV) image. We first design a CMAtten encoder to extract features from input images, which learns the spherical harmonic (SH) lighting representation with fewer model parameters. Guided by the lighting parameters, we train a generative adversarial network (GAN) to generate HDR environment maps. In addition, to enrich lighting details and reduce training time, we introduce a color consistency loss and an independent discriminator, accounting for the impact of color properties on the lighting estimation task while improving computational efficiency. Furthermore, the effectiveness of CGLight is verified by relighting virtual objects using the predicted environment maps; the root mean square error and angular error on the gray diffuse sphere are 0.0494 and 4.0607, respectively. Extensive experiments and analyses demonstrate that CGLight balances indoor illumination estimation accuracy against resource efficiency, attaining higher accuracy with nearly 4 times fewer model parameters than the ViT-B16 model.
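For reference, an order-2 SH lighting representation (9 coefficients per color channel) can be obtained from an equirectangular HDR map as below. This is the standard SH projection, shown as a hedged sketch; it is not the CMAtten encoder itself, and the function names are hypothetical.

```python
import numpy as np

def sh_basis(d):
    """Real spherical-harmonic basis up to order 2 (9 terms) for unit directions d: (..., 3)."""
    x, y, z = d[..., 0], d[..., 1], d[..., 2]
    return np.stack([
        0.282095 * np.ones_like(x),                    # l=0
        0.488603 * y, 0.488603 * z, 0.488603 * x,      # l=1
        1.092548 * x * y, 1.092548 * y * z,            # l=2
        0.315392 * (3 * z ** 2 - 1),
        1.092548 * x * z, 0.546274 * (x ** 2 - y ** 2),
    ], axis=-1)

def project_sh(env):
    """Project an equirectangular HDR map (H, W, 3) onto 9 SH coefficients per channel."""
    H, W, _ = env.shape
    theta = (np.arange(H) + 0.5) / H * np.pi           # polar angle per pixel row
    phi = (np.arange(W) + 0.5) / W * 2 * np.pi         # azimuth per pixel column
    t, p = np.meshgrid(theta, phi, indexing="ij")
    dirs = np.stack([np.sin(t) * np.cos(p),
                     np.sin(t) * np.sin(p),
                     np.cos(t)], axis=-1)
    basis = sh_basis(dirs)                             # (H, W, 9)
    d_omega = np.sin(t) * (np.pi / H) * (2 * np.pi / W)  # solid angle per pixel
    return np.einsum("hwc,hwk,hw->kc", env, basis, d_omega)  # (9, 3) coefficients
```

A 9-coefficient representation like this is what makes the lighting parameters compact enough to guide a generator on memory-limited devices.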
Title: CGLight: An effective indoor illumination estimation method based on improved convmixer and GauGAN. Computers & Graphics, Volume 125, Article 104122.
Pub Date: 2024-10-31 | DOI: 10.1016/j.cag.2024.104111
Rosa Costa, Cléber Corrêa, Skip Rizzo
Title: Foreword to the special section on Symposium on Virtual and Augmented Reality 2024 (SVR 2024). Computers & Graphics, Volume 125, Article 104111.
Virtual Reality (VR) is an immersive virtual environment generated through computer technology. VR teaching, by utilizing an immersive learning model, offers innovative learning methods for Science, Technology, Engineering and Mathematics (STEM) education as well as programming education. This study developed a Drone Virtual Reality Teaching (DVRT) system aimed at beginners in drone operation and programming, with the goal of addressing the challenges in traditional drone and programming education, such as difficulty in engaging students and lack of practicality. Through the system's curriculum, students learn basic drone operation skills and advanced programming techniques. We conducted a course experiment primarily targeting undergraduate students who are beginners in drone operation. The test results showed that most students achieved scores above 4 out of 5, indicating that DVRT can effectively promote the development of users' comprehensive STEM literacy and computational thinking, thereby demonstrating the great potential of VR technology in STEM education. Through this innovative teaching method, students not only gain knowledge but also enjoy the fun of immersive learning.
Title: DVRT: Design and evaluation of a virtual reality drone programming teaching system. Authors: Zean Jin, Yulong Bai, Wei Song, Qinghe Yu, Xiaoxin Yue, Xiang Jia. Pub Date: 2024-10-29 | DOI: 10.1016/j.cag.2024.104114. Computers & Graphics, Volume 125, Article 104114.
Pub Date: 2024-10-20 | DOI: 10.1016/j.cag.2024.104107
Lucas Zanusso Morais , Marcelo Gomes Martins , Rafael Piccin Torchelsen , Anderson Maciel , Luciana Porcher Nedel
Collision detection has been widely studied over the last decades. While plenty of solutions exist, certain simulation scenarios remain challenging when permanent contact and deformable bodies are involved. In this paper, we introduce a novel approach based on volumetric splines that is applicable to complex deformable tubes, such as in the simulation of colonoscopy and other endoscopies. The method relies on modeling radial control points, extracting surface information from a triangle mesh, and storing the volume information around a spline path. This information is later used to compute the intersection between the object surfaces under the assumption of spatial coherence between neighboring splines. We analyze the method's performance in terms of both speed and accuracy, comparing it with previous works. Results show that our method resolves collisions between complex meshes with over 300k triangles, generating over 1,000 collisions per frame between objects while maintaining an average time of under 1 ms without compromising accuracy.
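The core test, deciding whether a surface point penetrates the tube volume stored around a spline path, can be sketched with densely sampled centerline points and per-sample radii. This is an illustrative simplification of the paper's method, with hypothetical names; the actual FSCD algorithm handles deformation and multiple contacts.

```python
import numpy as np

def spline_collision(centerline, radii, points):
    """centerline: (m, 3) densely sampled spline points; radii: (m,) tube radius
    stored at each sample; points: (n, 3) surface vertices of the other object.
    Returns a boolean mask marking points inside the tube (i.e., colliding)."""
    # distance from every query point to every centerline sample: (n, m)
    d = np.linalg.norm(points[:, None, :] - centerline[None, :, :], axis=-1)
    nearest = d.argmin(axis=1)                     # closest sample per point
    # a point collides if it is closer to the axis than the local tube radius
    return d[np.arange(len(points)), nearest] < radii[nearest]
```

Spatial coherence between neighboring splines lets a real implementation restrict the nearest-sample search to a small window instead of the full (n, m) distance matrix.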
Title: Fast spline collision detection (FSCD) algorithm for solving multiple contacts in real-time. Computers & Graphics, Volume 125, Article 104107.
Pub Date: 2024-10-16 | DOI: 10.1016/j.cag.2024.104095
Troels Rasmussen , Kaj Grønbæk , Weidong Huang
Research on remote assistance in real-world industries is sparse, as most research is conducted in the laboratory under controlled conditions. Consequently, little is known about how users tailor remote assistance technologies at work. Therefore, we developed an augmented reality-based remote assistance prototype called Remote Assist Kit (RAK). RAK is a component-based system, allowing us to study tailoring activities and the usefulness of tailorable remote assistance technologies. We conducted a user evaluation with employees from the plastic manufacturing industry. The employees configured the RAK to solve real-world problems in three collaborative scenarios: (1) troubleshooting a running injection molding machine, (2) tool maintenance, (3) solving a trigonometry problem. Our results show that the tailorability of RAK was perceived as useful, and users were able to successfully tailor RAK to the distinct properties of the scenarios. Specific findings and their implications for the design of tailorable remote assistance technologies are presented. Among other findings, requirements specific to remote assistance in the manufacturing industry were discussed, such as the importance of sharing machine sounds between the local operator and the remote helper.
Title: Supporting tailorability in augmented reality based remote assistance in the manufacturing industry: A user study. Computers & Graphics, Volume 125, Article 104095.
Pub Date: 2024-10-15 | DOI: 10.1016/j.cag.2024.104104
Alfonso López , Antonio J. Rueda , Rafael J. Segura , Carlos J. Ogayar , Pablo Navarro , José M. Fuertes
One of the primary challenges inherent in utilizing deep learning models is the scarcity and accessibility hurdles associated with acquiring datasets of sufficient size to facilitate effective training of these networks. This is particularly significant in object detection, shape completion, and fracture assembly. Instead of scanning a large number of real-world fragments, it is possible to generate massive datasets with synthetic pieces. However, realistic fragmentation is computationally intensive in both the preparation (e.g., pre-fractured models) and the generation. In contrast, simpler algorithms such as Voronoi diagrams provide faster processing speeds at the expense of realism. In this context, computational efficiency must be balanced against realism. This paper introduces a GPU-based framework for the massive generation of voxelized fragments derived from high-resolution 3D models, specifically prepared for use as training sets for machine learning models. This rapid pipeline enables controlling how many pieces are produced, their dispersion, and the appearance of subtle effects such as erosion. We have tested our pipeline with an archaeological dataset, producing more than 1M fragmented pieces from 1,052 Iberian vessels (Github). Although this work primarily intends to provide pieces as implicit data represented by voxels, triangle meshes and point clouds can also be inferred from the initial implicit representation. To underscore the benefits of CPU and GPU acceleration in generating vast datasets, we compared against a realistic fragment generator, highlighting the potential of our approach in terms of both applicability and processing time. We also demonstrate the synergies between our pipeline and realistic simulators, which frequently cannot select the number and size of resulting pieces. To this end, a deep learning model was trained on realistic fragments and on our dataset, showing similar results.
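A minimal Voronoi-style voxel fragmentation, the fast-but-simpler approach the abstract contrasts with realistic fracturing, can be sketched as follows (a CPU sketch with hypothetical names; the paper's pipeline runs on the GPU and adds effects such as erosion):

```python
import numpy as np

def voronoi_fragments(occupancy, n_pieces, rng=None):
    """Split a voxelized solid into n_pieces fragments by nearest-seed
    (Voronoi) assignment.

    occupancy: boolean (X, Y, Z) voxel grid of the solid.
    Returns an int label grid: 0 = empty, 1..n_pieces = fragment id.
    """
    rng = np.random.default_rng(rng)
    filled = np.argwhere(occupancy)                  # coordinates of solid voxels
    idx = rng.choice(len(filled), n_pieces, replace=False)
    seeds = filled[idx]                              # random seed voxel per fragment
    # label each solid voxel with the index of its nearest seed (+1 keeps 0 = empty)
    d = np.linalg.norm(filled[:, None, :] - seeds[None, :, :], axis=-1)
    labels = np.zeros(occupancy.shape, dtype=int)
    labels[tuple(filled.T)] = d.argmin(axis=1) + 1
    return labels
```

Controlling the number and placement of seeds is what gives this family of methods direct control over piece count and dispersion, which realistic simulators frequently lack.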
Title: Generating implicit object fragment datasets for machine learning. Computers & Graphics, Volume 125, Article 104104.
Pub Date: 2024-10-11 | DOI: 10.1016/j.cag.2024.104100
Xue Jiao , Xiaohui Yang
Despite significant advances in 3D mesh segmentation techniques driven by deep learning, segmenting 3D meshes without exhaustive manual labeling remains challenging due to the difficulty of acquiring high-quality labeled datasets. This paper introduces an aggregation dual autoencoder self-supervised clustering-based mesh segmentation network for unlabeled 3D meshes (ADA-SCMS Net). Expanding upon the previously proposed SCMS-Net, ADA-SCMS Net enhances the segmentation process by incorporating a denoising autoencoder with an improved graph autoencoder as its basic structure. This modification prompts the segmentation network to concentrate on the primary structure of the input data during training, enabling the capture of robust features. In addition, the ADA-SCMS network introduces two new modules. The first is the branch aggregation module, which combines the strengths of two branches to create a semantic latent representation. The second is the aggregation self-supervised clustering module, which facilitates end-to-end clustering training by iteratively updating each branch through mutual supervision. Extensive experiments on benchmark datasets validate the effectiveness of the ADA-SCMS network, demonstrating superior segmentation performance compared to the SCMS network.
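The mutual-supervision clustering step resembles DEC-style self-supervised clustering, where a soft cluster assignment is sharpened into a target distribution that can supervise the other branch. A generic sketch of that mechanism, not the authors' exact formulation:

```python
import numpy as np

def soft_assign(z, centers, alpha=1.0):
    """Student-t soft assignment (DEC-style) of embeddings z: (n, d)
    to cluster centers: (k, d). Rows of the result sum to 1."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (n, k)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target distribution: emphasizes confident assignments and
    normalizes per cluster, serving as the supervision signal for the
    other branch during iterative training."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)
```

In a mutual-supervision loop, each branch's target distribution is used as the clustering objective for the other branch, and both are updated iteratively until assignments stabilize.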
Title: ADA-SCMS Net: A self-supervised clustering-based 3D mesh segmentation network with aggregation dual autoencoder. Computers & Graphics, Volume 124, Article 104100.