Efficient semantic-aware texture optimization for 3D scene reconstruction
Pub Date: 2026-01-05 | DOI: 10.1016/j.cag.2025.104529
Xiaoqun Wu, Tian Yang, Liu Yu, Jian Cao, Huiling Si
To address the issue of blurry artifacts in texture mapping for 3D reconstruction, we propose an approach that optimizes textures based on semantic-aware similarity. Unlike previous algorithms that incur significant computational costs, our method introduces a novel metric that provides a more efficient solution for texture mapping. This allows for high-quality texture mapping in 3D reconstructions using multi-view captured images. Our approach begins by establishing a mapping across the image sequence using the available 3D information. We then quantitatively assess pixel similarity using our proposed semantic-aware metric, which guides the texture image generation process. By leveraging semantic-aware similarity, we constrain texture mapping and enhance texture clarity. Finally, the texture image is projected onto the geometry to produce a 3D textured mesh. Experimental results demonstrate that our method can generate 3D meshes with crisp, high-fidelity textures faster than existing methods, even in scenarios involving substantial camera pose errors and low-precision reconstruction geometry.
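The mapping step mentioned above relies on standard multi-view projection: with known intrinsics and camera poses, a 3D surface point can be projected into each captured image, linking the pixels that observe it across views. The sketch below shows only this generic projection step, with illustrative camera values; it is not the paper's semantic-aware metric or pipeline.

```python
import numpy as np

def project_point(X, K, R, t):
    """Project a 3D point X (3,) into a pinhole camera; returns (u, v) or None if behind the camera."""
    Xc = R @ X + t                  # world -> camera coordinates
    if Xc[2] <= 0:                  # point behind the camera plane
        return None
    uvw = K @ Xc                    # homogeneous pixel coordinates
    return uvw[:2] / uvw[2]         # perspective division

# Example: find the pixel observing one mesh vertex in each calibrated view.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
views = [(np.eye(3), np.array([0.0, 0.0, 2.0])),
         (np.eye(3), np.array([0.1, 0.0, 2.0]))]       # assumed known (R, t) per view
vertex = np.array([0.0, 0.0, 0.0])
observations = [project_point(vertex, K, R, t) for R, t in views]
```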
From pseudo- to non-correspondences: Robust point cloud registration via thickness-guided self-correction
Pub Date: 2026-01-02 | DOI: 10.1016/j.cag.2025.104528
Yifei Tian, Xiangyu Li, Jieming Yin
Most existing point cloud registration methods rely heavily on accurate correspondences between the source and target point clouds, such as point-level or superpoint-level matches. In dense and balanced point clouds where local geometric structures are relatively complete, correspondences are easier to establish, leading to satisfactory registration performance. However, real-world point clouds can be sparse or imbalanced. The absence or inconsistency of local geometric structures makes it challenging to construct reliable correspondences, significantly degrading the performance of mainstream registration methods. To address this challenge, we propose P2NCorr, a pseudo-to-non-correspondence registration method designed for robust alignment of point clouds with missing or low-quality correspondences. Our method leverages an attention-guided soft matching module that uses self- and cross-attention mechanisms to extract contextual features and constructs pseudo correspondences under slack constraints. On this basis, we introduce a geometric consistency metric based on a thickness-guided self-correction module, which enables fine-grained alignment and optimization of micro-surfaces in the fused point cloud. This thickness evaluation serves as a supplementary supervisory signal, forming comprehensive feedback from the post-registration fusion back to the feature extraction module and thereby improving both the accuracy and stability of the registration process. Experiments on public datasets such as ModelNet40 and 7Scenes demonstrate that P2NCorr achieves high-precision registration even under challenging conditions, and that it remains robust and shows promising potential especially when point clouds are sparse, sampling is imbalanced, and measurements are noisy.
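For context, a soft matching between two point clouds is commonly built by converting a feature-similarity matrix into softmax weights and averaging target coordinates per source point. The sketch below shows only that generic construction with hypothetical tensor names; P2NCorr's attention-guided module and slack constraints are not reproduced.

```python
import torch

def soft_correspondences(feat_src, feat_tgt, pts_tgt, temperature=0.1):
    """feat_src: (N, C), feat_tgt: (M, C), pts_tgt: (M, 3) -> pseudo target points (N, 3)."""
    feat_src = torch.nn.functional.normalize(feat_src, dim=-1)
    feat_tgt = torch.nn.functional.normalize(feat_tgt, dim=-1)
    sim = feat_src @ feat_tgt.T / temperature   # (N, M) similarity logits
    weights = torch.softmax(sim, dim=-1)        # soft assignment of each source point over the target
    return weights @ pts_tgt                    # pseudo correspondence as a weighted average

# Toy usage with random features/points; the resulting (source, pseudo target)
# pairs could feed a weighted Kabsch/SVD solver to estimate a rigid transform.
feat_src, feat_tgt = torch.randn(128, 32), torch.randn(96, 32)
pts_tgt = torch.randn(96, 3)
pseudo_tgt = soft_correspondences(feat_src, feat_tgt, pts_tgt)   # (128, 3)
```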
Energy-based haptic rendering for real-time surgical simulation
Pub Date: 2025-12-23 | DOI: 10.1016/j.cag.2025.104524
Lei He , Mingbo Hu , Wenli Xiu , Hongyu Wu , Siming Zheng , Shuai Li , Qian Dong , Aimin Hao
Haptic-based surgical simulation is widely used for training surgical skills. However, simulating the interaction between rigid surgical instruments and soft tissues presents significant technical challenges. In this paper, we propose an energy-based haptic rendering method to achieve both large deformations and rigid–soft haptic interaction. Unlike existing methods, both the rigid tools and the soft tissues are modeled by an energy-based virtual coupling system. The constraints for soft deformation, tool-object interaction, and haptic rendering are all defined through potential energy. Benefiting from these energy-based constraints, we can realize complex surgical operations, such as inserting tools into soft tissue. The virtual coupling of soft tissue enables the separation of haptic interaction into two components: soft deformation, with its high computational complexity, and high-frequency haptic rendering. The soft deformation with shape constraints is accelerated on the GPU at a relatively low frequency (60–100 Hz), while the haptic rendering runs in a separate thread at a high frequency (≥1000 Hz). We have implemented haptic simulation for two commonly used surgical operations, pressing and pulling. The experimental results show that our method achieves stable feedback forces and non-penetration between the tool and soft tissue under large soft deformation.
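As background, a virtual coupling conventionally links the haptic device configuration to a simulated proxy or tool through a spring-like potential, so the rendered force is the gradient of that energy. The generic form is shown below; it is a standard formulation from the haptics literature, and the abstract does not spell out the paper's exact energy terms.

```latex
% x_h: haptic handle (device) configuration, x_p: coupled proxy/tool
% configuration, k: coupling stiffness.
E_{\mathrm{vc}} = \tfrac{1}{2}\, k\, \lVert x_h - x_p \rVert^2,
\qquad
F_{\mathrm{feedback}} = -\,\frac{\partial E_{\mathrm{vc}}}{\partial x_h}
                      = -\,k\,(x_h - x_p).
```

In a two-rate scheme like the one described, the ≥1000 Hz haptic thread can evaluate this force against the most recently available proxy state while the deformation solver updates that state at 60–100 Hz.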
MRescue: A mixed reality system for real-time navigation and rescue in complex multi-floor buildings
Pub Date: 2025-12-22 | DOI: 10.1016/j.cag.2025.104527
Gilda Manfredi, Nicola Capece, Ugo Erra
In high-rise buildings, complex layouts and frequent structural modifications can make emergency rescues challenging. Conventional 2D rescue plans offer no real-time guidance and can be difficult for rescuers to interpret under stress. To overcome these limitations, we introduce a Mixed Reality (MR) application designed for real-time rescue assistance in multi-floor buildings. Built for the Meta Quest 3, a cost-efficient standalone MR headset, our system enables users to scan, update, and navigate a dynamic 3D model of their surroundings. To bypass the Meta Quest 3's constraint of storing only 15 rooms, we implement external data storage and use spatial anchors to ensure accurate realignment. Additionally, the application uses the A* algorithm to dynamically compute optimal routes based on the user's real-time location, taking into account the room layout and any obstacles inside. Users can navigate using either a floating 3D minimap or a 2D minimap anchored to their left hand, with staircases seamlessly incorporated into navigation routes, including virtual door warnings and automatic removal of navigation cues near stairs for safety. To improve the user experience, we implemented hand tracking for interactions. We conducted a study with a large sample of participants to evaluate usability and effectiveness, using the NASA Task Load Index (NASA-TLX) to assess cognitive load, along with the System Usability Scale (SUS), the Self-Assessment Manikin (SAM), the Single Ease Question (SEQ), task completion times, and a post-evaluation feedback questionnaire. The results demonstrate that the system achieves high usability, low cognitive workload, and a positive user experience while supporting situational awareness, user confidence, and efficient navigation, and they indicate that an MR-headset-based system has the potential to improve situational awareness and decision-making in dynamic indoor environments.
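The route planning named above is classical A* search. The following minimal grid-based sketch illustrates the algorithm only; the grid, unit step costs, and Manhattan heuristic are simplifying assumptions, not the MRescue implementation, which plans over the scanned room and obstacle layout on the headset.

```python
import heapq

def astar(start, goal, walkable):
    """start, goal: (x, y); walkable: set of (x, y) cells. Returns a path or None."""
    def h(p):                                   # Manhattan-distance heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_heap = [(h(start), 0, start)]
    came_from, g = {}, {start: 0}
    while open_heap:
        _, cost, cur = heapq.heappop(open_heap)
        if cur == goal:                         # reconstruct the path back to start
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if cost > g[cur]:
            continue                            # stale heap entry
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in walkable:
                continue
            new_g = g[cur] + 1                  # unit cost per grid step
            if new_g < g.get(nxt, float("inf")):
                g[nxt] = new_g
                came_from[nxt] = cur
                heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None

# Toy usage: plan across a fully walkable 5x5 grid.
cells = {(x, y) for x in range(5) for y in range(5)}
print(astar((0, 0), (4, 4), cells))
```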
Foreword to special section on SIBGRAPI 2025
Pub Date: 2025-12-17 | DOI: 10.1016/j.cag.2025.104526
Leonardo Sacht, Marcos Lage, Ricardo Marroquim
Foreword to the special section on the 29th international ACM conference on 3D web technology
Pub Date: 2025-12-17 | DOI: 10.1016/j.cag.2025.104523
A. Augusto de Sousa , Miguel Angel Guevara López , Traian Lavric
Sparse-to-dense light field reconstruction based on Spatial–Angular Multi-Dimensional Interaction and Guided Residual Networks
Pub Date: 2025-12-16 | DOI: 10.1016/j.cag.2025.104525
Haijiao Gu, Yan Piao
Dense light fields contain rich spatial and angular information, making them highly valuable for applications such as depth estimation, 3D reconstruction, and multi-view elemental image synthesis. Light-field cameras capture both spatial and angular scene information in a single shot. However, due to high hardware requirements and substantial storage costs, practical acquisitions often yield only sparse light-field maps. To address this problem, this paper proposes an efficient end-to-end sparse-to-dense light-field reconstruction method based on Spatial–Angular Multi-Dimensional Interaction and Guided Residual Networks. The Spatial–Angular Multi-Dimensional Interaction Module (SAMDIM) fully exploits the four-dimensional structural information of light-field data in both the spatial and angular domains, performing dual-modal interaction across the two dimensions to generate dense subviews; a channel attention mechanism within the interaction module significantly improves the image quality of these subviews. Finally, the Guided Residual Refinement Module (GRRM) refines the texture details of the generated dense subviews, improving the reconstruction quality of the dense light field. Experimental results demonstrate that our network achieves clear advantages over state-of-the-art methods in both visual quality and quantitative metrics on real-world datasets.
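As background, the four-dimensional structure referred to above comes from two complementary flattenings of a light field L(u, v, x, y): sub-aperture images (fixing the angular coordinates) and macro-pixels (fixing the spatial coordinates); spatial-angular interaction networks alternate processing over the two layouts. The toy reshaping below only illustrates that reorganization and is not the paper's SAMDIM module.

```python
import numpy as np

U, V, X, Y, C = 3, 3, 64, 64, 3                    # a 3x3 array of sparse angular views
lf = np.random.rand(U, V, X, Y, C)                  # toy light field L(u, v, x, y)

# Spatial view: one (X, Y) image per angular position (u, v).
sub_aperture = lf.reshape(U * V, X, Y, C)

# Angular view: one (U, V) patch per spatial position (x, y).
macro_pixels = lf.transpose(2, 3, 0, 1, 4).reshape(X * Y, U, V, C)
```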
Negotiating without turning: Exploring rear-space interaction for negotiated teleportation in VR
Pub Date: 2025-12-16 | DOI: 10.1016/j.cag.2025.104522
Hao-Zhong Yang , Wen-Tong Shu , Yi-Jun Li , Miao Wang
Social virtual reality enables multi-user co-presence and collaboration but introduces privacy challenges such as personal space intrusion and unwanted interruptions. Teleportation negotiation techniques help address these issues by allowing users to define teleportation-permitted zones, maintaining spatial boundaries and comfort. However, existing methods primarily focus on the forward view and often require physical rotation to monitor and respond to requests originating from behind. This can disrupt immersion and reduce social presence.
To better understand these challenges, we first conducted a preliminary study to identify users’ needs for rear-space awareness during teleportation negotiation. Based on the findings, we designed two rear-awareness negotiation techniques, Window negotiation and MiniMap negotiation. These techniques display rear-space information within the forward view and allow direct interaction without excessive head movement. In a within-subjects study with 16 participants in a virtual museum, we compared these methods against a baseline front-facing approach. Results showed that MiniMap was the preferred technique, significantly improving spatial awareness, usability, and user comfort. Our findings emphasize the importance of integrating rear-space awareness in social VR negotiation systems to enhance interaction efficiency, comfort, and immersion.
Locomotion in CAVE: Enhancing immersion through full-body motion
Pub Date: 2025-12-06 | DOI: 10.1016/j.cag.2025.104510
Xiaohui Li , Xiaolong Liu , Zhongchen Shi , Wei Chen , Liang Xie , Meng Gai , Jun Cao , Suxia Zhang , Erwei Yin
Cave Automatic Virtual Environment (CAVE) systems are among the immersive virtual reality (VR) devices currently used to present virtual environments. However, locomotion in the CAVE is limited by unnatural interaction methods, which severely hinders user experience and immersion. We propose a locomotion framework for CAVE environments aimed at enhancing the immersive locomotion experience through optimized human motion recognition. First, we construct a four-sided display CAVE system and calibrate its cameras with a dynamic Perspective-n-Point method. Using the obtained camera intrinsic and extrinsic parameters, an action recognition architecture infers the user's action category, which is then transmitted to a graphics workstation that renders the corresponding display effects on the screens. We designed a user study to validate the effectiveness of our method. Compared with traditional methods, ours significantly improves realness and self-presence in the virtual environment and effectively reduces motion sickness.
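The Perspective-n-Point (PnP) step named above recovers a camera pose from known 3D-2D correspondences, and OpenCV exposes it directly as cv2.solvePnP. The snippet below is a minimal static sketch with placeholder marker coordinates and intrinsics; the paper's dynamic calibration procedure is not reproduced.

```python
import cv2
import numpy as np

# Four corners of a 0.2 m square marker (world frame) and their observed pixels
# (placeholder values for illustration only).
object_pts = np.array([[0.0, 0.0, 0.0], [0.2, 0.0, 0.0],
                       [0.2, 0.2, 0.0], [0.0, 0.2, 0.0]], dtype=np.float32)
image_pts = np.array([[310, 230], [410, 232],
                      [408, 332], [308, 330]], dtype=np.float32)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
dist = np.zeros(5)                       # assume no lens distortion

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist)
R, _ = cv2.Rodrigues(rvec)               # rotation vector -> 3x3 extrinsic rotation
```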
HARDER: 3D human avatar reconstruction with distillation and explicit representation
Pub Date: 2025-12-05 | DOI: 10.1016/j.cag.2025.104512
Chun-Hau Yu, Yu-Hsiang Chen, Cheng-Yen Yu, Li-Chen Fu
3D human avatar reconstruction has become a popular research field in recent years. Although many studies have shown remarkable results, most existing methods either impose overly strict data requirements, such as depth information or multi-view images, or suffer from significant performance drops in specific regions. To address these challenges, we propose HARDER, which combines the Score Distillation Sampling (SDS) technique with two purpose-built modules, Feature-Specific Image Captioning (FSIC) and Region-Aware Differentiable Rendering (RADR), allowing the Latent Diffusion Model (LDM) to guide the reconstruction process, especially in unseen regions. Furthermore, we develop several training strategies, including personalized LDM, delayed SDS, focused SDS, and multi-pose SDS, to make the training process more efficient.
Our avatars use an explicit representation that is compatible with modern computer graphics pipelines. Also, the entire reconstruction and real-time animation process can be completed on a single consumer-grade GPU, making this application more accessible.
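For reference, Score Distillation Sampling, as introduced in DreamFusion, back-propagates the diffusion model's denoising residual through the differentiable render to update the 3D parameters; its standard gradient is reproduced below. This is the generic form only; the abstract does not detail HARDER's weighting choices or how its delayed, focused, and multi-pose variants modify the procedure.

```latex
% x = g(\theta): differentiable render of the avatar, \hat{\epsilon}_\phi:
% noise predicted by the (personalized) latent diffusion model for prompt y,
% z_t = \alpha_t x + \sigma_t \epsilon: the noised latent at timestep t.
\nabla_{\theta}\mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_{\phi}(z_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad x = g(\theta),\quad z_t = \alpha_t x + \sigma_t \epsilon .
```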