Efficient and lightweight 3D building reconstruction from drone imagery using sparse line and point clouds
Xiongjie Yin, Jinquan He, Zhanglin Cheng
Pub Date: 2025-04-01 | DOI: 10.1016/j.vrih.2025.02.001 | Virtual Reality Intelligent Hardware 7(2): 111-126
Efficient three-dimensional (3D) building reconstruction from drone imagery often faces data acquisition, storage, and computational challenges because of its reliance on dense point clouds. In this study, we introduce a novel method for efficient and lightweight 3D building reconstruction from drone imagery using line clouds and sparse point clouds. Our approach eliminates the need to generate dense point clouds and thus significantly reduces the computational burden by reconstructing 3D models directly from sparse data. We address the limitations of line clouds for plane detection and reconstruction with a new algorithm that projects 3D line clouds onto a 2D plane, clusters the projections to identify potential planes, and refines them using sparse point clouds to ensure accurate and efficient model reconstruction. Extensive qualitative and quantitative experiments demonstrate the effectiveness of our method and its superiority over existing techniques in terms of simplicity and efficiency.
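As a rough illustration of the projection-and-clustering step described above, the sketch below hypothesizes vertical facade planes by projecting 3D line segments onto the ground plane and clustering their orientation/offset parameters. The segment layout, the DBSCAN choice, and all thresholds are assumptions for illustration, not the authors' implementation.

```python
# Sketch of plane-hypothesis generation from a line cloud (assumed: vertical
# facades, segments given as an (N, 2, 3) endpoint array, DBSCAN clustering).
import numpy as np
from sklearn.cluster import DBSCAN

def facade_hypotheses(segments, eps=0.5):
    """segments: (N, 2, 3) array of 3D line-segment endpoints."""
    p, q = segments[:, 0, :2], segments[:, 1, :2]   # project onto ground plane (drop z)
    d = q - p
    theta = np.arctan2(d[:, 1], d[:, 0]) % np.pi    # undirected line orientation in [0, pi)
    normal = np.stack([-np.sin(theta), np.cos(theta)], axis=1)
    rho = np.einsum('ij,ij->i', normal, p)          # Hough-style signed offset of each line
    # Cluster in (theta, rho) space; the wrap-around at theta = 0/pi is
    # ignored here for simplicity.
    labels = DBSCAN(eps=eps, min_samples=3).fit_predict(np.column_stack([theta, rho]))
    return labels   # segments sharing a label support one candidate vertical plane
```

Each resulting cluster could then be refined against the sparse point cloud, as the abstract describes.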
{"title":"Efficient and lightweight 3D building reconstruction from drone imagery using sparse line and point clouds","authors":"Xiongjie Yin , Jinquan He , Zhanglin Cheng","doi":"10.1016/j.vrih.2025.02.001","DOIUrl":"10.1016/j.vrih.2025.02.001","url":null,"abstract":"<div><div>Efficient three-dimensional (3D) building reconstruction from drone imagery often faces data acquisition, storage, and computational challenges because of its reliance on dense point clouds. In this study, we introduced a novel method for efficient and lightweight 3D building reconstruction from drone imagery using line clouds and sparse point clouds. Our approach eliminates the need to generate dense point clouds, and thus significantly reduces the computational burden by reconstructing 3D models directly from sparse data. We addressed the limitations of line clouds for plane detection and reconstruction by using a new algorithm. This algorithm projects 3D line clouds onto a 2D plane, clusters the projections to identify potential planes, and refines them using sparse point clouds to ensure an accurate and efficient model reconstruction. Extensive qualitative and quantitative experiments demonstrated the effectiveness of our method, demonstrating its superiority over existing techniques in terms of simplicity and efficiency.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 111-126"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864167","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deconfounded fashion image captioning with transformer and multimodal retrieval
Tao Peng, Weiqiao Yin, Junping Liu, Li Li, Xinrong Hu
Pub Date: 2025-04-01 | DOI: 10.1016/j.vrih.2024.08.002 | Virtual Reality Intelligent Hardware 7(2): 127-138
Background
The annotation of fashion images is an important task in the fashion industry, as well as in social media and e-commerce. However, owing to the complexity and diversity of fashion images, this task entails multiple challenges, including a lack of fine-grained captions and confounders caused by dataset bias. Confounders often cause models to learn spurious correlations, thereby reducing their generalization capabilities.
Method
In this work, we propose the Deconfounded Fashion Image Captioning (DFIC) framework, which first uses multimodal retrieval to enrich the predicted captions of clothing, and then constructs a detailed causal graph using causal inference in the decoder to perform deconfounding. Multimodal retrieval is used to obtain semantic words related to image features, which are input into the decoder as prompt words to enrich sentence descriptions. In the decoder, causal inference is applied to disentangle visual and semantic features while concurrently eliminating visual and language confounding.
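A minimal sketch of the retrieval step is given below, assuming image features and word embeddings already live in a shared space (as in CLIP-style models); the function and variable names are hypothetical.

```python
# Top-k cosine retrieval of semantic prompt words for an image feature
# (assumed shared embedding space; names are illustrative).
import numpy as np

def retrieve_prompt_words(img_feat, word_embs, vocab, k=5):
    """img_feat: (D,); word_embs: (V, D); vocab: list of V words."""
    img = img_feat / np.linalg.norm(img_feat)
    w = word_embs / np.linalg.norm(word_embs, axis=1, keepdims=True)
    sims = w @ img                      # cosine similarity to every word
    top = np.argsort(-sims)[:k]        # k most related semantic words
    return [vocab[i] for i in top]
```

The retrieved words would then be prepended to the decoder input as prompt tokens, per the description above.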
Results
Overall, our method not only effectively enriches the captions of target images but also greatly reduces the confounding caused by dataset bias. The effectiveness of the proposed framework was experimentally verified on the FACAD dataset.
{"title":"Deconfounded fashion image captioning with transformer and multimodal retrieval","authors":"Tao Peng, Weiqiao Yin, Junping Liu, Li Li, Xinrong Hu","doi":"10.1016/j.vrih.2024.08.002","DOIUrl":"10.1016/j.vrih.2024.08.002","url":null,"abstract":"<div><h3>Background</h3><div>The annotation of fashion images is a significantly important task in the fashion industry as well as social media and e-commerce. However, owing to the complexity and diversity of fashion images, this task entails multiple challenges, including the lack of fine-grained captions and confounders caused by dataset bias. Specifically, confounders often cause models to learn spurious correlations, thereby reducing their generalization capabilities.</div></div><div><h3>Method</h3><div>In this work, we propose the Deconfounded Fashion Image Captioning (DFIC) framework, which first uses multimodal retrieval to enrich the predicted captions of clothing, and then constructs a detailed causal graph using causal inference in the decoder to perform deconfounding. Multimodal retrieval is used to obtain semantic words related to image features, which are input into the decoder as prompt words to enrich sentence descriptions. In the decoder, causal inference is applied to disentangle visual and semantic features while concurrently eliminating visual and language confounding.</div></div><div><h3>Results</h3><div>Overall, our method can not only effectively enrich the captions of target images, but also greatly reduce confounders caused by the dataset. To verify the effectiveness of the proposed framework, the model was experimentally verified using the FACAD dataset.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 127-138"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DeepSafe: Two-level deep learning approach for disaster victim detection
Amir Azizi, Panayiotis Charalambous, Yiorgos Chrysanthou
Pub Date: 2025-04-01 | DOI: 10.1016/j.vrih.2024.08.005 | Virtual Reality Intelligent Hardware 7(2): 139-154
Background
Efficient disaster victim detection (DVD) in urban areas after natural disasters is crucial for minimizing losses. However, conventional search and rescue (SAR) methods often experience delays, which can hinder the timely detection of victims. SAR teams face various challenges, including limited access to debris and collapsed structures, safety risks due to unstable conditions, and disrupted communication networks.
Methods
In this paper, we present DeepSafe, a novel two-level deep learning approach for multilevel classification and object detection using a simulated disaster victim dataset. DeepSafe first employs YOLOv8 to classify images into victim and non-victim categories. Subsequently, Detectron2 is used to precisely locate and outline the victims.
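A hedged sketch of such a two-stage pipeline using the ultralytics and detectron2 packages follows; the weight files and the class-index convention are placeholders, not the authors' released artifacts.

```python
# Two-stage pipeline in the spirit of DeepSafe: a YOLOv8 classifier gates
# images, then a Detectron2 predictor localizes and outlines victims.
from ultralytics import YOLO
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
import cv2

cls_model = YOLO("victim_classifier.pt")     # YOLOv8 classification weights (assumed path)

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = "victim_segmenter.pth"   # fine-tuned weights (assumed path)
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
detector = DefaultPredictor(cfg)

def detect_victims(path):
    probs = cls_model(path)[0].probs         # stage 1: victim / non-victim
    if probs.top1 != 0:                      # assume class index 0 = "victim"
        return None
    img = cv2.imread(path)
    return detector(img)["instances"]        # stage 2: locate and outline victims
```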
Results
Experimental results demonstrate the promising performance of DeepSafe in both victim classification and detection. The model effectively identified and located victims under the challenging conditions presented in the dataset.
Conclusion
DeepSafe offers a practical tool for real-time disaster management and SAR operations, significantly improving conventional methods by reducing delays and enhancing victim detection accuracy in disaster-stricken urban areas.
{"title":"DeepSafe:Two-level deep learning approach for disaster victims detection","authors":"Amir Azizi , Panayiotis Charalambous , Yiorgos Chrysanthou","doi":"10.1016/j.vrih.2024.08.005","DOIUrl":"10.1016/j.vrih.2024.08.005","url":null,"abstract":"<div><h3>Background</h3><div>Efficient disaster victim detection (DVD) in urban areas after natural disasters is crucial for minimizing losses. However, conventional search and rescue (SAR) methods often experience delays, which can hinder the timely detection of victims. SAR teams face various challenges, including limited access to debris and collapsed structures, safety risks due to unstable conditions, and disrupted communication networks.</div></div><div><h3>Methods</h3><div>In this paper, we present DeepSafe, a novel two-level deep learning approach for multilevel classification and object detection using a simulated disaster victim dataset. DeepSafe first employs YOLOv8 to classify images into victim and non-victim categories. Subsequently, Detectron2 is used to precisely locate and outline the victims.</div></div><div><h3>Results</h3><div>Experimental results demonstrate the promising performance of DeepSafe in both victim classification and detection. The model effectively identified and located victims under the challenging conditions presented in the dataset.</div></div><div><h3>Conclusion</h3><div>DeepSafe offers a practical tool for real-time disaster management and SAR operations, significantly improving conventional methods by reducing delays and enhancing victim detection accuracy in disaster-stricken urban areas.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 2","pages":"Pages 139-154"},"PeriodicalIF":0.0,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143864786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chasing in virtual environment: Dynamic alignment for multi-user collaborative redirected walking
Tianyang Dong, Shuqian Lv, Hubin Kong, Huanbo Zhang
Pub Date: 2025-02-01 | DOI: 10.1016/j.vrih.2024.07.002 | Virtual Reality Intelligent Hardware 7(1): 26-46
Background
The redirected walking (RDW) method for multi-user collaboration requires maintaining the relative position between users in the virtual environment (VE) and the physical environment (PE). A chasing game in a VE is a typical virtual reality game that entails multi-user collaboration. When a user approaches and interacts with a target user in the VE, the user is expected to approach and interact with the target user in the corresponding PE as well. Existing multi-user RDW methods mainly focus on obstacle avoidance and do not account for the relative positional relationship between users in the VE and PE.
Methods
To enhance the user experience and facilitate potential interaction, this paper presents a novel dynamic alignment algorithm for multi-user collaborative redirected walking (DA-RDW) in a shared PE where the target user and other users are moving. This algorithm adopts improved artificial potential fields, where the repulsive force is a function of the relative position and velocity of the user with respect to dynamic obstacles. For the best alignment, this algorithm sets the alignment-guidance force in several cases and then converts it into a constrained optimization problem to obtain the optimal direction. Moreover, this algorithm introduces a potential interaction object selection strategy for a dynamically uncertain environment to speed up the subsequent alignment. To balance obstacle avoidance and alignment, this algorithm uses the dynamic weightings of the virtual and physical distances between users and the target to determine the resultant force vector.
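The following sketch illustrates the flavor of such a resultant-force computation: an inverse-square repulsion that grows with approach speed, plus a distance-weighted alignment term. The functional forms and constants are assumptions; the paper defines its own.

```python
# Illustrative steering force for one user (assumed forms: inverse-square
# repulsion scaled by closing speed, plus a weighted alignment-guidance term).
import numpy as np

def resultant_force(user_p, user_v, obstacles, align_dir, d_virtual, d_physical,
                    k_rep=1.0, k_align=1.0):
    f = np.zeros(2)
    for obs_p, obs_v in obstacles:                      # dynamic obstacles: (pos, vel)
        r = user_p - obs_p
        dist = np.linalg.norm(r) + 1e-9
        closing = max(0.0, -np.dot(user_v - obs_v, r / dist))
        f += k_rep * (1.0 + closing) * r / dist**3      # stronger when closing in
    # Dynamic weighting between obstacle avoidance and alignment, based on the
    # virtual and physical distances to the target.
    w = d_physical / (d_virtual + d_physical + 1e-9)
    f += k_align * w * align_dir / (np.linalg.norm(align_dir) + 1e-9)
    return f
```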
Results
The efficacy of the proposed method was evaluated using a series of simulations and live-user experiments. The results demonstrate that our dynamic alignment method for multi-user collaborative redirected walking reduces the distance error in both the VE and PE, improving alignment while incurring fewer collisions.
{"title":"Chasing in virtual environment:Dynamic alignment for multi-user collaborative redirected walking","authors":"Tianyang Dong, Shuqian Lv, Hubin Kong, Huanbo Zhang","doi":"10.1016/j.vrih.2024.07.002","DOIUrl":"10.1016/j.vrih.2024.07.002","url":null,"abstract":"<div><h3>Background</h3><div>The redirected walking (RDW) method for multi-user collaboration requires maintaining the relative position between users in a virtual environment (VE) and physical environment (PE). A chasing game in a VE is a typical virtual reality game that entails multi-user collaboration. When a user approaches and interacts with a target user in the VE, the user is expected to approach and interact with the target user in the corresponding PE as well. Existing methods of multi-user RDW mainly focus on obstacle avoidance, which does not account for the relative positional relationship between the users in both VE and PE.</div></div><div><h3>Methods</h3><div>To enhance the user experience and facilitate potential interaction, this paper presents a novel dynamic alignment algorithm for multi-user collaborative redirected walking (DA-RDW) in a shared PE where the target user and other users are moving. This algorithm adopts improved artificial potential fields, where the repulsive force is a function of the relative position and velocity of the user with respect to dynamic obstacles. For the best alignment, this algorithm sets the alignment-guidance force in several cases and then converts it into a constrained optimization problem to obtain the optimal direction. Moreover, this algorithm introduces a potential interaction object selection strategy for a dynamically uncertain environment to speed up the subsequent alignment. To balance obstacle avoidance and alignment, this algorithm uses the dynamic weightings of the virtual and physical distances between users and the target to determine the resultant force vector.</div></div><div><h3>Results</h3><div>The efficacy of the proposed method was evaluated using a series of simulations and live-user experiments. The experimental results demonstrate that our novel dynamic alignment method for multi-user collaborative redirected walking can reduce the distance error in both VE and PE to improve alignment with fewer collisions.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 1","pages":"Pages 26-46"},"PeriodicalIF":0.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143562755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing wireless sensor network topology with node load consideration
Ruizhi Chen
Pub Date: 2025-02-01 | DOI: 10.1016/j.vrih.2024.08.003 | Virtual Reality Intelligent Hardware 7(1): 47-61
Background
With the development of the Internet, the topology optimization of wireless sensor networks has received increasing attention. However, traditional optimization methods often overlook the energy imbalance caused by node loads, which affects network performance.
Methods
To improve the overall performance and efficiency of wireless sensor networks, a new method for optimizing wireless sensor network topology based on K-means clustering and the firefly algorithm is proposed. The K-means algorithm partitions the nodes by minimizing the within-cluster variance, while the firefly algorithm is a swarm-intelligence optimizer that simulates the flashing interactions between fireflies to guide the search. The proposed method first applies K-means clustering to group the nodes and then uses the firefly algorithm to dynamically adjust them.
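A compact sketch of this two-stage idea under stated assumptions: K-means proposes cluster-head positions, and a textbook firefly update searches around them. The fitness function and constants are placeholders rather than the paper's formulation.

```python
# K-means proposes cluster heads; a standard firefly update refines them.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
nodes = rng.uniform(0, 100, size=(60, 2))                 # sensor node coordinates
heads = KMeans(n_clusters=5, n_init=10).fit(nodes).cluster_centers_

def fitness(h):
    """Total node-to-nearest-head distance (assumed objective; lower is better)."""
    d = np.linalg.norm(nodes[:, None, :] - h[None], axis=2)
    return d.min(axis=1).sum()

beta0, gamma, alpha = 1.0, 0.01, 0.5                      # standard firefly constants
swarm = heads + rng.normal(0, 1, size=(8, *heads.shape))  # 8 candidate head layouts
for _ in range(50):
    fit = np.array([fitness(s) for s in swarm])
    for i in range(len(swarm)):
        for j in range(len(swarm)):
            if fit[j] < fit[i]:                           # move toward brighter firefly
                r2 = np.sum((swarm[i] - swarm[j]) ** 2)
                beta = beta0 * np.exp(-gamma * r2)        # attractiveness decays with distance
                swarm[i] += beta * (swarm[j] - swarm[i]) \
                            + alpha * rng.normal(0, 0.1, swarm[i].shape)
best = swarm[np.argmin([fitness(s) for s in swarm])]
```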
Results
The results showed average clustering accuracies of 86.59% and 94.55% on the Wine and Iris datasets, respectively, demonstrating good clustering performance. In measurements of the node mortality rate and the load-balancing standard deviation, the first dead nodes appeared at approximately 50 iterations, with an average load-balancing standard deviation of 1.7×10⁴, confirming the algorithm's contribution to extending the network lifespan.
Conclusions
This demonstrates the superiority of the proposed algorithm in significantly improving the energy efficiency and load balancing of wireless sensor networks, thereby extending the network lifespan. These results have theoretical and practical significance for wireless sensor networks in fields such as monitoring, healthcare, and agriculture.
{"title":"Optimizing wireless sensor network topology with node load consideration","authors":"Ruizhi Chen","doi":"10.1016/j.vrih.2024.08.003","DOIUrl":"10.1016/j.vrih.2024.08.003","url":null,"abstract":"<div><h3>Background</h3><div>With the development of the Internet, the topology optimization of wireless sensor networks has received increasing attention. However, traditional optimization methods often overlook the energy imbalance caused by node loads, which affects network performance.</div></div><div><h3>Methods</h3><div>To improve the overall performance and efficiency of wireless sensor networks, a new method for optimizing the wireless sensor network topology based on K-means clustering and firefly algorithms is proposed. The K-means clustering algorithm partitions nodes by minimizing the within-cluster variance, while the firefly algorithm is an optimization algorithm based on swarm intelligence that simulates the flashing interaction between fireflies to guide the search process. The proposed method first introduces the K-means clustering algorithm to cluster nodes and then introduces a firefly algorithm to dynamically adjust the nodes.</div></div><div><h3>Results</h3><div>The results showed that the average clustering accuracies in the Wine and Iris data sets were 86.59% and 94.55%, respectively, demonstrating good clustering performance. When calculating the node mortality rate and network load balancing standard deviation, the proposed algorithm showed dead nodes at approximately 50 iterations, with an average load balancing standard deviation of 1.7×10<sup>4</sup>, proving its contribution to extending the network lifespan.</div></div><div><h3>Conclusions</h3><div>This demonstrates the superiority of the proposed algorithm in significantly improving the energy efficiency and load balancing of wireless sensor networks to extend the network lifespan. The research results indicate that wireless sensor networks have theoretical and practical significance in fields such as monitoring, healthcare, and agriculture.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 1","pages":"Pages 47-61"},"PeriodicalIF":0.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143562756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Finger tracking for wearable VR glove using flexible rack mechanism
Roshan Thilakarathna, Maroay Phlernjai
Pub Date: 2025-02-01 | DOI: 10.1016/j.vrih.2024.03.001 | Virtual Reality Intelligent Hardware 7(1): 1-25
Background
With the increasing prominence of hand and finger motion tracking in virtual reality (VR) applications and rehabilitation studies, data gloves have emerged as a prevalent solution. In this study, we developed an innovative, lightweight, and detachable data glove tailored for finger motion tracking in VR environments.
Methods
The glove design incorporates a potentiometer coupled with a flexible rack and pinion gear system, facilitating precise and natural hand gestures for interaction with VR applications. Initially, we calibrated the potentiometer against the actual finger bending angle and verified the accuracy of the angle measurements recorded by the data glove. To verify the precision and reliability of our data glove, we conducted repeatability testing for flexion (grip test) and extension (flat test), with 250 measurements each, across five users. We employed a Gage Repeatability and Reproducibility (Gage R&R) analysis to interpret the repeated measurements. Furthermore, we integrated the gloves into a SteamVR home environment using the OpenGlove auto-calibration tool.
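As an illustration of the calibration step, a simple linear fit from raw sensor readings to reference bend angles might look like the following; the readings, reference angles, and the linear model itself are assumptions, since the calibration data are not published here.

```python
# Linear calibration from raw potentiometer ADC readings to bend angles
# (example data; a linear sensor response is assumed).
import numpy as np

raw = np.array([112, 290, 470, 655, 840])           # example ADC readings (assumed)
ref_deg = np.array([0.0, 22.5, 45.0, 67.5, 90.0])   # reference bend angles (assumed)

slope, intercept = np.polyfit(raw, ref_deg, 1)      # least-squares line fit

def to_angle(adc_value):
    """Convert a raw potentiometer reading to a finger bend angle in degrees."""
    return slope * adc_value + intercept
```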
Conclusions
The repeatability analysis revealed an aggregate error of 1.45 degrees in both the gripped and flat hand positions. This outcome compares favorably with the findings from assessments of nine alternative data gloves that employed similar protocols. In these experiments, users navigated and engaged with virtual objects, underlining the glove's accurate tracking of finger motion. Furthermore, the proposed data glove exhibited a low response time of 17–34 ms and a back-drive force of only 0.19 N. Additionally, according to a comfort evaluation using the Comfort Rating Scales, the proposed glove system is wearable, placing it at the WL1 level.
{"title":"Finger tracking for wearable VR glove using flexible rack mechanism","authors":"Roshan Thilakarathna, Maroay Phlernjai","doi":"10.1016/j.vrih.2024.03.001","DOIUrl":"10.1016/j.vrih.2024.03.001","url":null,"abstract":"<div><h3>Background</h3><div>With the increasing prominence of hand and finger motion tracking in virtual reality (VR) applications and rehabilitation studies, data gloves have emerged as a prevalent solution. In this study, we developed an innovative, lightweight, and detachable data glove tailored for finger motion tracking in VR environments.</div></div><div><h3>Methods</h3><div>The glove design incorporates a potentiometer coupled with a flexible rack and pinion gear system, facilitating precise and natural hand gestures for interaction with VR applications. Initially, we calibrated the potentiometer to align with the actual finger bending angle, and verified the accuracy of angle measurements recorded by the data glove. To verify the precision and reliability of our data glove, we conducted repeatability testing for flexion (grip test) and extension (flat test), with 250 measurements each, across five users. We employed the Gage Repeatability and Reproducibility to analyze and interpret the repeatable data. Furthermore, we integrated the gloves into a SteamVR home environment using the OpenGlove auto-calibration tool.</div></div><div><h3>Conclusions</h3><div>The repeatability analysis revealed an aggregate error of 1.45 degrees in both the gripped and flat hand positions. This outcome was notably favorable when compared with the findings from assessments of nine alternative data gloves that employed similar protocols. In these experiments, users navigated and engaged with virtual objects, underlining the glove's exact tracking of finger motion. Furthermore, the proposed data glove exhibited a low response time of 17–34 ms and back-drive force of only 0.19 N. Additionally, according to a comfort evaluation using the Comfort Rating Scales, the proposed glove system is wearable, placing it at the WL1 level.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 1","pages":"Pages 1-25"},"PeriodicalIF":0.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143562754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FDCPNet: Feature discrimination and context propagation network for 3D shape representation
Weimin SHI, Yuan XIONG, Qianwen WANG, Han JIANG, Zhong ZHOU
Pub Date: 2025-02-01 | DOI: 10.1016/j.vrih.2024.06.001 | Virtual Reality Intelligent Hardware 7(1): 83-94
Background
Three-dimensional (3D) shape representation using mesh data is essential in various applications, such as virtual reality and simulation technologies. Current methods for extracting features from mesh edges or faces struggle with complex 3D models: edge-based approaches miss global context, and face-based methods overlook variations in adjacent areas, which affects overall precision. To address these issues, we propose the Feature Discrimination and Context Propagation Network (FDCPNet), a novel approach that synergistically integrates local and global features of mesh data.
Methods
FDCPNet is composed of two modules: (1) the Feature Discrimination Module, which employs an attention mechanism to enhance the identification of key local features, and (2) the Context Propagation Module, which enriches key local features by integrating global contextual information, thereby facilitating a more detailed and comprehensive representation of crucial areas within the mesh model.
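A hedged PyTorch sketch of what an attention-based feature-discrimination step could look like: channel attention re-weights per-face features using a global mesh context. The layer sizes and the exact mechanism are assumptions, not FDCPNet's published definition.

```python
# Channel-attention re-weighting of per-face mesh features (illustrative).
import torch
import torch.nn as nn

class FeatureDiscrimination(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):               # x: (batch, faces, channels)
        ctx = x.mean(dim=1)             # global pooling over mesh faces
        w = self.gate(ctx).unsqueeze(1) # per-channel importance weights
        return x * w                    # emphasize key local features
```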
Results
Experiments on popular datasets validated the effectiveness of FDCPNet, showing an improvement in the classification accuracy over the baseline MeshNet. Furthermore, even with reduced mesh face numbers and limited training data, FDCPNet achieved promising results, demonstrating its robustness in scenarios of variable complexity.
{"title":"FDCPNet:feature discrimination and context propagation network for 3D shape representation","authors":"Weimin SHI, Yuan XIONG, Qianwen WANG, Han JIANG, Zhong ZHOU","doi":"10.1016/j.vrih.2024.06.001","DOIUrl":"10.1016/j.vrih.2024.06.001","url":null,"abstract":"<div><h3>Background</h3><div>Three-dimensional (3D) shape representation using mesh data is essential in various applications, such as virtual reality and simulation technologies. Current methods for extracting features from mesh edges or faces struggle with complex 3D models because edge-based approaches miss global contexts and face-based methods overlook variations in adjacent areas, which affects the overall precision. To address these issues, we propose the Feature Discrimination and Context Propagation Network (FDCPNet), which is a novel approach that synergistically integrates local and global features in mesh datasets.</div></div><div><h3>Methods</h3><div>FDCPNet is composed of two modules: (1) the Feature Discrimination Module, which employs an attention mechanism to enhance the identification of key local features, and (2) the Context Propagation Module, which enriches key local features by integrating global contextual information, thereby facilitating a more detailed and comprehensive representation of crucial areas within the mesh model.</div></div><div><h3>Results</h3><div>Experiments on popular datasets validated the effectiveness of FDCPNet, showing an improvement in the classification accuracy over the baseline MeshNet. Furthermore, even with reduced mesh face numbers and limited training data, FDCPNet achieved promising results, demonstrating its robustness in scenarios of variable complexity.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 1","pages":"Pages 83-94"},"PeriodicalIF":0.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143562654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A haptic feedback glove for virtual piano interaction
Yifan FU, Jialin LIU, Xu LI, Xiaoying SUN
Pub Date: 2025-02-01 | DOI: 10.1016/j.vrih.2024.07.001 | Virtual Reality Intelligent Hardware 7(1): 95-110
Background
Haptic feedback plays a crucial role in virtual reality (VR) interaction, helping to improve the precision of user operation and enhancing the immersion of the user experience. Instrumental haptic feedback in virtual environments is primarily realized using grounded force or vibration feedback devices. However, improvements are required in terms of the active space and feedback realism.
Methods
We propose a lightweight and flexible haptic feedback glove that can haptically render objects in VR environments via kinesthetic and vibration feedback, enabling users to enjoy a rich virtual piano-playing experience. The kinesthetic feedback relies on a cable-pulling mechanism that rotates to pull the two cables connected to it, changing the generated force to simulate the hardness or softness of an object. Vibration feedback is provided by small vibration motors embedded in the glove beneath the fingertips. We designed a piano-playing scenario in the virtual environment and conducted user tests, with clarity, realism, enjoyment, and satisfaction as the evaluation metrics.
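A toy mapping from virtual key contact to the glove's two feedback channels, under assumed constants and a hypothetical controller interface: cable tension scales with penetration depth (perceived stiffness) and a vibration burst marks the strike.

```python
# Illustrative contact-to-actuator mapping (constants and interface assumed).
def actuator_commands(penetration_m, key_velocity, stiffness=400.0):
    """Return (cable force in N, vibration amplitude in [0, 1])."""
    force_n = stiffness * max(0.0, penetration_m)   # kinesthetic channel: virtual spring
    force_n = min(force_n, 5.0)                     # clamp to a safe cap (assumed)
    vib_amp = min(1.0, abs(key_velocity) / 2.0) if penetration_m > 0 else 0.0
    return force_n, vib_amp                         # sent to the motor controllers
```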
Results
A total of 14 subjects participated in the test, and the results showed that our proposed glove scored significantly higher on the four evaluation metrics than the no-feedback and vibration feedback methods.
Conclusions
Our proposed glove significantly enhances the user experience when interacting with virtual objects.
{"title":"A haptic feedback glove for virtual piano interaction","authors":"Yifan FU, Jialin LIU, Xu LI, Xiaoying SUN","doi":"10.1016/j.vrih.2024.07.001","DOIUrl":"10.1016/j.vrih.2024.07.001","url":null,"abstract":"<div><h3>Background</h3><div>Haptic feedback plays a crucial role in virtual reality (VR) interaction, helping to improve the precision of user operation and enhancing the immersion of the user experience. Instrumental haptic feedback in virtual environments is primarily realized using grounded force or vibration feedback devices. However, improvements are required in terms of the active space and feedback realism.</div></div><div><h3>Methods</h3><div>We propose a lightweight and flexible haptic feedback glove that can haptically render objects in VR environments via kinesthetic and vibration feedback, thereby enabling users to enjoy a rich virtual piano-playing experience. The kinesthetic feedback of the glove relies on a cable-pulling mechanism that rotates the mechanism and pulls the two cables connected to it, thereby changing the amount of force generated to simulate the hardness or softness of the object. Vibration feedback is provided by small vibration motors embedded in the bottom of the fingertips of the glove. We designed a piano-playing scenario in the virtual environment and conducted user tests. The evaluation metrics were clarity, realism, enjoyment, and satisfaction.</div></div><div><h3>Results</h3><div>A total of 14 subjects participated in the test, and the results showed that our proposed glove scored significantly higher on the four evaluation metrics than the no-feedback and vibration feedback methods.</div></div><div><h3>Conclusions</h3><div>Our proposed glove significantly enhances the user experience when interacting with virtual objects.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 1","pages":"Pages 95-110"},"PeriodicalIF":0.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143562655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
YGC-SLAM: A visual SLAM based on improved YOLOv5 and geometric constraints for dynamic indoor environments
Juncheng ZHANG, Fuyang KE, Qinqin TANG, Wenming YU, Ming ZHANG
Pub Date: 2025-02-01 | DOI: 10.1016/j.vrih.2024.05.001 | Virtual Reality Intelligent Hardware 7(1): 62-82
Background
Because visual simultaneous localization and mapping (SLAM) is primarily based on the assumption of a static scene, the presence of dynamic objects in a frame causes problems such as degraded system robustness and inaccurate position estimation. In this study, we propose YGC-SLAM, a visual SLAM system for indoor dynamic environments built on the ORB-SLAM2 framework, which combines semantic and geometric constraints to improve the positioning accuracy and robustness of the system.
Methods
First, the recognition accuracy of YOLOv5 was improved by introducing the convolutional block attention module and an improved EIOU loss function, so that the prediction box converges quickly for better detection. The improved YOLOv5 was then added to the tracking thread for dynamic target detection to eliminate dynamic points. Subsequently, multi-view geometric constraints were used for re-evaluation to further eliminate dynamic points while retaining more useful feature points, preventing the semantic stage from over-eliminating feature points, which could cause map building to fail. The K-means clustering algorithm was used to accelerate this process by quickly computing the motion state of each cluster of pixels. Finally, a keyframe selection strategy with redundancy removal was implemented to construct a clear 3D dense static point-cloud map.
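A sketch of the multi-view geometric re-check using OpenCV follows: matched points whose epipolar residual between two views is large are flagged as dynamic. The threshold and RANSAC settings are placeholders; the paper's exact criterion may differ.

```python
# Flag likely-dynamic keypoints via epipolar residuals between two frames.
import cv2
import numpy as np

def dynamic_point_mask(pts1, pts2, thresh=1.0):
    """pts1, pts2: (N, 2) matched keypoints; assumes enough matches for RANSAC."""
    F, _ = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.999)
    ones = np.ones((len(pts1), 1))
    lines2 = np.hstack([pts1, ones]) @ F.T              # epipolar lines in image 2
    num = np.abs(np.sum(np.hstack([pts2, ones]) * lines2, axis=1))
    dist = num / np.linalg.norm(lines2[:, :2], axis=1)  # point-to-line distance
    return dist > thresh                                # True = likely dynamic point
```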
Results
In tests on the TUM dataset and in a real environment, the experimental results show that our algorithm reduces the absolute trajectory error by 98.22% and the relative trajectory error by 97.98% compared with the original ORB-SLAM2, making it more accurate, with better real-time performance, than similar algorithms such as DynaSLAM and DS-SLAM.
Conclusions
The YGC-SLAM proposed in this study can effectively eliminate the adverse effects of dynamic objects, and the system can better complete positioning and map building tasks in complex environments.
{"title":"YGC-SLAM:A visual SLAM based on improved YOLOv5 and geometric constraints for dynamic indoor environments","authors":"Juncheng ZHANG , Fuyang KE , Qinqin TANG , Wenming YU , Ming ZHANG","doi":"10.1016/j.vrih.2024.05.001","DOIUrl":"10.1016/j.vrih.2024.05.001","url":null,"abstract":"<div><h3>Background</h3><div>As visual simultaneous localization and mapping (SLAM) is primarily based on the assumption of a static scene, the presence of dynamic objects in the frame causes problems such as a deterioration of system robustness and inaccurate position estimation. In this study, we propose a YGC-SLAM for indoor dynamic environments based on the ORB-SLAM2 framework combined with semantic and geometric constraints to improve the positioning accuracy and robustness of the system.</div></div><div><h3>Methods</h3><div>First, the recognition accuracy of YOLOv5 was improved by introducing the convolution block attention model and the improved EIOU loss function, whereby the prediction frame converges quickly for better detection. The improved YOLOv5 was then added to the tracking thread for dynamic target detection to eliminate dynamic points. Subsequently, multi-view geometric constraints were used for re-judging to further eliminate dynamic points while enabling more useful feature points to be retained and preventing the semantic approach from over-eliminating feature points, causing a failure of map building. The K-means clustering algorithm was used to accelerate this process and quickly calculate and determine the motion state of each cluster of pixel points. Finally, a strategy for drawing keyframes with de-redundancy was implemented to construct a clear 3D dense static point-cloud map.</div></div><div><h3>Results</h3><div>Through testing on TUM dataset and a real environment, the experimental results show that our algorithm reduces the absolute trajectory error by 98.22% and the relative trajectory error by 97.98% compared with the original ORB-SLAM2, which is more accurate and has better real-time performance than similar algorithms, such as DynaSLAM and DS-SLAM.</div></div><div><h3>Conclusions</h3><div>The YGC-SLAM proposed in this study can effectively eliminate the adverse effects of dynamic objects, and the system can better complete positioning and map building tasks in complex environments.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"7 1","pages":"Pages 62-82"},"PeriodicalIF":0.0,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143562653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Survey of neurocognitive disorder detection methods based on speech, visual, and virtual reality technologies
Tian ZHENG, Xinheng WANG, Xiaolan PENG, Ning SU, Tianyi XU, Xurong XIE, Jin HUANG, Lun XIE, Feng TIAN
Pub Date: 2024-12-01 | DOI: 10.1016/j.vrih.2024.08.001 | Virtual Reality Intelligent Hardware 6(6): 421-472
The global trend of population aging poses significant challenges to society and healthcare systems, particularly because of neurocognitive disorders (NCDs) such as Parkinson's disease (PD) and Alzheimer's disease (AD). In this context, artificial intelligence techniques have demonstrated promising potential for the objective assessment and detection of NCDs. Multimodal contactless screening technologies, such as speech-language processing, computer vision, and virtual reality, offer efficient and convenient methods for disease diagnosis and progression tracking. This paper systematically reviews the specific methods and applications of these technologies in the detection of NCDs, covering data collection paradigms, feature extraction, and modeling approaches. Additionally, the potential applications and future prospects of these technologies for the detection of cognitive and motor disorders are explored. By providing a comprehensive summary and refinement of the extant theories, methodologies, and applications, this study aims to facilitate an in-depth understanding of these technologies for researchers, both within and outside the field. To the best of our knowledge, this is the first survey to cover the use of speech-language processing, computer vision, and virtual reality technologies for the detection of NCDs.
{"title":"Survey of neurocognitive disorder detection methods based on speech, visual, and virtual reality technologies","authors":"Tian ZHENG , Xinheng WANG , Xiaolan PENG , Ning SU , Tianyi XU , Xurong XIE , Jin HUANG , Lun XIE , Feng TIAN","doi":"10.1016/j.vrih.2024.08.001","DOIUrl":"10.1016/j.vrih.2024.08.001","url":null,"abstract":"<div><div>The global trend of population aging poses significant challenges to society and healthcare systems, particularly because of neurocognitive disorders (NCDs) such as Parkinson's disease (PD) and Alzheimer's disease (AD). In this context, artificial intelligence techniques have demonstrated promising potential for the objective assessment and detection of NCDs. Multimodal contactless screening technologies, such as speech-language processing, computer vision, and virtual reality, offer efficient and convenient methods for disease diagnosis and progression tracking. This paper systematically reviews the specific methods and applications of these technologies in the detection of NCDs using data collection paradigms, feature extraction, and modeling approaches. Additionally, the potential applications and future prospects of these technologies for the detection of cognitive and motor disorders are explored. By providing a comprehensive summary and refinement of the extant theories, methodologies, and applications, this study aims to facilitate an in-depth understanding of these technologies for researchers, both within and outside the field. To the best of our knowledge, this is the first survey to cover the use of speech-language processing, computer vision, and virtual reality technologies for the detection of NSDs.</div></div>","PeriodicalId":33538,"journal":{"name":"Virtual Reality Intelligent Hardware","volume":"6 6","pages":"Pages 421-472"},"PeriodicalIF":0.0,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143315199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}