Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3624569
Huawei Tu, BoYu Gao, Yujun Lu, Weiqiang Xin, Hui Cui, Weiqi Luo, Jian Weng, Henry Been-Lirn Duh
This study explores the design space of two-handed input (i.e., clicking or tapping with the thumb) on the touchpads of controllers for virtual reality (VR) interaction. Four experiments were conducted to fulfill this purpose. Experiment 1 investigated how users employed two VR controllers to perform four representative interaction tasks in VR and identified 14 potentially usable two-handed operations that involved tapping or clicking. Experiments 2 and 3 analyzed user performance of the 14 operations, providing insights into their interaction characteristics in terms of completion time, accuracy, and subjective feedback. In Experiment 4, we designed a command-input technique based on the proposed operations. We verified its effectiveness compared to context menus and marking menus in a VR text entry scenario. Our technique generally had shorter times and similar accuracy to the two menu types. Our work contributes to the design of VR interactions using two-handed controllers.
{"title":"Two-Handed Click and Tap: Expanding Input Vocabulary of Controllers for Virtual Reality Interaction.","authors":"Huawei Tu, BoYu Gao, Yujun Lu, Weiqiang Xin, Hui Cui, Weiqi Luo, Jian Weng, Henry Been-Lirn Duh","doi":"10.1109/TVCG.2025.3624569","DOIUrl":"10.1109/TVCG.2025.3624569","url":null,"abstract":"<p><p>This study explores the design space of two-handed input (i.e., clicking or tapping with the thumb) on the touchpads of controllers for virtual reality (VR) interaction. Four experiments were conducted to fulfill this purpose. Experiment 1 investigated how users employed two VR controllers to perform four representative interaction tasks in VR and identified 14 potentially usable two-handed operations that involved tapping or clicking. Experiments 2 and 3 analyzed user performance of the 14 operations, providing insights into their interaction characteristics in terms of completion time, accuracy, and subjective feedback. In Experiment 4, we designed a command-input technique based on the proposed operations. We verified its effectiveness compared to context menus and marking menus in a VR text entry scenario. Our technique generally had shorter times and similar accuracy to the two menu types. Our work contributes to the design of VR interactions using two-handed controllers.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1682-1697"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145357317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3636949
Haiming Zhu, Yangyang Xu, Jun Yu, Shengfeng He
With the revolution of generative AI, video-related tasks have been widely studied. However, current state-of-the-art video models still lag behind image models in visual quality and user control over generated content. In this paper, we introduce TokenWarping, a novel framework for temporally coherent video translation. Existing diffusion-based video editing approaches rely solely on key and value patches in self-attention to ensure temporal consistency, often sacrificing the preservation of local and structural regions. Critically, these methods overlook the significance of the query patches in achieving accurate feature aggregation and temporal coherence. In contrast, TokenWarping leverages complementary token priors by constructing temporal correlations across different frames. Our method begins by extracting optical flows from source videos. During the denoising process of the diffusion model, these optical flows are used to warp the previous frame's query, key, and value patches, aligning them with the current frame's patches. By directly warping the query patches, we enhance feature aggregation in self-attention, while warping the key and value patches ensures temporal consistency across frames. This token warping imposes explicit constraints on the self-attention layer outputs, effectively ensuring temporally coherent translation. Our framework does not require any additional training or fine-tuning and can be seamlessly integrated with existing text-to-image editing methods. We conduct extensive experiments on various video translation tasks, demonstrating that TokenWarping surpasses state-of-the-art methods both qualitatively and quantitatively. Video demonstrations are available in supplementary materials.
{"title":"Zero-Shot Video Translation via Token Warping.","authors":"Haiming Zhu, Yangyang Xu, Jun Yu, Shengfeng He","doi":"10.1109/TVCG.2025.3636949","DOIUrl":"10.1109/TVCG.2025.3636949","url":null,"abstract":"<p><p>With the revolution of generative AI, video-related tasks have been widely studied. However, current state-of-the-art video models still lag behind image models in visual quality and user control over generated content. In this paper, we introduce TokenWarping, a novel framework for temporally coherent video translation. Existing diffusion-based video editing approaches rely solely on key and value patches in self-attention to ensure temporal consistency, often sacrificing the preservation of local and structural regions. Critically, these methods overlook the significance of the query patches in achieving accurate feature aggregation and temporal coherence. In contrast, TokenWarping leverages complementary token priors by constructing temporal correlations across different frames. Our method begins by extracting optical flows from source videos. During the denoising process of the diffusion model, these optical flows are used to warp the previous frame's query, key, and value patches, aligning them with the current frame's patches. By directly warping the query patches, we enhance feature aggregation in self-attention, while warping the key and value patches ensures temporal consistency across frames. This token warping imposes explicit constraints on the self-attention layer outputs, effectively ensuring temporally coherent translation. Our framework does not require any additional training or fine-tuning and can be seamlessly integrated with existing text-to-image editing methods. We conduct extensive experiments on various video translation tasks, demonstrating that TokenWarping surpasses state-of-the-art methods both qualitatively and quantitatively. Video demonstrations are available in supplementary materials.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1582-1592"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145643887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3638450
Qing Li, Huifang Feng, Kanle Shi, Yue Gao, Yi Fang, Yu-Shen Liu, Zhizhong Han
Estimating the normal of a point requires constructing a local patch to provide center-surrounding context, but determining the appropriate neighborhood size is difficult when dealing with different data or geometries. Existing methods commonly employ various parameter-heavy strategies to extract a full feature description from the input patch. However, they still have difficulty accurately and efficiently predicting normals for various point clouds. In this work, we present a new idea of feature extraction for robust normal estimation of point clouds. We use the fusion of multi-scale features from different neighborhood sizes to address the issue of selecting reasonable patch sizes for various data or geometries. We seek to model a patch feature fitting (PFF) based on multi-scale features to approximate the optimal geometric description for normal estimation and implement the approximation process via multi-scale feature aggregation and cross-scale feature compensation. The feature aggregation module progressively aggregates the patch features of different scales to the center of the patch and shrinks the patch size by removing points far from the center. It not only enables the network to precisely capture structural characteristics over a wide range, but also to describe highly detailed geometries. The feature compensation module ensures the reusability of features from earlier layers of large scales and reveals associated information across different patch sizes. Our approximation strategy based on aggregating the features of multiple scales enables the model to achieve scale adaptation of varying local patches and deliver an optimal feature description. Extensive experiments demonstrate that our method achieves state-of-the-art performance on both synthetic and real-world datasets with fewer network parameters and shorter running time.
{"title":"PFF-Net: Patch Feature Fitting for Point Cloud Normal Estimation.","authors":"Qing Li, Huifang Feng, Kanle Shi, Yue Gao, Yi Fang, Yu-Shen Liu, Zhizhong Han","doi":"10.1109/TVCG.2025.3638450","DOIUrl":"10.1109/TVCG.2025.3638450","url":null,"abstract":"<p><p>Estimating the normal of a point requires constructing a local patch to provide center-surrounding context, but determining the appropriate neighborhood size is difficult when dealing with different data or geometries. Existing methods commonly employ various parameter-heavy strategies to extract a full feature description from the input patch. However, they still have difficulties in accurately and efficiently predicting normals for various point clouds. In this work, we present a new idea of feature extraction for robust normal estimation of point clouds. We use the fusion of multi-scale features from different neighborhood sizes to address the issue of selecting reasonable patch sizes for various data or geometries. We seek to model a patch feature fitting (PFF) based on multi-scale features to approximate the optimal geometric description for normal estimation and implement the approximation process via multi-scale feature aggregation and cross-scale feature compensation. The feature aggregation module progressively aggregates the patch features of different scales to the center of the patch and shrinks the patch size by removing points far from the center. It not only enables the network to precisely capture the structure characteristic in a wide range, but also describes highly detailed geometries. The feature compensation module ensures the reusability of features from earlier layers of large scales and reveals associated information in different patch sizes. Our approximation strategy based on aggregating the features of multiple scales enables the model to achieve scale adaptation of varying local patches and deliver the optimal feature description. Extensive experiments demonstrate that our method achieves state-of-the-art performance on both synthetic and real-world datasets with fewer network parameters and running time.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1713-1728"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145644183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3635116
Zizhao Wu, Yingying Sun, Yiming Chen, Xiaoling Gu, Ruyu Liu, Jiazhou Chen
Human-human interaction generation has garnered significant attention in motion synthesis due to its vital role in understanding humans as social beings. However, existing methods typically rely on transformer-based architectures, which often face challenges related to scalability and efficiency. To address these challenges, we propose InterMamba, a novel and efficient human-human interaction generation method built on the Mamba framework, designed to capture long-sequence dependencies effectively while enabling real-time feedback. Specifically, we introduce an adaptive spatio-temporal Mamba framework that utilizes two parallel SSM branches with an adaptive mechanism to integrate the spatial and temporal features of motion sequences. To further enhance the model's ability to capture dependencies within individual motion sequences and the interactions between different individual sequences, we develop two key modules: the self-adaptive spatio-temporal Mamba module and the cross-adaptive spatio-temporal Mamba module, enabling efficient feature learning. Extensive experiments demonstrate that our method achieves state-of-the-art results on both interaction datasets with remarkable quality and efficiency. Compared to the baseline method InterGen, our approach not only improves accuracy but also reduces the parameter size to just 66 M (36% of InterGen's), while achieving an average inference speed of 0.57 seconds, which is 46% of InterGen's execution time.
{"title":"InterMamba: Efficient Human-Human Interaction Generation With Adaptive Spatio-Temporal Mamba.","authors":"Zizhao Wu, Yingying Sun, Yiming Chen, Xiaoling Gu, Ruyu Liu, Jiazhou Chen","doi":"10.1109/TVCG.2025.3635116","DOIUrl":"10.1109/TVCG.2025.3635116","url":null,"abstract":"<p><p>Human-human interaction generation has garnered significant attention in motion synthesis due to its vital role in understanding humans as social beings. However, existing methods typically rely on transformer-based architectures, which often face challenges related to scalability and efficiency. To address these challenges, we propose InterMamba, a novel and efficient human-human interaction generation method built on the Mamba framework, designed to capture long-sequence dependencies effectively while enabling real-time feedback. Specifically, we introduce an adaptive spatio-temporal Mamba framework that utilizes two parallel SSM branches with an adaptive mechanism to integrate the spatial and temporal features of motion sequences. To further enhance the model's ability to capture dependencies within individual motion sequences and the interactions between different individual sequences, we develop two key modules: the self adaptive spatio-temporal Mamba module and the cross adaptive spatio-temporal Mamba module, enabling efficient feature learning. Extensive experiments demonstrate that our method achieves the state-of-the-art results on both two interaction datasets with remarkable quality and efficiency. Compared to the baseline method InterGen, our approach not only improves accuracy but also reduces the parameter size to just 66 M (36% of InterGen's), while achieving an average inference speed of 0.57 seconds, which is 46% of InterGen's execution time.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1928-1940"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145574686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3622157
Ruihan Yu, Tianyu Huang, Jingwang Ling, Feng Xu
2D Gaussian Splatting has recently emerged as a significant method in 3D reconstruction, enabling novel view synthesis and geometry reconstruction simultaneously. While the well-known Gaussian kernel is broadly used, its lack of anisotropy and deformation ability leads to dim and vague edges at object silhouettes, limiting the reconstruction quality of current Gaussian splatting methods. To enhance the representation power, we draw inspiration from quantum physics and propose to use the Gaussian-Hermite kernel as the new primitive in Gaussian splatting. The new kernel takes a unified mathematical form and extends the Gaussian function, which serves as the zero-rank special case in the updated general formulation. Our experiments demonstrate that the proposed Gaussian-Hermite kernel achieves improved performance over traditional Gaussian Splatting kernels on both geometry reconstruction and novel-view synthesis tasks. Specifically, on the DTU dataset, our method yields more accurate geometry reconstruction, while on datasets such as MipNeRF360 and our customized Detail dataset, it achieves better results in novel-view synthesis. These results highlight the potential of the Gaussian-Hermite kernel for high-quality 3D reconstruction and rendering.
{"title":"2DGH: 2D Gaussian-Hermite Splatting for High-Quality Rendering and Better Geometry Features.","authors":"Ruihan Yu, Tianyu Huang, Jingwang Ling, Feng Xu","doi":"10.1109/TVCG.2025.3622157","DOIUrl":"10.1109/TVCG.2025.3622157","url":null,"abstract":"<p><p>2D Gaussian Splatting has recently emerged as a significant method in 3D reconstruction, enabling novel view synthesis and geometry reconstruction simultaneously. While the well-known Gaussian kernel is broadly used, its lack of anisotropy and deformation ability leads to dim and vague edges at object silhouettes, limiting the reconstruction quality of current Gaussian splatting methods. To enhance the representation power, we draw inspiration from quantum physics and propose to use the Gaussian-Hermite kernel as the new primitive in Gaussian splatting. The new kernel takes a unified mathematical form and extends the Gaussian function, which serves as the zero-rank special case in the updated general formulation. Our experiments demonstrate that the proposed Gaussian-Hermite kernel achieves improved performance over traditional Gaussian Splatting kernels on both geometry reconstruction and novel-view synthesis tasks. Specifically, on the DTU dataset, our method yields more accurate geometry reconstruction, while on datasets such as MipNeRF360 and our customized Detail dataset, it achieves better results in novel-view synthesis. These results highlight the potential of the Gaussian-Hermite kernel for high-quality 3D reconstruction and rendering.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1513-1524"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145305095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3635528
Stefan Cobeli, Kazi Shahrukh Omar, Rodrigo Valenca, Nivan Ferreira, Fabio Miranda
Despite the growing availability of 3D urban datasets, extracting insights remains challenging due to computational bottlenecks and the complexity of interacting with data. In fact, the intricate geometry of 3D urban environments results in high degrees of occlusion and requires extensive manual viewpoint adjustments that make large-scale exploration inefficient. To address this, we propose a view-based approach for 3D data exploration, where a vector field encodes views from the environment. To support this approach, we introduce a neural field-based method that constructs an efficient implicit representation of 3D environments. This representation enables both faster direct queries, which consist of the computation of view assessment indices, and inverse queries, which help avoid occlusion and facilitate the search for views that match desired data patterns. Our approach supports key urban analysis tasks such as visibility assessments, solar exposure evaluation, and assessing the visual impact of new developments. We validate our method through quantitative experiments, case studies informed by real-world urban challenges, and feedback from domain experts. Results show its effectiveness in finding desirable viewpoints, analyzing building facade visibility, and evaluating views from outdoor spaces.
{"title":"A Neural Field-Based Approach for View Computation & Data Exploration in 3D Urban Environments.","authors":"Stefan Cobeli, Kazi Shahrukh Omar, Rodrigo Valenca, Nivan Ferreira, Fabio Miranda","doi":"10.1109/TVCG.2025.3635528","DOIUrl":"10.1109/TVCG.2025.3635528","url":null,"abstract":"<p><p>Despite the growing availability of 3D urban datasets, extracting insights remains challenging due to computational bottlenecks and the complexity of interacting with data. In fact, the intricate geometry of 3D urban environments results in high degrees of occlusion and requires extensive manual viewpoint adjustments that make large-scale exploration inefficient. To address this, we propose a view-based approach for 3D data exploration, where a vector field encodes views from the environment. To support this approach, we introduce a neural field-based method that constructs an efficient implicit representation of 3D environments. This representation enables both faster direct queries, which consist of the computation of view assessment indices, and inverse queries, which help avoid occlusion and facilitate the search for views that match desired data patterns. Our approach supports key urban analysis tasks such as visibility assessments, solar exposure evaluation, and assessing the visual impact of new developments. We validate our method through quantitative experiments, case studies informed by real-world urban challenges, and feedback from domain experts. Results show its effectiveness in finding desirable viewpoints, analyzing building facade visibility, and evaluating views from outdoor spaces.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1540-1553"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145575099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3622483
Tetsuya Takahashi, Christopher Batty
We propose a parameter optimization method for achieving static equilibrium of discrete elastic rods. Our method simultaneously optimizes material stiffness and rest shape parameters under box constraints to exactly enforce zero net forces while avoiding stability issues and violations of physical laws. For efficiency, we split our constrained optimization problem into primal and dual subproblems via the augmented Lagrangian method, while handling the dual maximization subproblem via simple vector updates. To efficiently solve the box-constrained primal minimization subproblem, we propose a new active-set Cholesky preconditioner for variants of conjugate gradient solvers with active sets. Our method surpasses prior work in generality, robustness, and speed.
{"title":"Optimizing Parameters for Static Equilibrium of Discrete Elastic Rods With Active-Set Cholesky.","authors":"Tetsuya Takahashi, Christopher Batty","doi":"10.1109/TVCG.2025.3622483","DOIUrl":"10.1109/TVCG.2025.3622483","url":null,"abstract":"<p><p>We propose a parameter optimization method for achieving static equilibrium of discrete elastic rods. Our method simultaneously optimizes material stiffness and rest shape parameters under box constraints to exactly enforce zero net forces while avoiding stability issues and violations of physical laws. For efficiency, we split our constrained optimization problem into primal and dual subproblems via the augmented Lagrangian method, while handling the dual maximization subproblem via simple vector updates. To efficiently solve the box-constrained primal minimization subproblem, we propose a new active-set Cholesky preconditioner for variants of conjugate gradient solvers with active sets. Our method surpasses prior work in generality, robustness, and speed.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1951-1962"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145310348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3622042
Yushi Wei, Rongkai Shi, Sen Zhang, Anil Ufuk Batmaz, Pan Hui, Hai-Ning Liang
Cursors and how they are presented significantly influence user experience in both VR and non-VR environments by shaping how users interact with and perceive interfaces. In traditional interfaces, cursors serve as a fundamental component for translating human movement into digital interactions, enhancing interaction accuracy, efficiency, and experience. The design and visibility of cursors can affect users' ability to locate interactive elements and understand system feedback. In VR, cursor manipulation is more complex than in non-VR environments, as it can be controlled through hand, head, and gaze movements. With the arrival of the Apple Vision Pro, the use of gaze-controlled non-visible cursors has gained some prominence. However, there has been limited exploration of the effect of this type of cursor. This work presents a comprehensive study of the effects of cursor visibility (visible versus invisible) in gaze-based interactions within VR environments. Through two user studies, we investigate how cursor visibility impacts user performance and experience across different confirmation mechanisms and tasks. The first study focuses on selection tasks, examining the influence of target width, movement amplitude, and three common confirmation methods (air tap, blinking, and dwell). The second study explores pursuit tasks, analyzing cursor effects under varying movement speeds. Our findings reveal that cursor visibility significantly affects both objective performance metrics and subjective user preferences, but these effects vary depending on the confirmation mechanism used and task type. We propose eight design implications based on our empirical results to guide the future development of gaze-based interfaces in VR. These insights highlight the importance of tailoring cursor metaphors to specific interaction tasks and provide practical guidance for researchers and developers in optimizing VR user interfaces.
{"title":"Reevaluating the Gaze Cursor in Virtual Reality: A Comparative Analysis of Cursor Visibility, Confirmation Mechanisms, and Task Paradigms.","authors":"Yushi Wei, Rongkai Shi, Sen Zhang, Anil Ufuk Batmaz, Pan Hui, Hai-Ning Liang","doi":"10.1109/TVCG.2025.3622042","DOIUrl":"10.1109/TVCG.2025.3622042","url":null,"abstract":"<p><p>Cursors and how they are presented significantly influence user experience in both VR and non-VR environments by shaping how users interact with and perceive interfaces. In traditional interfaces, cursors serve as a fundamental component for translating human movement into digital interactions, enhancing interaction accuracy, efficiency, and experience. The design and visibility of cursors can affect users' ability to locate interactive elements and understand system feedback. In VR, cursor manipulation is more complex than in non-VR environments, as it can be controlled through hand, head, and gaze movements. With the arrival of the Apple Vision Pro, the use of gaze-controlled non-visible cursors has gained some prominence. However, there has been limited exploration of the effect of this type of cursor. This work presents a comprehensive study of the effects of cursor visibility (visible versus invisible) in gaze-based interactions within VR environments. Through two user studies, we investigate how cursor visibility impacts user performance and experience across different confirmation mechanisms and tasks. The first study focuses on selection tasks, examining the influence of target width, movement amplitude, and three common confirmation methods (air tap, blinking, and dwell). The second study explores pursuit tasks, analyzing cursor effects under varying movement speeds. Our findings reveal that cursor visibility significantly affects both objective performance metrics and subjective user preferences, but these effects vary depending on the confirmation mechanism used and task type. We propose eight design implications based on our empirical results to guide the future development of gaze-based interfaces in VR. These insights highlight the importance of tailoring cursor metaphors to specific interaction tasks and provide practical guidance for researchers and developers in optimizing VR user interfaces.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1640-1655"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145305158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3631702
Junyu Zhu, Hao Zhu, Sheng Wang, Zhan Ma, Xun Cao
Neural Radiance Fields (NeRF) have gained significant attention due to their precise reconstruction and rapid inference capabilities, making them highly promising for applications in virtual reality and gaming. However, extending NeRF's capabilities to dynamic scenes remains underexplored, particularly in ensuring consistent and coherent reconstructions across space, time, and viewing angles. To address this challenge, we propose Scale-NeRF, a novel approach that organizes the training of dynamic NeRFs as a progressive, scale-based refinement process, grounded in hierarchical Bayesian theory. Scale-NeRF begins by reconstructing the radiance fields using coarse, large-scale frames and iteratively refines them with progressively smaller-scale frames. This hierarchical strategy, combined with a corresponding sampling approach and a newly introduced structural loss, ensures consistency and integrity throughout the reconstruction process. Experiments on public datasets validate the superiority of Scale-NeRF over traditional methods, especially in terms of the proposed metrics evaluating spatial, angular, and temporal consistency. Furthermore, Scale-NeRF demonstrates excellent dynamic reconstruction capabilities with real-time rendering, offering a significant advancement for applications demanding both high fidelity and real-time performance.
{"title":"Hierarchical Bayesian Guided Spatial-, Angular- and Temporal-Consistent View Synthesis.","authors":"Junyu Zhu, Hao Zhu, Sheng Wang, Zhan Ma, Xun Cao","doi":"10.1109/TVCG.2025.3631702","DOIUrl":"10.1109/TVCG.2025.3631702","url":null,"abstract":"<p><p>Neural Radiance Fields (NeRF) have gained significant attention due to their precise reconstruction and rapid inference capabilities, making them highly promising for applications in virtual reality and gaming. However, extending NeRF's capabilities to dynamic scenes remains underexplored, particularly in ensuring consistent and coherent reconstructions across space, time, and viewing angles. To address this challenge, we propose Scale-NeRF, a novel approach that organizes the training of dynamic NeRFs as a progressive, scale-based refinement process, grounded in hierarchical Bayesian theory. Scale-NeRF begins by reconstructing the radiance fields using coarse, large-scale frames and iteratively refines them with progressively smaller-scale frames. This hierarchical strategy, combined with a corresponding sampling approach and a newly introduced structural loss, ensures consistency and integrity throughout the reconstruction process. Experiments on public datasets validate the superiority of Scale-NeRF over traditional methods, especially in terms of the proposed metrics evaluating spatial, angular, and temporal consistency. Furthermore, Scale-NeRF demonstrates excellent dynamic reconstruction capabilities with real-time rendering, offering a significant advancement for applications demanding both high fidelity and real-time performance.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1438-1451"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145508619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3632345
Daniel Rupp, Tim Weissker, Matthias Wolwer, Torsten W Kuhlen, Daniel Zielasko
Target-selection-based teleportation is one of the most widely used and researched travel techniques in immersive virtual environments, requiring the user to specify a target location with a selection ray before being transported there. This work explores the influence of the maximum reach of the parabolic selection ray, modeled by different emission velocities of the projectile motion equation, and compares the resulting teleportation performance to a straight ray as the baseline. In a user study with 60 participants, we asked participants to teleport as far as possible while still remaining within accuracy constraints to understand how the theoretical implications of the projectile motion equation apply to a realistic VR use case. We found that a projectile emission velocity of 14 m/s (resulting in a maximal reach of 21.52 m) offered the best trade-off between selection distance and accuracy, with the straight ray performing worse. Our results demonstrate the necessity to carefully set and report the projectile emission velocity in future work, as it was shown to directly influence user-selected distance, selection errors, and controller height during selection.
{"title":"How Far is Too Far? The Trade-Off Between Selection Distance and Accuracy During Teleportation in Immersive Virtual Reality.","authors":"Daniel Rupp, Tim Weissker, Matthias Wolwer, Torsten W Kuhlen, Daniel Zielasko","doi":"10.1109/TVCG.2025.3632345","DOIUrl":"10.1109/TVCG.2025.3632345","url":null,"abstract":"<p><p>Target-selection-based teleportation is one of the most widely used and researched travel techniques in immersive virtual environments, requiring the user to specify a target location with a selection ray before being transported there. This work explores the influence of the maximum reach of the parabolic selection ray, modeled by different emission velocities of the projectile motion equation, and compares the resulting teleportation performance to a straight ray as the baseline. In a user study with 60 participants, we asked participants to teleport as far as possible while still remaining within accuracy constraints to understand how the theoretical implications of the projectile motion equation apply to a realistic VR use case. We found that a projectile emission velocity of $14 frac{m}{s}$14ms (resulting in a maximal reach of $text{21.52 m}$21.52m) offered the best trade-off between selection distance and accuracy, with an inferior performance of the straight ray. Our results demonstrate the necessity to carefully set and report the projectile emission velocity in future work, as it was shown to directly influence user-selected distance, selection errors, and controller height during selection.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1864-1878"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145524848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}