
Latest publications in IEEE Transactions on Visualization and Computer Graphics

ScribbleSense: Generative Scribble-Based Texture Editing With Intent Prediction.
IF 6.5 Pub Date: 2026-02-01 DOI: 10.1109/TVCG.2025.3635035 Pages: 2075-2086
Yudi Zhang, Yeming Geng, Lei Zhang

Interactive 3D model texture editing presents enhanced opportunities for creating 3D assets, with a freehand drawing style offering the most intuitive experience. However, existing methods primarily support sketch-based interactions for outlining, while the utilization of coarse-grained scribble-based interaction remains limited. Furthermore, current methodologies often encounter challenges due to the abstract nature of scribble instructions, which can result in ambiguous editing intentions and unclear target semantic locations. To address these issues, we propose ScribbleSense, an editing method that combines multimodal large language models (MLLMs) and image generation models. We leverage the visual capabilities of MLLMs to predict the editing intent behind the scribbles. Once the semantic intent of the scribble is discerned, we employ globally generated images to extract local texture details, thereby anchoring local semantics and alleviating ambiguities concerning the target semantic locations. Experimental results indicate that our method effectively leverages the strengths of MLLMs, achieving state-of-the-art interactive editing performance for scribble-based texture editing.
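As a rough illustration of the pipeline described above (and not the authors' implementation), the Python skeleton below shows how a scribble-to-texture-edit flow could be wired together; every callable it accepts (the MLLM intent predictor, the image generator, the local-texture extractor, the UV writer) is a hypothetical stand-in.

```python
# Illustrative skeleton only: predict_intent, generate_image, extract_local_texture
# and apply_to_uv are hypothetical stand-ins, not ScribbleSense's actual components.
from typing import Any, Callable

def scribble_edit(render: Any, scribble_mask: Any,
                  predict_intent: Callable, generate_image: Callable,
                  extract_local_texture: Callable, apply_to_uv: Callable) -> Any:
    """Scribble -> intent prediction -> global generation -> local texture edit."""
    intent = predict_intent(render, scribble_mask)              # MLLM infers the editing intent
    reference = generate_image(render, intent)                  # globally generated reference image
    patch = extract_local_texture(reference, scribble_mask)     # anchor local semantics near the scribble
    return apply_to_uv(patch, scribble_mask)                    # write the patch back into the texture map
```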

Citations: 0
Enhancing Perceived Empathy in Empathic Mixed Reality Agents via Context-Aware Adaptation.
IF 6.5 Pub Date: 2026-02-01 DOI: 10.1109/TVCG.2025.3646601 Pages: 1569-1581
Zhuang Chang, Dominik O W Hirschberg, Kunal Gupta, Mehak Sharma, Kangsoo Kim, Huidong Bai, Li Shao, Mark Billinghurst

Mixed Reality Agents (MiRAs) have been extensively studied to enhance virtual-physical interactions, using their ability to exist in both virtual and physical environments. However, little research has focused on enhancing perceived empathy in MiRAs, despite its potential for agent-assisted therapy, education, and training. To fill this gap, we investigate the impact of an Empathic Mixed Reality agent (EMiRA) that adapts to users' physiological states and physical events in a shooting game. We found that this adaptation enhanced users' social perceptions of the agent, including social presence, social connectedness, and perceived empathy. Physiological adaptation increased paternalism and reduced user dominance, while physical adaptation had no such effect. We discuss these findings and provide design implications for future EMiRAs.

Citations: 0
Evaluating the Usability of Microgestures for Text Editing Tasks in Virtual Reality.
IF 6.5 Pub Date: 2026-02-01 DOI: 10.1109/TVCG.2025.3642050 Pages: 2020-2033
Xiang Li, Wei He, Per Ola Kristensson

As virtual reality (VR) continues to evolve, traditional input methods such as handheld controllers and gesture systems often face challenges with precision, social accessibility, and user fatigue. These limitations motivate the exploration of microgestures, which promise more subtle, ergonomic, and device-free interactions. We introduce microGEXT, a lightweight microgesture-based system designed for text editing in VR without external sensors, which utilizes small, subtle hand movements to reduce physical strain compared to standard gestures. We evaluated microGEXT in three user studies. In Study 1 ($N=20$), microGEXT reduced overall edit time and fatigue compared to a ray-casting + pinch menu baseline, the default text editing approach in commercial VR systems. Study 2 ($N=20$) found that microGEXT performed well in short text selection tasks but was slower for longer text ranges. In Study 3 ($N=10$), participants found microGEXT intuitive for open-ended information-gathering tasks. Across all studies, microGEXT demonstrated enhanced user experience and reduced physical effort, offering a promising alternative to traditional VR text editing techniques.

Citations: 0
Perceptual Quality Assessment of Trisoup-Lifting Encoded 3D Point Clouds.
IF 6.5 Pub Date: 2026-02-01 DOI: 10.1109/TVCG.2025.3629111 Pages: 2034-2048
Juncheng Long, Honglei Su, Qi Liu, Hui Yuan, Wei Gao, Jiarun Song, Zhou Wang

No-reference bitstream-layer point cloud quality assessment (PCQA) can be deployed without full decoding at any network node to achieve real-time quality monitoring. In this work, we develop the first PCQA model dedicated to Trisoup-Lifting encoded 3D point clouds by analyzing bitstreams without full decoding. Specifically, we investigate the relationship among texture bitrate per point (TBPP), texture complexity (TC) and texture quantization parameter (TQP) while geometry encoding is lossless. Subsequently, we estimate TC by utilizing TQP and TBPP. Then, we establish a texture distortion evaluation model based on TC, TBPP and TQP. Ultimately, by integrating this texture distortion model with a geometry attenuation factor, a function of trisoupNodeSizeLog2 (tNSL), we acquire a comprehensive NR bitstream-layer PCQA model named streamPCQ-TL. In addition, this work establishes a database named WPC6.0, the first PCQA database dedicated to Trisoup-Lifting encoding mode, encompassing 400 distorted point clouds with 4 geometry multiplied by 5 texture distortion levels. Experiment results on M-PCCD, ICIP2020 and the proposed WPC6.0 database suggest that the proposed streamPCQ-TL model exhibits robust and notable performance in contrast to existing advanced PCQA metrics, particularly in terms of computational cost.

The dataset and source code will be publicly released at https://github.com/qdushl/Waterloo-Point-Cloud-Database-6.0.
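To make the quality-estimation flow above concrete, the sketch below assembles a bitstream-layer score from TBPP, TQP, and tNSL; the functional forms and coefficients are invented placeholders for illustration, not the fitted streamPCQ-TL model.

```python
# Purely hypothetical sketch: the functional forms and coefficients are invented
# placeholders to illustrate a bitstream-layer quality score; they are NOT the
# fitted streamPCQ-TL model.
import math

def estimate_tc(tbpp: float, tqp: float, a: float = 1.0, b: float = 0.05) -> float:
    """Assumed texture-complexity estimate: for a fixed TQP, a higher texture
    bitrate per point suggests richer texture content."""
    return a * tbpp * math.exp(b * tqp)

def texture_quality(tc: float, tbpp: float, tqp: float) -> float:
    """Assumed logistic mapping from bitstream features to texture quality in [0, 1]."""
    x = 2.0 * tbpp - 0.05 * tqp + 0.01 * tc
    return 1.0 / (1.0 + math.exp(-x))

def geometry_attenuation(tnsl: int) -> float:
    """Assumed attenuation: larger trisoupNodeSizeLog2 values coarsen geometry."""
    return 1.0 / (1.0 + 0.3 * max(tnsl - 1, 0))

def stream_score(tbpp: float, tqp: float, tnsl: int) -> float:
    """Combine the texture term with the geometry attenuation factor."""
    tc = estimate_tc(tbpp, tqp)
    return texture_quality(tc, tbpp, tqp) * geometry_attenuation(tnsl)

if __name__ == "__main__":
    print(stream_score(tbpp=0.8, tqp=34, tnsl=3))  # ~0.3, a unitless illustrative score
```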
Citations: 0
LaPDA: Latent-Space Point Cloud Denoising With Adaptivity.
IF 6.5 Pub Date: 2026-02-01 DOI: 10.1109/TVCG.2025.3635138 Pages: 1525-1539
Peng Du, Xingce Wang, Zhongke Wu, Xudong Ru, Xavier Granier, Ying He

Point cloud denoising is a fundamental yet challenging task in computer graphics. Existing solutions typically rely on supervised training on synthesized noise. However, real-world noise often exhibits greater complexity, causing learning-based methods trained on synthetic noise to struggle when encountering unseen noise-a phenomenon we refer to as noise misalignment. To address this challenge, we propose LaPDA (Latent-space Point cloud Denoising with Adaptivity), a neural network explicitly designed to mitigate noise misalignment and enhance denoising robustness. LaPDA consists of two key stages. First, we adaptively model noise in the latent space, aligning unseen noise distributions with the known training distributions or adjusting them toward distributions with lower noise scales. Training objectives at this stage are formulated based on controlled synthetic noise with varying intensity levels. Second, we introduce a gradual noise removal module that optimizes the spatial distribution of the adaptively adjusted noisy points. Extensive experiments conducted on both synthetic and scanned datasets demonstrate that LaPDA achieves enhanced accuracy and robustness compared to state-of-the-art methods.

The source code and test models will be made publicly available.
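The two-stage structure described above (latent-space noise adaptation followed by gradual noise removal) can be pictured with the minimal PyTorch-style sketch below; the layer sizes, residual forms, and step count are assumptions made for illustration, not LaPDA's actual architecture.

```python
# Minimal structural sketch under assumptions: the MLPs, residual forms, and step
# count are placeholders chosen for illustration, not LaPDA's actual architecture.
import torch
import torch.nn as nn

class NoiseAdapter(nn.Module):
    """Stage 1 (assumed form): nudge per-point latent features of unseen noise
    toward the latent distribution the denoiser was trained on."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, z_noisy: torch.Tensor) -> torch.Tensor:
        return z_noisy + self.net(z_noisy)  # residual adjustment in latent space

class GradualDenoiser(nn.Module):
    """Stage 2 (assumed form): iteratively predict small point displacements."""
    def __init__(self, dim: int = 128, steps: int = 4):
        super().__init__()
        self.steps = steps
        self.head = nn.Sequential(nn.Linear(dim + 3, dim), nn.ReLU(), nn.Linear(dim, 3))

    def forward(self, points: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        for _ in range(self.steps):
            offset = self.head(torch.cat([z, points], dim=-1))
            points = points + offset / self.steps  # remove residual noise gradually
        return points

if __name__ == "__main__":
    pts = torch.randn(1, 2048, 3)   # a noisy point cloud
    z = torch.randn(1, 2048, 128)   # per-point latent features (assumed given by an encoder)
    denoised = GradualDenoiser()(pts, NoiseAdapter()(z))
    print(denoised.shape)           # torch.Size([1, 2048, 3])
```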
Citations: 0
Visual and Somatosensory Integration With Higher Sitting Posture Enhances the Sense of Standing and Self-Motion in Seated VR.
IF 6.5 Pub Date: 2026-02-01 DOI: 10.1109/TVCG.2025.3640239 Pages: 1767-1779
Daiki Hagimori, Naoya Isoyama, Monica Perusquia-Hernandez, Shunsuke Yoshimoto, Hideaki Uchiyama, Nobuchika Sakata, Kiyoshi Kiyokawa

Users are often seated in the real environment, while their virtual avatars either stand stationary or move in virtual reality (VR). This creates posture inconsistencies between the real and virtual embodiment representations. The relationship between posture consistency in locomotion techniques and the sense of presence in VR is still unclear. This study investigates how visual and somatosensory integration affects the sense of standing (SoSt) and the sense of self-motion (SoSm) when the sitting posture is varied slightly, highlighting the importance of sitting posture for locomotion design in VR. The degree and occurrence of SoSt and SoSm were assessed in subjective experiments, which found that higher and lower sitting postures yield higher SoSt and lower SoSm, respectively. Invoking SoSt also influences postural perception. Perceived travel distance varied with the posture condition when identical visual flow was presented. The findings suggest that posture-related visual and somatosensory integration enhances SoSt and SoSm, and that a higher seating position is recommended in seated VR locomotion design.

Citations: 0
DiFusion: Flexible Stylized Motion Generation Using Digest-and-Fusion Scheme.
IF 6.5 Pub Date: 2026-02-01 DOI: 10.1109/TVCG.2025.3620400 Pages: 1593-1604
Yatian Wang, Haoran Mo, Chengying Gao

To address the issue of style expression in existing text-driven human motion synthesis methods, we propose DiFusion, a framework for diversely stylized motion generation. It offers flexible control of content through texts and style via multiple modalities, i.e., textual labels or motion sequences. Our approach employs a dual-condition motion latent diffusion model, enabling independent control of content and style through flexible input modalities. To tackle the issue of imbalanced complexity between the text-motion and style-motion datasets, we propose the Digest-and-Fusion training scheme, which digests domain-specific knowledge from both datasets and then adaptively fuses them into a compatible manner. Comprehensive evaluations demonstrate the effectiveness of our method and its superiority over existing approaches in terms of content alignment, style expressiveness, realism, and diversity. Additionally, our approach can be extended to practical applications, such as motion style interpolation.
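For context on the dual-condition control described above, the sketch below shows a generic classifier-free-guidance step in which separate content and style conditions steer one denoiser; the denoiser, embedding shapes, and guidance weights are assumptions rather than DiFusion's.

```python
# Hedged illustration: a generic dual-condition classifier-free-guidance step in
# which separate content and style conditions steer a single denoiser. The
# denoiser, embedding shapes, and guidance weights are assumptions, not DiFusion's.
import torch

def dual_cfg_eps(denoiser, x_t, t, content_emb, style_emb,
                 w_content: float = 5.0, w_style: float = 2.0):
    """Combine unconditional, content-conditioned, and style-conditioned predictions."""
    eps_uncond = denoiser(x_t, t, None, None)
    eps_content = denoiser(x_t, t, content_emb, None)
    eps_style = denoiser(x_t, t, None, style_emb)
    return (eps_uncond
            + w_content * (eps_content - eps_uncond)
            + w_style * (eps_style - eps_uncond))

if __name__ == "__main__":
    dummy = lambda x, t, c, s: 0.1 * x   # stand-in denoiser, for shape checking only
    x_t = torch.randn(1, 196, 256)       # a latent motion sequence (assumed shape)
    print(dual_cfg_eps(dummy, x_t, t=10, content_emb=None, style_emb=None).shape)
```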

Citations: 0
EnVisionVR: A Scene Interpretation Tool for Visual Accessibility in Virtual Reality.
IF 6.5 Pub Date: 2026-02-01 DOI: 10.1109/TVCG.2025.3617147 Pages: 2007-2019
Junlong Chen, Rosella P Galindo Esparza, Vanja Garaj, Per Ola Kristensson, John Dudley

Effective visual accessibility in Virtual Reality (VR) is crucial for Blind and Low Vision (BLV) users. However, designing visual accessibility systems is challenging due to the complexity of 3D VR environments and the need for techniques that can be easily retrofitted into existing applications. While prior work has studied how to enhance or translate visual information, the advancement of Vision Language Models (VLMs) provides an exciting opportunity to advance the scene interpretation capability of current systems. This paper presents EnVisionVR, an accessibility tool for VR scene interpretation. Through a formative study of usability barriers, we confirmed the lack of visual accessibility features as a key barrier for BLV users of VR content and applications. In response, we used our findings from the formative study to inform the design and development of EnVisionVR, a novel visual accessibility system leveraging a VLM, voice input and multimodal feedback for scene interpretation and virtual object interaction in VR. An evaluation with 12 BLV users demonstrated that EnVisionVR significantly improved their ability to locate virtual objects, effectively supporting scene understanding and object interaction.

Citations: 0
Volume Feature Aware View-Epipolar Transformers for Generalizable NeRF.
IF 6.5 Pub Date: 2026-02-01 DOI: 10.1109/TVCG.2025.3621585 Pages: 2049-2060
Yilei Chen, Ping An, Xinpeng Huang, Qiang Wu

Generalizable NeRF synthesizes novel views of unseen scenes without per-scene training. The view-epipolar transformer has become popular in this field for its ability to produce high-quality views. Existing methods with this architecture rely on the assumption that texture consistency across views can identify object surfaces, with such identification crucial for determining where to reconstruct texture. However, this assumption is not always valid, as different surface positions may share similar texture features, creating ambiguity in surface identification. To handle this ambiguity, this paper introduces 3D volume features into the view-epipolar transformer. These features contain geometric information, which will be a supplement to texture features. By incorporating both texture and geometric cues in consistency measurement, our method mitigates the ambiguity in surface detection. This leads to more accurate surfaces and thus better novel view synthesis. Additionally, we propose a decoupled decoder where volume and texture features are used for density and color prediction respectively. In this way, the two properties can be better predicted without mutual interference. Experiments show improved results over existing transformer-based methods on both real-world and synthetic datasets.
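The decoupled decoder mentioned above, with volume features driving density and texture features driving color, can be illustrated with the minimal PyTorch-style module below; the feature dimensions and MLP depths are invented for the example and do not reflect the paper's architecture.

```python
# Structural sketch under assumptions: feature dimensions and MLP depths are
# invented; only the decoupling idea (volume features -> density, texture
# features -> colour) mirrors the description above.
import torch
import torch.nn as nn

class DecoupledDecoder(nn.Module):
    def __init__(self, vol_dim: int = 64, tex_dim: int = 128):
        super().__init__()
        self.density_head = nn.Sequential(nn.Linear(vol_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.color_head = nn.Sequential(nn.Linear(tex_dim, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, vol_feat: torch.Tensor, tex_feat: torch.Tensor):
        sigma = torch.relu(self.density_head(vol_feat))   # non-negative density from geometry cues
        rgb = torch.sigmoid(self.color_head(tex_feat))    # colour in [0, 1] from texture cues
        return sigma, rgb

if __name__ == "__main__":
    vol = torch.randn(1024, 64, 64)    # (rays, samples per ray, volume-feature dim)
    tex = torch.randn(1024, 64, 128)   # (rays, samples per ray, texture-feature dim)
    sigma, rgb = DecoupledDecoder()(vol, tex)
    print(sigma.shape, rgb.shape)      # (1024, 64, 1) and (1024, 64, 3)
```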

Citations: 0
DiffPortraitVideo: Diffusion-Based Expression-Consistent Zero-Shot Portrait Video Translation.
IF 6.5 Pub Date: 2026-02-01 DOI: 10.1109/TVCG.2025.3642300 Pages: 1656-1667
Shaoxu Li, Chuhang Ma, Ye Pan

Zero-shot text-to-video diffusion models are crafted to expand pre-trained image diffusion models to the video domain without additional training. In recent times, prevailing techniques commonly rely on existing shapes as constraints and introduce inter-frame attention to ensure texture consistency. However, such shape constraints tend to restrict the stylized geometric deformation of videos and inadvertently neglect the original texture characteristics. Furthermore, existing methods suffer from flickering and inconsistent facial expressions. In this paper, we present DiffPortraitVideo. The framework employs a diffusion model-based feature and attention injection mechanism to generate key frames, with cross-frame constraints to enforce coherence and adaptive feature fusion to ensure expression consistency. Our approach achieves high spatio-temporal and expression consistency while retaining the textual and original image properties. Extensive and comprehensive experiments are conducted to validate the efficacy of our proposed framework in generating personalized, high-quality, and coherent videos. This not only showcases the superiority of our method over existing approaches but also paves the way for further research and development in the field of text-to-video generation with enhanced personalization and quality.
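To illustrate the kind of cross-frame constraint mentioned above, the sketch below shows generic anchor-frame attention, in which the current frame's queries attend to a key frame's features; the tensor shapes and the shared key/value projection are assumptions, not DiffPortraitVideo's specific injection scheme.

```python
# Generic anchor-frame attention sketch: the current frame's queries attend to a
# key frame's features, a common zero-shot video-diffusion device. Shapes and the
# shared key/value projection are assumptions, not DiffPortraitVideo's scheme.
import torch
import torch.nn.functional as F

def cross_frame_attention(q_frame: torch.Tensor, anchor: torch.Tensor, num_heads: int = 8):
    """q_frame, anchor: (batch, tokens, dim); returns features aligned to the anchor."""
    b, n, d = q_frame.shape
    q = q_frame.reshape(b, n, num_heads, d // num_heads).transpose(1, 2)
    k = anchor.reshape(b, -1, num_heads, d // num_heads).transpose(1, 2)
    v = k  # toy simplification: the anchor features serve as both keys and values
    out = F.scaled_dot_product_attention(q, k, v)
    return out.transpose(1, 2).reshape(b, n, d)

if __name__ == "__main__":
    current, key_frame = torch.randn(1, 1024, 256), torch.randn(1, 1024, 256)
    print(cross_frame_attention(current, key_frame).shape)  # torch.Size([1, 1024, 256])
```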

Citations: 0