
Latest publications in IEEE Transactions on Visualization and Computer Graphics

DiFusion: Flexible Stylized Motion Generation Using Digest-and-Fusion Scheme.
IF 6.5 Pub Date : 2026-02-01 DOI: 10.1109/TVCG.2025.3620400
Yatian Wang, Haoran Mo, Chengying Gao

To address the issue of style expression in existing text-driven human motion synthesis methods, we propose DiFusion, a framework for diversely stylized motion generation. It offers flexible control of content through texts and of style via multiple modalities, i.e., textual labels or motion sequences. Our approach employs a dual-condition motion latent diffusion model, enabling independent control of content and style through flexible input modalities. To tackle the issue of imbalanced complexity between the text-motion and style-motion datasets, we propose the Digest-and-Fusion training scheme, which digests domain-specific knowledge from both datasets and then adaptively fuses them in a compatible manner. Comprehensive evaluations demonstrate the effectiveness of our method and its superiority over existing approaches in terms of content alignment, style expressiveness, realism, and diversity. Additionally, our approach can be extended to practical applications, such as motion style interpolation.
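A minimal sketch of the kind of dual-condition denoiser described above, assuming a motion latent conditioned separately on a content (text) embedding and a style embedding; the module names, layer sizes, and concatenation-based conditioning are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DualConditionDenoiser(nn.Module):
    def __init__(self, latent_dim=256, cond_dim=512):
        super().__init__()
        # Hypothetical projections; real models use richer text/style encoders.
        self.time_embed = nn.Sequential(nn.Linear(1, cond_dim), nn.SiLU(), nn.Linear(cond_dim, cond_dim))
        self.content_proj = nn.Linear(cond_dim, cond_dim)  # content condition (text)
        self.style_proj = nn.Linear(cond_dim, cond_dim)    # style condition (label or motion sequence)
        self.backbone = nn.Sequential(
            nn.Linear(latent_dim + 3 * cond_dim, 1024), nn.SiLU(),
            nn.Linear(1024, 1024), nn.SiLU(),
            nn.Linear(1024, latent_dim),
        )

    def forward(self, z_t, t, content_emb, style_emb):
        # Content and style enter through separate projections, so either
        # condition can be supplied, replaced, or dropped independently.
        cond = torch.cat([self.time_embed(t),
                          self.content_proj(content_emb),
                          self.style_proj(style_emb)], dim=-1)
        return self.backbone(torch.cat([z_t, cond], dim=-1))  # predicted noise

denoiser = DualConditionDenoiser()
z_t = torch.randn(4, 256)      # noisy motion latents
t = torch.rand(4, 1)           # diffusion timesteps in [0, 1]
content = torch.randn(4, 512)  # e.g., output of a text encoder
style = torch.randn(4, 512)    # e.g., output of a style-label or style-motion encoder
print(denoiser(z_t, t, content, style).shape)  # torch.Size([4, 256])
```

Because the two conditions pass through separate projections, the style input can be swapped between a label embedding and a motion-sequence embedding without touching the content path.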

Citations: 0
EnVisionVR: A Scene Interpretation Tool for Visual Accessibility in Virtual Reality.
IF 6.5 Pub Date : 2026-02-01 DOI: 10.1109/TVCG.2025.3617147
Junlong Chen, Rosella P Galindo Esparza, Vanja Garaj, Per Ola Kristensson, John Dudley

Effective visual accessibility in Virtual Reality (VR) is crucial for Blind and Low Vision (BLV) users. However, designing visual accessibility systems is challenging due to the complexity of 3D VR environments and the need for techniques that can be easily retrofitted into existing applications. While prior work has studied how to enhance or translate visual information, the advancement of Vision Language Models (VLMs) provides an exciting opportunity to advance the scene interpretation capability of current systems. This paper presents EnVisionVR, an accessibility tool for VR scene interpretation. Through a formative study of usability barriers, we confirmed the lack of visual accessibility features as a key barrier for BLV users of VR content and applications. In response, we used our findings from the formative study to inform the design and development of EnVisionVR, a novel visual accessibility system leveraging a VLM, voice input and multimodal feedback for scene interpretation and virtual object interaction in VR. An evaluation with 12 BLV users demonstrated that EnVisionVR significantly improved their ability to locate virtual objects, effectively supporting scene understanding and object interaction.

Citations: 0
Volume Feature Aware View-Epipolar Transformers for Generalizable NeRF.
IF 6.5 Pub Date : 2026-02-01 DOI: 10.1109/TVCG.2025.3621585
Yilei Chen, Ping An, Xinpeng Huang, Qiang Wu

Generalizable NeRF synthesizes novel views of unseen scenes without per-scene training. The view-epipolar transformer has become popular in this field for its ability to produce high-quality views. Existing methods with this architecture rely on the assumption that texture consistency across views can identify object surfaces, with such identification crucial for determining where to reconstruct texture. However, this assumption is not always valid, as different surface positions may share similar texture features, creating ambiguity in surface identification. To handle this ambiguity, this paper introduces 3D volume features into the view-epipolar transformer. These features contain geometric information that supplements the texture features. By incorporating both texture and geometric cues in consistency measurement, our method mitigates the ambiguity in surface detection. This leads to more accurate surfaces and thus better novel view synthesis. Additionally, we propose a decoupled decoder where volume and texture features are used for density and color prediction, respectively. In this way, the two properties can be better predicted without mutual interference. Experiments show improved results over existing transformer-based methods on both real-world and synthetic datasets.
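The following sketch illustrates the general idea of mixing a per-sample 3D volume feature into the epipolar texture tokens before cross-view aggregation, so that consistency reflects both texture and geometric cues. The shapes, the linear mixing layer, and the use of a plain multi-head attention block are assumptions for demonstration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class VolumeAwareEpipolarFusion(nn.Module):
    def __init__(self, feat_dim=64, num_heads=4):
        super().__init__()
        self.mix = nn.Linear(2 * feat_dim, feat_dim)   # texture + volume -> fused token
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)

    def forward(self, tex_feats, vol_feat):
        # tex_feats: (B, V, C) texture features sampled on the epipolar lines of V source views
        # vol_feat:  (B, C)    volume feature interpolated at the 3D sample point
        vol = vol_feat.unsqueeze(1).expand_as(tex_feats)
        tokens = self.mix(torch.cat([tex_feats, vol], dim=-1))
        fused, _ = self.attn(tokens, tokens, tokens)   # cross-view aggregation
        return fused.mean(dim=1)                       # one feature per 3D sample

fusion = VolumeAwareEpipolarFusion()
tex = torch.randn(8, 6, 64)    # 8 ray samples, 6 source views
vol = torch.randn(8, 64)
print(fusion(tex, vol).shape)  # torch.Size([8, 64])
```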

Citations: 0
DiffPortraitVideo: Diffusion-Based Expression-Consistent Zero-Shot Portrait Video Translation.
IF 6.5 Pub Date : 2026-02-01 DOI: 10.1109/TVCG.2025.3642300
Shaoxu Li, Chuhang Ma, Ye Pan

Zero-shot text-to-video diffusion models are crafted to expand pre-trained image diffusion models to the video domain without additional training. In recent times, prevailing techniques commonly rely on existing shapes as constraints and introduce inter-frame attention to ensure texture consistency. However, such shape constraints tend to restrict the stylized geometric deformation of videos and inadvertently neglect the original texture characteristics. Furthermore, existing methods suffer from flickering and inconsistent facial expressions. In this paper, we present DiffPortraitVideo. The framework employs a diffusion model-based feature and attention injection mechanism to generate key frames, with cross-frame constraints to enforce coherence and adaptive feature fusion to ensure expression consistency. Our approach achieves high spatio-temporal and expression consistency while retaining the textual and original image properties. Extensive and comprehensive experiments are conducted to validate the efficacy of our proposed framework in generating personalized, high-quality, and coherent videos. This not only showcases the superiority of our method over existing approaches but also paves the way for further research and development in the field of text-to-video generation with enhanced personalization and quality.
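As a hedged illustration of the cross-frame constraints mentioned above, the snippet below shows the common zero-shot trick of making every frame attend to an anchor frame's keys and values inside self-attention, which suppresses frame-to-frame flicker; the function is a stand-in for a UNet attention block, not the authors' code.

```python
import torch
import torch.nn.functional as F

def cross_frame_attention(q, k, v, anchor=0):
    # q, k, v: (frames, tokens, dim). Every frame attends to the anchor frame's
    # keys/values instead of its own, so textures stay coherent across frames.
    k_anchor = k[anchor].unsqueeze(0).expand_as(k)
    v_anchor = v[anchor].unsqueeze(0).expand_as(v)
    return F.scaled_dot_product_attention(q, k_anchor, v_anchor)

frames, tokens, dim = 8, 256, 64
q = torch.randn(frames, tokens, dim)
k = torch.randn(frames, tokens, dim)
v = torch.randn(frames, tokens, dim)
print(cross_frame_attention(q, k, v).shape)  # torch.Size([8, 256, 64])
```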

Citations: 0
Visual Stenography: Feature Recreation and Preservation in Sketches of Noisy Line Charts.
IF 6.5 Pub Date : 2026-02-01 DOI: 10.1109/TVCG.2025.3626128
Rifat Ara Proma, Michael Correll, Ghulam Jilani Quadri, Paul Rosen

Line charts surface many features in time series data, from trends to periodicity to peaks & valleys. However, not every potentially important feature in the data may correspond to a visual feature that readers can detect or prioritize. In this study, we conducted a visual stenography task, where participants re-drew line charts to solicit information about the visual features they believed to be important. We systematically varied noise levels (SNR ≈ 5-30 dB) across line charts to observe how visual clutter influences which features people prioritize in their sketches. We identified three key strategies that correlated with the noise present in the stimuli: the Replicator attempted to retain all major features of the line chart including noise; the Trend Keeper prioritized trends disregarding periodicity and peaks; and the De-noiser filtered out noise while preserving other features. Further, we found that participants tended to faithfully retain trends and peaks & valleys when these features were present, whereas periodicity and noise were represented in more qualitative or gestural ways: semantically rather than accurately. These results suggest a need to consider more flexible and human-centric ways of presenting, summarizing, preprocessing, or clustering time series data.
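For concreteness, the sketch below shows one standard way to corrupt a clean series at a target SNR in dB, using SNR_dB = 10·log10(P_signal/P_noise); the exact stimulus-generation procedure of the study is not reproduced here.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng=None):
    # Scale Gaussian noise so the resulting series has the requested SNR in dB.
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))   # SNR_dB = 10*log10(Ps/Pn)
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

t = np.linspace(0, 4 * np.pi, 500)
clean = np.sin(t) + 0.3 * np.sin(5 * t)   # a series with clear periodicity
noisy_5db = add_noise_at_snr(clean, 5)    # heavily cluttered chart
noisy_30db = add_noise_at_snr(clean, 30)  # nearly clean chart
```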

Citations: 0
DenseSplat: Densifying Gaussian Splatting SLAM With Neural Radiance Prior.
IF 6.5 Pub Date : 2026-02-01 DOI: 10.1109/TVCG.2025.3617961
Mingrui Li, Shuhong Liu, Tianchen Deng, Hongyu Wang

Gaussian SLAM systems excel in real-time rendering and fine-grained reconstruction compared to NeRF-based systems. However, their reliance on extensive keyframes is impractical for deployment in real-world robotic systems, which typically operate under sparse-view conditions that can result in substantial holes in the map. To address these challenges, we introduce DenseSplat, the first SLAM system that effectively combines the advantages of NeRF and 3DGS. DenseSplat utilizes sparse keyframes and NeRF priors for initializing primitives that densely populate maps and seamlessly fill gaps. It also implements geometry-aware primitive sampling and pruning strategies to manage granularity and enhance rendering efficiency. Moreover, DenseSplat integrates loop closure and bundle adjustment, significantly enhancing frame-to-frame tracking accuracy. Extensive experiments on multiple large-scale datasets demonstrate that DenseSplat achieves superior performance in tracking and mapping compared to current state-of-the-art methods.
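A rough sketch of prior-guided densification in the spirit described above: query a radiance-field prior at candidate locations and spawn Gaussian primitives where the predicted density is high. The functions nerf_density and nerf_color are hypothetical stand-ins for a trained NeRF, and the threshold rule is a simplification rather than DenseSplat's actual criterion.

```python
import torch

def densify_from_prior(candidate_xyz, nerf_density, nerf_color, sigma_thresh=5.0):
    # candidate_xyz: (N, 3) points sampled in map regions with few Gaussians.
    sigma = nerf_density(candidate_xyz)               # (N,) volume density from the prior
    keep = sigma > sigma_thresh                       # keep only likely-surface points
    centers = candidate_xyz[keep]
    colors = nerf_color(centers)                      # (M, 3) initial colors
    scales = torch.full((centers.shape[0], 3), 0.01)  # small isotropic initial scale
    return centers, colors, scales

# Toy stand-ins for a trained prior, used only to make the sketch runnable:
nerf_density = lambda x: torch.exp(-x.norm(dim=-1))
nerf_color = lambda x: torch.sigmoid(x)
candidates = torch.randn(1000, 3) * 0.1
centers, colors, scales = densify_from_prior(candidates, nerf_density, nerf_color, sigma_thresh=0.8)
print(centers.shape, colors.shape, scales.shape)
```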

Citations: 0
RP-SLAM: Real-Time Photorealistic SLAM With Efficient 3D Gaussian Splatting.
IF 6.5 Pub Date : 2026-02-01 DOI: 10.1109/TVCG.2025.3616173
Lizhi Bai, Chunqi Tian, Jun Yang, Siyu Zhang, Masanori Suganuma, Takayuki Okatani

3D Gaussian Splatting has emerged as a promising technique for high-quality 3D rendering, leading to increasing interest in integrating 3DGS into photorealistic SLAM systems. However, existing methods face challenges such as Gaussian primitive redundancy, the forgetting problem during continuous optimization, and difficulty in initializing primitives in the monocular case due to the lack of depth information. To achieve efficient and photorealistic mapping, we propose RP-SLAM, a 3D Gaussian splatting-based visual SLAM method for monocular and RGB-D cameras. RP-SLAM decouples camera pose estimation from Gaussian primitive optimization and consists of three key components. First, we propose an efficient incremental mapping approach to achieve a compact and accurate representation of the scene through adaptive sampling and Gaussian primitive filtering. Second, a dynamic window optimization method is proposed to mitigate the forgetting problem and improve map consistency. Finally, for the monocular case, a monocular keyframe initialization method based on a sparse point cloud is proposed to improve the initialization accuracy of Gaussian primitives, providing a geometric basis for subsequent optimization. The results of numerous experiments demonstrate that RP-SLAM achieves state-of-the-art map rendering accuracy while ensuring real-time performance and model compactness.
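The monocular keyframe initialization step can be pictured with the following sketch, which seeds Gaussians from a sparse triangulated point cloud and sets per-point scales from nearest-neighbour spacing; this is a generic 3DGS-style initialization under assumed shapes, not RP-SLAM's implementation.

```python
import torch

def init_gaussians_from_sparse_points(xyz, rgb):
    # xyz: (N, 3) triangulated sparse keyframe points, rgb: (N, 3) their colors.
    d = torch.cdist(xyz, xyz)                      # (N, N) pairwise distances
    d.fill_diagonal_(float("inf"))
    nn_dist = d.min(dim=1).values.clamp(min=1e-4)  # distance to nearest neighbour
    scales = nn_dist.unsqueeze(-1).repeat(1, 3)    # isotropic initial scale per point
    opacities = torch.full((xyz.shape[0], 1), 0.1) # low initial opacity
    return {"xyz": xyz, "rgb": rgb, "scale": scales, "opacity": opacities}

pts = torch.rand(500, 3)
cols = torch.rand(500, 3)
gaussians = init_gaussians_from_sparse_points(pts, cols)
print(gaussians["scale"].shape)  # torch.Size([500, 3])
```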

Citations: 0
HuGDiffusion: Generalizable Single-Image Human Rendering via 3D Gaussian Diffusion.
IF 6.5 Pub Date : 2026-02-01 DOI: 10.1109/TVCG.2025.3625230
Yingzhi Tang, Qijian Zhang, Junhui Hou

We present HuGDiffusion, a generalizable 3D Gaussian splatting (3DGS) learning pipeline that achieves novel view synthesis (NVS) of human characters from single-view input images. Existing approaches typically require monocular videos or calibrated multi-view images as inputs, whose applicability could be weakened in real-world scenarios with arbitrary and/or unknown camera poses. In this paper, we aim to generate the set of 3DGS attributes via a diffusion-based framework conditioned on human priors extracted from a single image. Specifically, we begin with carefully integrated human-centric feature extraction procedures to deduce informative conditioning signals. Based on our empirical observation that jointly learning all 3DGS attributes is difficult to optimize, we design a multi-stage generation strategy to obtain different types of 3DGS attributes. To facilitate the training process, we investigate constructing proxy ground-truth 3D Gaussian attributes as high-quality attribute-level supervision signals. Through extensive experiments, our HuGDiffusion shows significant performance improvements over the state-of-the-art methods. Our code will be publicly available at https://github.com/haiantyz/HuGDiffusion.git.
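A toy sketch of the multi-stage idea described above: positions first, then shape (scale and rotation), then appearance (color and opacity), with each stage conditioned on the image feature and on the outputs of earlier stages. The stage split, dimensions, and plain linear heads are assumptions; the actual method is diffusion-based, which this feed-forward sketch does not reproduce.

```python
import torch
import torch.nn as nn

class StagedGaussianGenerator(nn.Module):
    def __init__(self, n_gauss=1024, img_dim=512):
        super().__init__()
        # Each head sees the image feature plus the flattened outputs of earlier stages.
        self.geom_head = nn.Linear(img_dim, n_gauss * 3)                    # stage 1: positions (3)
        self.shape_head = nn.Linear(img_dim + n_gauss * 3, n_gauss * 7)     # stage 2: scale (3) + rotation (4)
        self.appear_head = nn.Linear(img_dim + n_gauss * 10, n_gauss * 4)   # stage 3: color (3) + opacity (1)
        self.n = n_gauss

    def forward(self, img_feat):
        xyz = self.geom_head(img_feat)
        shape = self.shape_head(torch.cat([img_feat, xyz], dim=-1))
        appear = self.appear_head(torch.cat([img_feat, xyz, shape], dim=-1))
        return (xyz.view(-1, self.n, 3),
                shape.view(-1, self.n, 7),
                appear.view(-1, self.n, 4))

gen = StagedGaussianGenerator()
feat = torch.randn(2, 512)                 # single-image human feature (hypothetical encoder output)
xyz, shape, appear = gen(feat)
print(xyz.shape, shape.shape, appear.shape)
```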

Citations: 0
Topological Autoencoders++: Fast and Accurate Cycle-Aware Dimensionality Reduction.
IF 6.5 Pub Date : 2026-02-01 DOI: 10.1109/TVCG.2025.3644671
Matteo Clemot, Julie Digne, Julien Tierny

This paper presents a novel topology-aware dimensionality reduction approach aiming at accurately visualizing the cyclic patterns present in high-dimensional data. To that end, we build on the Topological Autoencoders (TopoAE) (Moor et al., 2020) formulation. First, we provide a novel theoretical analysis of its associated loss and show that a zero loss indeed induces identical persistence pairs (in high and low dimensions) for the 0-dimensional persistent homology ($\text{PH}^{0}$) of the Rips filtration. We also provide a counterexample showing that this property no longer holds for a naive extension of TopoAE to $\text{PH}^{d}$ for $d \geq 1$. Based on this observation, we introduce a novel generalization of TopoAE to 1-dimensional persistent homology ($\text{PH}^{1}$), called TopoAE++, for the accurate generation of cycle-aware planar embeddings, addressing the above failure case. This generalization is based on the notion of cascade distortion, a new penalty term favoring an isometric embedding of the 2-chains filling persistent 1-cycles, hence resulting in more faithful geometrical reconstructions of the 1-cycles in the plane. We further introduce a novel, fast algorithm for the exact computation of $\text{PH}^{1}$ for Rips filtrations in the plane, yielding improved runtimes over previously documented topology-aware methods. Our method also achieves a better balance between the topological accuracy, as measured by the Wasserstein distance, and the visual preservation of the cycles in low dimensions. Our C++ implementation is available at https://github.com/MClemot/TopologicalAutoencodersPlusPlus.
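To make the $\text{PH}^{0}$ part of the discussion concrete: the 0-dimensional persistence pairs of a Rips filtration correspond to the edges of a minimum spanning tree of the point cloud, so a TopoAE-style loss can be written by comparing those edge lengths between the input space and the latent space. The sketch below is a minimal NumPy/SciPy version under that correspondence; it omits the cascade-distortion term and is not the authors' implementation.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_edges(dist_matrix):
    # MST edges of the distance graph = PH^0 persistence pairs of the Rips filtration.
    mst = minimum_spanning_tree(dist_matrix).tocoo()
    return list(zip(mst.row, mst.col))

def topoae_ph0_loss(X, Z):
    dX, dZ = squareform(pdist(X)), squareform(pdist(Z))
    loss = 0.0
    for i, j in mst_edges(dX):   # pairs selected in the input space
        loss += (dX[i, j] - dZ[i, j]) ** 2
    for i, j in mst_edges(dZ):   # pairs selected in the latent space
        loss += (dZ[i, j] - dX[i, j]) ** 2
    return loss

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))                     # high-dimensional points
Z = X[:, :2] + 0.01 * rng.normal(size=(64, 2))   # a 2D embedding
print(topoae_ph0_loss(X, Z))
```

When this loss is zero, the selected edge lengths agree in both spaces, which is the property the theoretical analysis above examines for $\text{PH}^{0}$.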

Citations: 0
GenFODrawing: Supporting Creative Found Object Drawing With Generative AI.
IF 6.5 Pub Date : 2026-02-01 DOI: 10.1109/TVCG.2025.3626754
Jiaye Leng, Hui Ye, Pengfei Xu, Miu-Ling Lam, Hongbo Fu

Found object drawing is a creative art form incorporating everyday objects into imaginative images, offering a refreshing and unique way to express ideas. However, for many people, creating this type of work can be challenging due to difficulties in generating creative ideas and finding suitable reference images to help translate their ideas onto paper. Based on the findings of a formative study, we propose GenFODrawing, a creativity support tool to help users create diverse found object drawings. Our system provides AI-driven textual and visual inspirations, and enhances controllability through sketch-based and box-conditioned image generation, enabling users to create personalized outputs. We conducted a user study with twelve participants to compare GenFODrawing to a baseline condition in which participants completed the creative tasks using their own desired approaches without access to our system. The study demonstrated that GenFODrawing enabled easier exploration of diverse ideas, greater agency and control throughout the creative process, and higher creativity support compared to the baseline. A further open-ended study demonstrated the system's usability and expressiveness, and all participants found the creative process engaging.

Citations: 0