
Latest publications in IEEE transactions on visualization and computer graphics

CloseUpShot: Close-Up Novel View Synthesis From Sparse-Views via Point-Conditioned Diffusion Model.
IF 6.5 Pub Date : 2026-02-01 DOI: 10.1109/TVCG.2025.3635342
Yuqi Zhang, Guanying Chen, Jiaxing Chen, Chuanyu Fu, Chuan Huang, Shuguang Cui

Reconstructing 3D scenes and synthesizing novel views from sparse input views is a highly challenging task. Recent advances in video diffusion models have demonstrated strong temporal reasoning capabilities, making them a promising tool for enhancing reconstruction quality under sparse-view settings. However, existing approaches are primarily designed for modest viewpoint variations and struggle to capture fine-grained details in close-up scenarios, since the input information is severely limited. In this paper, we present a diffusion-based framework, called CloseUpShot, for close-up novel view synthesis from sparse inputs via point-conditioned video diffusion. Specifically, we observe that pixel-warping conditioning suffers from severe sparsity and background leakage in close-up settings. To address this, we propose hierarchical warping and occlusion-aware noise suppression, enhancing the quality and completeness of the conditioning images for the video diffusion model. Furthermore, we introduce global structure guidance, which leverages a dense fused point cloud to provide consistent geometric context to the diffusion process, to compensate for the lack of globally consistent 3D constraints in sparse conditioning inputs. Extensive experiments on multiple datasets demonstrate that our method outperforms existing approaches, especially in close-up novel view synthesis, clearly validating the effectiveness of our design.
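For readers unfamiliar with the pixel-warping conditioning the abstract refers to, the following is a minimal, hedged sketch of forward warping a source view into a target view given a depth map, camera intrinsics, and a relative pose. It is not the paper's pipeline, and all function and variable names are illustrative; under large viewpoint changes (as in close-ups), many target pixels receive no source pixel, which is the sparsity problem the paper addresses.

```python
# Generic forward pixel warping (illustrative only, not CloseUpShot's method):
# unproject source pixels with their depth, transform them into the target
# camera, and splat their colors into the target image. No z-buffering, for brevity.
import numpy as np

def forward_warp(src_img, src_depth, K, R, t):
    """src_img: (H, W, 3) colors; src_depth: (H, W) depths; K: (3, 3) intrinsics;
    R (3, 3), t (3,): rigid transform from the source to the target camera."""
    H, W = src_depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T.astype(float)
    # unproject to 3D points in the source camera frame, then move to the target frame
    pts_src = (np.linalg.inv(K) @ pix) * src_depth.reshape(1, -1)
    pts_tgt = R @ pts_src + t.reshape(3, 1)
    # project into the target image plane
    proj = K @ pts_tgt
    uv = proj[:2] / np.clip(proj[2:3], 1e-6, None)
    warped = np.zeros_like(src_img, dtype=float)
    mask = np.zeros((H, W), dtype=bool)          # which target pixels received any color
    inside = (uv[0] >= 0) & (uv[0] < W) & (uv[1] >= 0) & (uv[1] < H) & (proj[2] > 0)
    ui = uv[0, inside].astype(int)
    vi = uv[1, inside].astype(int)
    warped[vi, ui] = src_img.reshape(-1, 3)[inside]
    mask[vi, ui] = True
    return warped, mask   # under close-up target views, mask is typically very sparse
```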

Citations: 0
Artificial Facial Mimicry in Immersive Virtual Reality.
IF 6.5 Pub Date : 2026-01-29 DOI: 10.1109/TVCG.2026.3656848
Elena Piscopo

Artificial facial mimicry (AFM) is increasingly used to enhance social interaction with virtual agents in immersive virtual reality. However, its psychological and ethical implications remain insufficiently explored. This article conceptualizes AFM as an affective and embodied intervention, examining the role of emotional congruence, individual differences, and clinical vulnerability in shaping user responses. We further outline methodological directions involving physiological measures and embodied coordination. By framing AFM within affective computing and embodied cognition, this work contributes to the responsible design of emotionally adaptive virtual agents.

Citations: 0
MaskScene: Hierarchical Conditional Masked Models for Real-time 3D Indoor Scene Synthesis.
IF 6.5 Pub Date : 2026-01-27 DOI: 10.1109/TVCG.2026.3658429
Xinyu Zhang, Yusen Liu, Qichuan Geng, Zhong Zhou, Wenfeng Song

Indoor scene synthesis is essential for creative industries, and recent advances in scene synthesis using diffusion and autoregressive models have shown promising results. However, existing models struggle to simultaneously achieve real-time performance, high visual fidelity, and flexible scene editability. To tackle these challenges, we propose MaskScene, a novel hierarchical conditional masked model for real-time 3D indoor scene synthesis and editing. Specifically, MaskScene introduces a hierarchical scene representation that explicitly encodes scene relationships, semantics, and tokenization. Based on this representation, we design a hierarchical conditional masked modeling architecture that enables parallel and iterative decoding, conditioned on both semantics and relationships. By masking local objects and leveraging the hierarchical structure of the scene, the model learns to infer and synthesize missing regions from partial observations, enabling rapid construction of 3D indoor environments that more accurately reflect real-world scenes. Compared to state-of-the-art methods, MaskScene achieves 80× faster generation speed and improves scene quality by 10%, while also supporting zero-shot editing, such as scene completion and rearrangement, without extra fine-tuning. Our project and dataset will be made public.
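As a rough illustration of the parallel, iterative decoding the abstract describes, here is a generic MaskGIT-style sketch: all tokens start masked, a predictor scores every position in parallel, and only the most confident still-masked positions are committed at each step. This is a toy stand-in (the predictor is random) and not MaskScene's hierarchical, relationship-conditioned architecture; all names are illustrative.

```python
# Generic MaskGIT-style iterative parallel decoding (illustrative stand-in;
# not MaskScene's hierarchical conditional masked model).
import numpy as np

MASK = -1  # sentinel id for a still-masked scene token

def toy_predictor(tokens, vocab_size, rng):
    """Stand-in for the conditional masked model: per-position probabilities
    over the token vocabulary (random here, purely for illustration)."""
    probs = rng.random((len(tokens), vocab_size))
    return probs / probs.sum(axis=1, keepdims=True)

def iterative_decode(n_tokens=16, vocab_size=32, n_steps=4, seed=0):
    rng = np.random.default_rng(seed)
    tokens = np.full(n_tokens, MASK)
    for step in range(n_steps):
        masked = tokens == MASK
        if not masked.any():
            break
        probs = toy_predictor(tokens, vocab_size, rng)
        pred = probs.argmax(axis=1)
        # only still-masked positions compete for being committed this step
        conf = np.where(masked, probs.max(axis=1), -np.inf)
        n_reveal = int(masked.sum()) if step == n_steps - 1 else max(1, int(masked.sum()) // 2)
        commit = np.argsort(-conf)[:n_reveal]
        tokens[commit] = pred[commit]
    return tokens

print(iterative_decode())  # every position decoded after a few parallel passes
```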

Citations: 0
Make the Unhearable Visible: Exploring Visualization for Musical Instrument Practice.
IF 6.5 Pub Date : 2026-01-27 DOI: 10.1109/TVCG.2026.3658216
Frank Heyen, Michael Gleicher, Michael Sedlmair

We explore the potential of visualization to support musicians in instrument practice through real-time feedback and reflection on their playing. Musicians often struggle to observe patterns in their playing and interpret them with respect to their goals. Our premise is that these patterns can be made visible with interactive visualization: we can make the unhearable visible. However, understanding the design of such visualizations is challenging: the diversity of needs, including different instruments, skills, musical attributes, and genres, means that any single use case is unlikely to illustrate the broad potential and opportunities. To address this challenge, we conducted a design exploration where we created and iterated on 33 designs, each focusing on a subset of needs, for example, only one musical skill. Our designs are grounded in our own experience as musicians and in the ideas and feedback of 18 musicians with various musical backgrounds, and we evaluated them with 13 music learners and teachers. This paper presents the results of our exploration, focusing on a few example designs as instances of possible instrument practice visualizations. From our work, we draw design considerations that contribute to future research and products for visual instrument education. Supplemental materials are available at github.com/visvar/mila.

Citations: 0
How Scale Breaks "Normalized Stress" and KL Divergence: Rethinking Quality Metrics.
IF 6.5 Pub Date : 2026-01-26 DOI: 10.1109/TVCG.2026.3657654
Kiran Smelser, Kaviru Gunaratne, Jacob Miller, Stephen Kobourov

Complex, high-dimensional data is ubiquitous across many scientific disciplines, including machine learning, biology, and the social sciences. One of the primary methods of visualizing these datasets is with two-dimensional scatter plots that visually capture some properties of the data. Because visually determining the accuracy of these plots is challenging, researchers often use quality metrics to measure the projection's accuracy and faithfulness to the original data. One of the most commonly employed metrics, normalized stress, is sensitive to uniform scaling (stretching, shrinking) of the projection, despite this act not meaningfully changing anything about the projection. Another quality metric, the Kullback-Leibler (KL) divergence used in the popular t-Distributed Stochastic Neighbor Embedding (t-SNE) technique, is also susceptible to this scale sensitivity. We investigate the effect of scaling on stress and KL divergence analytically and empirically by showing just how much the values change and how this affects dimension reduction technique evaluations. We introduce a simple technique to make both metrics scale-invariant and show that it accurately captures expected behavior on a small benchmark.
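To make the scale-sensitivity claim concrete, here is a minimal sketch using one common definition of normalized stress; the scale-invariant variant simply evaluates the metric at the uniform scale factor of the projection that minimizes it, which has a closed form for this quadratic objective. This illustrates the general idea rather than the authors' exact formulation; all names are illustrative.

```python
# Minimal sketch: normalized stress changes under uniform scaling of the projection,
# while a scale-invariant variant (optimized over a uniform scale factor) does not.
import numpy as np
from scipy.spatial.distance import pdist

def normalized_stress(X_high, Y_low):
    d_high, d_low = pdist(X_high), pdist(Y_low)          # pairwise distances
    return np.sum((d_high - d_low) ** 2) / np.sum(d_high ** 2)

def scale_invariant_stress(X_high, Y_low):
    d_high, d_low = pdist(X_high), pdist(Y_low)
    # closed-form optimal uniform scale for min over alpha of sum (d_high - alpha * d_low)^2
    alpha = np.sum(d_high * d_low) / np.sum(d_low ** 2)
    return np.sum((d_high - alpha * d_low) ** 2) / np.sum(d_high ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # toy high-dimensional data
Y = X[:, :2]                     # toy 2D "projection": just the first two coordinates

for s in (0.1, 1.0, 10.0):       # uniformly stretch or shrink the projection
    print(f"scale {s:4}: stress {normalized_stress(X, s * Y):.3f}, "
          f"scale-invariant {scale_invariant_stress(X, s * Y):.3f}")
```

Running the loop shows the plain metric swinging with the scale factor while the scale-invariant value stays fixed, which is the failure mode the abstract describes.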

Citations: 0
Sketch2Avatar: Geometry-Guided 3D Full-Body Human Generation in 360° from Hand-Drawn Sketches.
IF 6.5 Pub Date : 2026-01-26 DOI: 10.1109/TVCG.2026.3657593
Ziwei Chen, Qiang Li, Jie Zhang, Anthony Kong, Ping Li

Generating full-body humans in 360° has broad applications in digital entertainment, online education, and art design. Existing works primarily rely on coarse conditions such as body pose to guide the generation, lacking detailed control over the synthesized results. Regarding this limitation, sketches offer a promising alternative as an expressive condition that enables more explicit and precise control. However, current sketch-based generation methods focus on faces or common objects; how to transfer sketches into 360° full-body humans remains unexplored. We first propose two straightforward strategies: adapting sketch-based 3D face generation to full-body humans, or lifting sketch-based 2D human generation to 3D through a two-stage approach. Unfortunately, both strategies lead to unsatisfactory generation quality. To bridge this gap, in this work, we propose Sketch2Avatar, the first generative model to achieve 3D full-body human generation from hand-drawn sketches. Our model is capable of synthesizing sketch-aligned and 360°-consistent full-body human images by leveraging the geometry information extracted from sketches to guide the 3D representation generation and neural rendering. Specifically, we propose sketch-guided 3D representation generation to model the 3D human and maintain the alignment between input sketches and generated humans. Our transformer-based generator incorporates spatial feature guidance and latent modulation derived from sketches to produce high-quality 3D representations. Additionally, our body-aware neural rendering utilizes 3D human body priors from sketches, simplifying the learning of articulated body poses and complex body shapes. To train and evaluate our model, we construct a large-scale dataset comprising approximately 19K 2D full-body human images and their corresponding sketches in a hand-drawn style. Experimental results demonstrate that our Sketch2Avatar can transfer hand-drawn sketches into photo-realistic 360° full-body human images with precise sketch-human alignment. Ablation studies further validate the effectiveness of our design choices. Our project is publicly available at: https://richardchen20.github.io/Sketch2Avatar.

Citations: 0
CoreEditor: Correspondence-constrained Diffusion for Consistent 3D Editing.
IF 6.5 Pub Date : 2026-01-26 DOI: 10.1109/TVCG.2026.3657658
Zhe Zhu, Honghua Chen, Peng Li, Mingqiang Wei

Text-driven 3D editing is an emerging task that focuses on modifying scenes based on text prompts. Current methods often adapt pre-trained 2D image editors to multi-view observations, using specific strategies to combine information across views. However, these approaches still struggle with ensuring consistency across views, as they lack precise control over the sharing of information, resulting in edits with insufficient visual changes and blurry details. In this paper, we propose CoreEditor, a novel framework for consistent text-to-3D editing. At the core of our approach is a novel correspondence-constrained attention mechanism, which enforces structured interactions between corresponding pixels that are expected to remain visually consistent during the diffusion denoising process. Unlike conventional wisdom that relies solely on scene geometry, we enhance the correspondence by incorporating semantic similarity derived from the diffusion denoising process. This combined support from both geometry and semantics ensures a robust multi-view editing process. Additionally, we introduce a selective editing pipeline that enables users to choose their preferred edits from multiple candidates, creating a more flexible and user-centered 3D editing process. Extensive experiments demonstrate the effectiveness of CoreEditor, showing its ability to generate high-quality 3D edits, significantly outperforming existing methods.

Citations: 0
Modulating Effort Sensations in virtual reality: A Parameter-Based Haptic Feedback Approach.
IF 6.5 Pub Date : 2026-01-26 DOI: 10.1109/TVCG.2026.3657634
Yann Glemarec, Tom Roy, Quentin Galvane, Gurvan Lecuyer, Anatole Lecuyer, Ferran Argelaguet

Virtual reality is becoming increasingly popular, and modern haptic equipment, such as vibrotactile suits, haptic gloves, and force-feedback controllers, offers new means of interaction within virtual environments, significantly enhancing user experience. When interacting with virtual objects, combined visual and haptic feedback simulates the physical sensations of grasping, lifting, or moving real objects. This sensorimotor feedback is essential for inducing a sense of presence and agency, yet it remains challenging to reproduce in the absence of reliable haptic cues. In this study, we design and evaluate several haptic metaphors using combinations of vibrotactile design parameters to simulate the lifting effort associated with light to heavy objects. These parameters include primitive signals, intensity, spatial density, propagation, and temporal density. Our contribution is threefold. First, we propose a method for modulating perceived physical effort by extending signal intensity with spatial and temporal density, which together reflect the effort required to lift an object. Second, we present a user study in which participants compared haptic effects and ranked them according to perceived lifting effort, comfort, and confidence, allowing us to assess the influence of each parameter. Third, we report the results of a second study in which participants evaluated vibrotactile effects when lifting different virtual objects. The findings confirm the importance of intensity and spatial density, as well as the influence of graphical representation on perceived effort. This research provides practical insights for designing haptic-enabled virtual reality systems and offers guidance for developers seeking to create more expressive and believable vibrotactile interactions.

Citations: 0
Unlearning Comparator: a Visual Analytics System for Comparative Evaluation of Machine Unlearning Methods.
IF 6.5 Pub Date : 2026-01-26 DOI: 10.1109/TVCG.2026.3658325
Jaeung Lee, Suhyeon Yu, Yurim Jang, Simon S Woo, Jaemin Jo

Machine Unlearning (MU) aims to remove target training data from a trained model so that the removed data no longer influences the model's behavior, fulfilling "right to be forgotten" obligations under data privacy laws. Yet, we observe that researchers in this rapidly emerging field face challenges in analyzing and understanding the behavior of different MU methods, especially in terms of three fundamental principles in MU: accuracy, efficiency, and privacy. Consequently, they often rely on aggregate metrics and ad-hoc evaluations, making it difficult to accurately assess the trade-offs between methods. To fill this gap, we introduce a visual analytics system, Unlearning Comparator, designed to facilitate the systematic evaluation of MU methods. Our system supports two important tasks in the evaluation process: model comparison and attack simulation. First, it allows the user to compare the behaviors of two models, such as a model generated by a certain method and a retrained baseline, at class-, instance-, and layer-levels to better understand the changes made after unlearning. Second, our system simulates membership inference attacks (MIAs) to evaluate the privacy of a method, where an attacker attempts to determine whether specific data samples were part of the original training set. We evaluate our system through a case study visually analyzing prominent MU methods and demonstrate that it helps the user not only understand model behaviors but also gain insights that can inform the improvement of MU methods.
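As background for the attack-simulation task, here is a minimal, hedged sketch of a simple loss-threshold membership inference attack, one common MIA baseline and not necessarily the attack used by Unlearning Comparator: fit a loss threshold that best separates known members from non-members, then measure how much of the forget set is still flagged as "member" after unlearning. All data and names here are synthetic and illustrative.

```python
# Minimal loss-threshold membership inference attack (a common MIA baseline,
# not necessarily the attack simulated in Unlearning Comparator).
import numpy as np

def loss_threshold_mia(member_losses, nonmember_losses, forget_losses):
    """Fit a single loss threshold on known members/non-members, then report the
    fraction of the forget set that would still be inferred as 'member'."""
    losses = np.concatenate([member_losses, nonmember_losses])
    labels = np.concatenate([np.ones(len(member_losses), dtype=bool),
                             np.zeros(len(nonmember_losses), dtype=bool)])
    thresholds = np.unique(losses)
    # predict "member" when the loss falls below the threshold; keep the best threshold
    accs = [np.mean((losses < t) == labels) for t in thresholds]
    best_t = thresholds[int(np.argmax(accs))]
    return float(np.mean(forget_losses < best_t))

rng = np.random.default_rng(0)
member = rng.normal(0.2, 0.1, 1000)     # toy losses: training members tend to have low loss
nonmember = rng.normal(1.0, 0.3, 1000)
forget = rng.normal(0.6, 0.3, 200)      # forget-set losses after some unlearning method
print(loss_threshold_mia(member, nonmember, forget))  # lower = forget set looks less like members
```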

Citations: 0
Robust Barycenters of Persistence Diagrams.
IF 6.5 Pub Date : 2026-01-23 DOI: 10.1109/TVCG.2026.3657210
Keanu Sisouk, Eloi Tanguy, Julie Delon, Julien Tierny

This short paper presents a general approach for computing robust Wasserstein barycenters [2], [80], [81] of persistence diagrams. The classical method consists in computing assignment arithmetic means after finding the optimal transport plans between the barycenter and the persistence diagrams. However, this procedure only works for the transportation cost related to the $q$-Wasserstein distance $W_{q}$ when $q=2$. We adapt an alternative fixed-point method [76] to compute a barycenter diagram for generic transportation costs ($q > 1$), in particular those robust to outliers, $q \in (1,2)$. We show the utility of our work in two applications: (i) the clustering of persistence diagrams on their metric space and (ii) the dictionary encoding of persistence diagrams [73]. In both scenarios, we demonstrate the added robustness to outliers provided by our generalized framework. Our Python implementation is available at this address: https://github.com/Keanu-Sisouk/RobustBarycenter.
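To make the classical assignment-then-average scheme concrete, here is a highly simplified sketch of the $q=2$ case on plain point sets: alternately compute an optimal assignment between the current barycenter and each diagram, then move each barycenter point to the arithmetic mean of its matched points. It assumes equal-size diagrams, ignores the diagonal projections that real persistence-diagram transport allows, and does not implement the paper's generalized update for $q \in (1,2)$; scipy's linear_sum_assignment stands in for the optimal transport step, and all names are illustrative.

```python
# Simplified assignment/arithmetic-mean barycenter (q = 2, equal-size point sets,
# no diagonal projections) -- an illustration of the classical scheme, not the
# paper's generalized fixed-point method for q in (1, 2).
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def barycenter_q2(diagrams, n_iter=50):
    """diagrams: list of (n, 2) arrays of (birth, death) points, all of size n."""
    bary = diagrams[0].copy()                       # initialize from one input diagram
    for _ in range(n_iter):
        matched = []
        for diag in diagrams:
            cost = cdist(bary, diag) ** 2           # squared Euclidean transport cost
            _, cols = linear_sum_assignment(cost)   # optimal one-to-one assignment
            matched.append(diag[cols])              # reorder diag to match barycenter points
        new_bary = np.mean(matched, axis=0)         # arithmetic mean = the q = 2 update
        if np.allclose(new_bary, bary):
            break
        bary = new_bary
    return bary

rng = np.random.default_rng(1)
diags = [rng.random((5, 2)) + 0.05 * rng.normal(size=(5, 2)) for _ in range(3)]
print(barycenter_q2(diags))
```

In the robust setting the abstract describes, the arithmetic-mean update is the step that no longer applies, which is why a different fixed-point update is needed for $q \in (1,2)$.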

Citations: 0