Dynamic Scheduling for Data-Parallel Path Tracing of Large-Scale Instanced Scenes
Pub Date: 2026-02-02 | DOI: 10.1109/TVCG.2026.3659931
Xiang Xu, Huiyu Li, Linwei Fan, Lu Wang
Data-parallel ray tracing is a crucial technique for rendering large-scale scenes that exceed the memory capacity of a single compute node. It partitions scene data across multiple nodes and accesses remote data through inter-node communication. However, the resulting communication overhead remains a significant bottleneck for practical performance. Existing approaches mitigate this bottleneck by enhancing data locality through dynamic scheduling during rendering, typically employing spatial partitioning to enable access prediction. Although effective in some scenarios, these methods incur significant redundancy in base geometry when applied to large-scale instanced scenes. In this paper, we introduce the first object-space-based dynamic scheduling algorithm, which uses object groups as the scheduling units to eliminate redundant storage of base data in instanced scenes. Additionally, we propose two data access frequency prediction methods to guide asynchronous data prefetching, enhancing rendering efficiency. Compared to the state-of-the-art method, our approach achieves an average rendering speedup of 77.6%, with a maximum improvement of up to 146.1%, while incurring only a 5% increase in scene memory consumption.
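To make the prefetching idea concrete, here is a minimal, illustrative sketch of frequency-guided asynchronous prefetching of object groups; the scheduler class, fetch callback, cache size, and eviction policy are hypothetical simplifications, not the paper's implementation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

class PrefetchScheduler:
    """Toy frequency-guided prefetcher for remote object groups (illustrative only)."""

    def __init__(self, fetch_fn, cache_capacity=8):
        self.fetch_fn = fetch_fn          # blocking call that loads a remote group
        self.capacity = cache_capacity
        self.cache = {}                   # group_id -> geometry blob
        self.freq = Counter()             # observed access counts per group
        self.pool = ThreadPoolExecutor(max_workers=2)
        self.pending = {}                 # group_id -> Future of an in-flight fetch

    def record_access(self, group_id):
        self.freq[group_id] += 1

    def prefetch_top_k(self, k=2):
        """Asynchronously fetch the k most frequently accessed groups not yet cached."""
        for group_id, _ in self.freq.most_common():
            if len(self.pending) >= k:
                break
            if group_id not in self.cache and group_id not in self.pending:
                self.pending[group_id] = self.pool.submit(self.fetch_fn, group_id)

    def get(self, group_id):
        """Return group data, waiting on a pending prefetch or fetching on demand."""
        self.record_access(group_id)
        if group_id in self.cache:
            return self.cache[group_id]
        future = self.pending.pop(group_id, None)
        data = future.result() if future else self.fetch_fn(group_id)
        if len(self.cache) >= self.capacity:          # naive eviction: drop the coldest group
            coldest = min(self.cache, key=lambda g: self.freq[g])
            del self.cache[coldest]
        self.cache[group_id] = data
        return data
```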
{"title":"Dynamic Scheduling for Data-Parallel Path Tracing of Large-Scale Instanced Scenes.","authors":"Xiang Xu, Huiyu Li, Linwei Fan, Lu Wang","doi":"10.1109/TVCG.2026.3659931","DOIUrl":"https://doi.org/10.1109/TVCG.2026.3659931","url":null,"abstract":"<p><p>Data-parallel ray tracing is a crucial technique for rendering large-scale scenes that exceed the memory capacity of a single compute node. It partitions scene data across multiple nodes and accesses remote data through inter-node communication. However, the resulting communication overhead remains a significant bottleneck for practical performance. Existing approaches mitigate this bottleneck by enhancing data locality through dynamic scheduling during rendering, typically employing spatial partitioning to enable access prediction. Although effective in some scenarios, these methods incur significant redundancy in base geometry when applied to large-scale instanced scenes. In this paper, we introduce the first object-space-based dynamic scheduling algorithm, which uses object groups as the scheduling units to eliminate redundant storage of base data in instanced scenes. Additionally, we propose two data access frequency prediction methods to guide asynchronous data prefetching, enhancing rendering efficiency. Compared to the state-of-the-art method, our approach achieves an average rendering speedup of 77.6%, with a maximum improvement of up to 146.1%, while incurring only a 5% increase in scene memory consumption.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":6.5,"publicationDate":"2026-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146109230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Two-Handed Click and Tap: Expanding Input Vocabulary of Controllers for Virtual Reality Interaction
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3624569 | Pages: 1682-1697
Huawei Tu, BoYu Gao, Yujun Lu, Weiqiang Xin, Hui Cui, Weiqi Luo, Jian Weng, Henry Been-Lirn Duh
This study explores the design space of two-handed input (i.e., clicking or tapping with the thumb) on the touchpads of controllers for virtual reality (VR) interaction. Four experiments were conducted for this purpose. Experiment 1 investigated how users employed two VR controllers to perform four representative interaction tasks in VR and identified 14 potentially usable two-handed operations that involve tapping or clicking. Experiments 2 and 3 analyzed user performance on the 14 operations, providing insights into their interaction characteristics in terms of completion time, accuracy, and subjective feedback. In Experiment 4, we designed a command-input technique based on the proposed operations and verified its effectiveness against context menus and marking menus in a VR text entry scenario. Our technique generally yielded shorter completion times than, and accuracy comparable to, the two menu types. Our work contributes to the design of VR interactions using two-handed controllers.
{"title":"Two-Handed Click and Tap: Expanding Input Vocabulary of Controllers for Virtual Reality Interaction.","authors":"Huawei Tu, BoYu Gao, Yujun Lu, Weiqiang Xin, Hui Cui, Weiqi Luo, Jian Weng, Henry Been-Lirn Duh","doi":"10.1109/TVCG.2025.3624569","DOIUrl":"10.1109/TVCG.2025.3624569","url":null,"abstract":"<p><p>This study explores the design space of two-handed input (i.e., clicking or tapping with the thumb) on the touchpads of controllers for virtual reality (VR) interaction. Four experiments were conducted to fulfill this purpose. Experiment 1 investigated how users employed two VR controllers to perform four representative interaction tasks in VR and identified 14 potentially usable two-handed operations that involved tapping or clicking. Experiments 2 and 3 analyzed user performance of the 14 operations, providing insights into their interaction characteristics in terms of completion time, accuracy, and subjective feedback. In Experiment 4, we designed a command-input technique based on the proposed operations. We verified its effectiveness compared to context menus and marking menus in a VR text entry scenario. Our technique generally had shorter times and similar accuracy to the two menu types. Our work contributes to the design of VR interactions using two-handed controllers.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1682-1697"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145357317","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero-Shot Video Translation via Token Warping
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3636949 | Pages: 1582-1592
Haiming Zhu, Yangyang Xu, Jun Yu, Shengfeng He
With the revolution of generative AI, video-related tasks have been widely studied. However, current state-of-the-art video models still lag behind image models in visual quality and user control over generated content. In this paper, we introduce TokenWarping, a novel framework for temporally coherent video translation. Existing diffusion-based video editing approaches rely solely on key and value patches in self-attention to ensure temporal consistency, often sacrificing the preservation of local and structural regions. Critically, these methods overlook the significance of the query patches in achieving accurate feature aggregation and temporal coherence. In contrast, TokenWarping leverages complementary token priors by constructing temporal correlations across different frames. Our method begins by extracting optical flows from source videos. During the denoising process of the diffusion model, these optical flows are used to warp the previous frame's query, key, and value patches, aligning them with the current frame's patches. By directly warping the query patches, we enhance feature aggregation in self-attention, while warping the key and value patches ensures temporal consistency across frames. This token warping imposes explicit constraints on the self-attention layer outputs, effectively ensuring temporally coherent translation. Our framework does not require any additional training or fine-tuning and can be seamlessly integrated with existing text-to-image editing methods. We conduct extensive experiments on various video translation tasks, demonstrating that TokenWarping surpasses state-of-the-art methods both qualitatively and quantitatively. Video demonstrations are available in supplementary materials.
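As a rough illustration of the underlying mechanism (aligning the previous frame's attention tokens to the current frame with optical flow before attending), here is a minimal NumPy sketch; the nearest-neighbor warp, single-head attention, and tensor layout are simplifying assumptions, and the query-patch warping described above is omitted here.

```python
import numpy as np

def warp_by_flow(feat_prev, flow):
    """Warp a (H, W, C) feature map by a per-pixel flow (H, W, 2) using
    nearest-neighbor sampling. Purely illustrative."""
    H, W, _ = feat_prev.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    return feat_prev[src_y, src_x]

def flow_aligned_attention(q_cur, k_prev, v_prev, flow):
    """Cross-frame attention where the previous frame's keys/values are first
    aligned to the current frame with optical flow (layout: H x W x C)."""
    H, W, C = q_cur.shape
    k_warp = warp_by_flow(k_prev, flow).reshape(-1, C)
    v_warp = warp_by_flow(v_prev, flow).reshape(-1, C)
    q = q_cur.reshape(-1, C)
    attn = q @ k_warp.T / np.sqrt(C)
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))   # row-wise softmax
    attn /= attn.sum(axis=1, keepdims=True)
    return (attn @ v_warp).reshape(H, W, C)
```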
{"title":"Zero-Shot Video Translation via Token Warping.","authors":"Haiming Zhu, Yangyang Xu, Jun Yu, Shengfeng He","doi":"10.1109/TVCG.2025.3636949","DOIUrl":"10.1109/TVCG.2025.3636949","url":null,"abstract":"<p><p>With the revolution of generative AI, video-related tasks have been widely studied. However, current state-of-the-art video models still lag behind image models in visual quality and user control over generated content. In this paper, we introduce TokenWarping, a novel framework for temporally coherent video translation. Existing diffusion-based video editing approaches rely solely on key and value patches in self-attention to ensure temporal consistency, often sacrificing the preservation of local and structural regions. Critically, these methods overlook the significance of the query patches in achieving accurate feature aggregation and temporal coherence. In contrast, TokenWarping leverages complementary token priors by constructing temporal correlations across different frames. Our method begins by extracting optical flows from source videos. During the denoising process of the diffusion model, these optical flows are used to warp the previous frame's query, key, and value patches, aligning them with the current frame's patches. By directly warping the query patches, we enhance feature aggregation in self-attention, while warping the key and value patches ensures temporal consistency across frames. This token warping imposes explicit constraints on the self-attention layer outputs, effectively ensuring temporally coherent translation. Our framework does not require any additional training or fine-tuning and can be seamlessly integrated with existing text-to-image editing methods. We conduct extensive experiments on various video translation tasks, demonstrating that TokenWarping surpasses state-of-the-art methods both qualitatively and quantitatively. Video demonstrations are available in supplementary materials.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1582-1592"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145643887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PFF-Net: Patch Feature Fitting for Point Cloud Normal Estimation
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3638450 | Pages: 1713-1728
Qing Li, Huifang Feng, Kanle Shi, Yue Gao, Yi Fang, Yu-Shen Liu, Zhizhong Han
Estimating the normal of a point requires constructing a local patch to provide center-surrounding context, but determining the appropriate neighborhood size is difficult when dealing with different data or geometries. Existing methods commonly employ various parameter-heavy strategies to extract a full feature description from the input patch. However, they still have difficulties in accurately and efficiently predicting normals for various point clouds. In this work, we present a new idea of feature extraction for robust normal estimation of point clouds. We use the fusion of multi-scale features from different neighborhood sizes to address the issue of selecting reasonable patch sizes for various data or geometries. We seek to model a patch feature fitting (PFF) based on multi-scale features to approximate the optimal geometric description for normal estimation and implement the approximation process via multi-scale feature aggregation and cross-scale feature compensation. The feature aggregation module progressively aggregates the patch features of different scales to the center of the patch and shrinks the patch size by removing points far from the center. This not only enables the network to precisely capture structural characteristics over a wide range, but also allows it to describe highly detailed geometries. The feature compensation module ensures the reusability of features from earlier, large-scale layers and reveals the information associated with different patch sizes. Our approximation strategy based on aggregating the features of multiple scales enables the model to achieve scale adaptation of varying local patches and deliver the optimal feature description. Extensive experiments demonstrate that our method achieves state-of-the-art performance on both synthetic and real-world datasets with fewer network parameters and shorter running time.
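For context, the classic non-learned baseline estimates a normal by PCA over a fixed-size k-nearest-neighbor patch; the hard-coded k below is exactly the patch-size choice that scale-adaptive methods such as the one described above aim to avoid. A minimal sketch (not the paper's network):

```python
import numpy as np

def pca_normal(points, query, k=32):
    """Classic baseline: the normal at `query` is the eigenvector of the local
    covariance with the smallest eigenvalue, computed over the k nearest
    neighbors of `query` in `points` (shape (N, 3))."""
    d2 = np.sum((points - query) ** 2, axis=1)
    patch = points[np.argsort(d2)[:k]]          # fixed-size local patch
    centered = patch - patch.mean(axis=0)
    cov = centered.T @ centered / k
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    return eigvecs[:, 0]                        # direction of least variance
```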
{"title":"PFF-Net: Patch Feature Fitting for Point Cloud Normal Estimation.","authors":"Qing Li, Huifang Feng, Kanle Shi, Yue Gao, Yi Fang, Yu-Shen Liu, Zhizhong Han","doi":"10.1109/TVCG.2025.3638450","DOIUrl":"10.1109/TVCG.2025.3638450","url":null,"abstract":"<p><p>Estimating the normal of a point requires constructing a local patch to provide center-surrounding context, but determining the appropriate neighborhood size is difficult when dealing with different data or geometries. Existing methods commonly employ various parameter-heavy strategies to extract a full feature description from the input patch. However, they still have difficulties in accurately and efficiently predicting normals for various point clouds. In this work, we present a new idea of feature extraction for robust normal estimation of point clouds. We use the fusion of multi-scale features from different neighborhood sizes to address the issue of selecting reasonable patch sizes for various data or geometries. We seek to model a patch feature fitting (PFF) based on multi-scale features to approximate the optimal geometric description for normal estimation and implement the approximation process via multi-scale feature aggregation and cross-scale feature compensation. The feature aggregation module progressively aggregates the patch features of different scales to the center of the patch and shrinks the patch size by removing points far from the center. It not only enables the network to precisely capture the structure characteristic in a wide range, but also describes highly detailed geometries. The feature compensation module ensures the reusability of features from earlier layers of large scales and reveals associated information in different patch sizes. Our approximation strategy based on aggregating the features of multiple scales enables the model to achieve scale adaptation of varying local patches and deliver the optimal feature description. Extensive experiments demonstrate that our method achieves state-of-the-art performance on both synthetic and real-world datasets with fewer network parameters and running time.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1713-1728"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145644183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DoodleAssist: Progressive Interactive Line Art Generation With Latent Distribution Alignment
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3624800 | Pages: 2087-2098
Haoran Mo, Yulin Shen, Edgar Simo-Serra, Zeyu Wang
Creating high-quality line art in a fast and controlled manner plays a crucial role in anime production and concept design. We present DoodleAssist, an interactive and progressive line art generation system controlled by sketches and prompts, which helps both experts and novices concretize their design intentions or explore possibilities. Built upon a controllable diffusion model, our system performs progressive generation based on the last generated line art, synthesizing regions corresponding to drawn or modified strokes while keeping the remaining ones unchanged. To facilitate this process, we propose a latent distribution alignment mechanism to enhance the transition between the two regions and allow seamless blending, thereby alleviating issues of region incoherence and line discontinuity. Finally, we build a user interface that allows the convenient creation of line art through interactive sketching and prompts. Qualitative and quantitative comparisons against existing approaches and an in-depth user study demonstrate the effectiveness and usability of our system. Our system can benefit various applications such as anime concept design, drawing assistance, and creativity support for children.
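As a loose illustration of blending a regenerated latent region into preserved content while matching their statistics, here is a minimal NumPy sketch; the mean/std (AdaIN-style) matching and the hard mask used here are assumptions for illustration, not the paper's latent distribution alignment mechanism.

```python
import numpy as np

def align_and_blend(lat_new, lat_keep, mask, eps=1e-6):
    """Blend a regenerated latent region into a preserved one after matching the
    regenerated region's channel statistics to the preserved region's.
    lat_new, lat_keep: (C, H, W) latents; mask: (H, W), 1 = regenerated region."""
    keep = lat_keep[:, mask < 0.5]                    # preserved pixels, per channel
    new = lat_new[:, mask >= 0.5]                     # regenerated pixels, per channel
    mu_k, std_k = keep.mean(axis=1, keepdims=True), keep.std(axis=1, keepdims=True)
    mu_n, std_n = new.mean(axis=1, keepdims=True), new.std(axis=1, keepdims=True)
    aligned = lat_new.copy()
    aligned[:, mask >= 0.5] = (new - mu_n) / (std_n + eps) * std_k + mu_k
    return np.where(mask >= 0.5, aligned, lat_keep)   # keep preserved latents untouched
```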
{"title":"DoodleAssist: Progressive Interactive Line Art Generation With Latent Distribution Alignment.","authors":"Haoran Mo, Yulin Shen, Edgar Simo-Serra, Zeyu Wang","doi":"10.1109/TVCG.2025.3624800","DOIUrl":"10.1109/TVCG.2025.3624800","url":null,"abstract":"<p><p>Creating high-quality line art in a fast and controlled manner plays a crucial role in anime production and concept design. We present DoodleAssist, an interactive and progressive line art generation system controlled by sketches and prompts, which helps both experts and novices concretize their design intentions or explore possibilities. Built upon a controllable diffusion model, our system performs progressive generation based on the last generated line art, synthesizing regions corresponding to drawn or modified strokes while keeping the remaining ones unchanged. To facilitate this process, we propose a latent distribution alignment mechanism to enhance the transition between the two regions and allow seamless blending, thereby alleviating issues of region incoherence and line discontinuity. Finally, we also build a user interface that allows the convenient creation of line art through interactive sketching and prompts. Qualitative and quantitative comparisons against existing approaches and an in-depth user study demonstrate the effectiveness and usability of our system. Our system can benefit various applications such as anime concept design, drawing assistant, and creativity support for children.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"2087-2098"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145357315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AutoFDP: Automatic Force-Based Model Selection for Multicriteria Graph Drawing
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3631659 | Pages: 1554-1568
Mingliang Xue, Yifan Wang, Zhi Wang, Lifeng Zhu, Lizhen Cui, Yueguo Chen, Zhiyu Ding, Oliver Deussen, Yunhai Wang
Traditional force-based graph layout models are rooted in virtual physics, while criteria-driven techniques position nodes by directly optimizing graph readability criteria. In this article, we systematically explore the integration of these two approaches, introducing criteria-driven force-based graph layout techniques. We propose a general framework that, based on user-specified readability criteria, such as minimizing edge crossings, automatically constructs a force-based model tailored to generate layouts for a given graph. Models derived from highly similar graphs can be reused to create initial layouts, and users can further refine layouts by imposing different criteria on subgraphs. We perform quantitative comparisons between our layout methods and existing techniques across various graphs and present a case study on graph exploration. Our results indicate that our framework generates superior layouts compared to existing techniques and exhibits better generalization capabilities than deep learning-based methods.
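For intuition on how a readability criterion can be turned into node "forces", here is a minimal sketch of one gradient step on the classic stress criterion; this is a generic example, not AutoFDP's automatically constructed model.

```python
import numpy as np

def stress_gradient_step(pos, dist, lr=0.01):
    """One descent step on the stress criterion
    sum_{i<j} w_ij (||p_i - p_j|| - d_ij)^2 with w_ij = 1 / d_ij^2.
    pos: (n, 2) node positions; dist: (n, n) target graph-theoretic distances."""
    n = pos.shape[0]
    grad = np.zeros_like(pos)
    for i in range(n):
        diff = pos[i] - pos                         # (n, 2) vectors to all nodes
        norm = np.linalg.norm(diff, axis=1)
        norm[i] = 1.0                               # avoid division by zero for j = i
        w = 1.0 / np.maximum(dist[i], 1e-9) ** 2
        w[i] = 0.0                                  # no self term
        coef = 2.0 * w * (norm - dist[i]) / norm    # derivative of each pair term
        grad[i] = (coef[:, None] * diff).sum(axis=0)
    return pos - lr * grad
```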
{"title":"AutoFDP: Automatic Force-Based Model Selection for Multicriteria Graph Drawing.","authors":"Mingliang Xue, Yifan Wang, Zhi Wang, Lifeng Zhu, Lizhen Cui, Yueguo Chen, Zhiyu Ding, Oliver Deussen, Yunhai Wang","doi":"10.1109/TVCG.2025.3631659","DOIUrl":"10.1109/TVCG.2025.3631659","url":null,"abstract":"<p><p>Traditional force-based graph layout models are rooted in virtual physics, while criteria-driven techniques position nodes by directly optimizing graph readability criteria. In this article, we systematically explore the integration of these two approaches, introducing criteria-driven force-based graph layout techniques. We propose a general framework that, based on user-specified readability criteria, such as minimizing edge crossings, automatically constructs a force-based model tailored to generate layouts for a given graph. Models derived from highly similar graphs can be reused to create initial layouts, users can further refine layouts by imposing different criteria on subgraphs. We perform quantitative comparisons between our layout methods and existing techniques across various graphs and present a case study on graph exploration. Our results indicate that our framework generates superior layouts compared to existing techniques and exhibits better generalization capabilities than deep learning-based methods.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1554-1568"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145508602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Computational Caustic Design for Surface Light Source
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3633081 | Pages: 1911-1927
Sizhuo Zhou, Yuou Sun, Bailin Deng, Juyong Zhang
Designing freeform surfaces to control light based on real-world illumination patterns is challenging, as existing caustic lens designs often assume oversimplified point or parallel light sources. We propose representing surface light sources using an optimized set of point sources, whose parameters are fitted to the real light source's illumination using a novel differentiable rendering framework. Our physically-based rendering approach simulates light transmission using flux, without requiring prior knowledge of the light source's intensity distribution. To efficiently explore the light source parameter space during optimization, we apply a contraction mapping that converts the constrained problem into an unconstrained one. Using the optimized light source model, we then design the freeform lens shape considering flux consistency and normal integrability. Simulations and physical experiments show our method more accurately represents real surface light sources compared to point-source approximations, yielding caustic lenses that produce images closely matching the target light distributions.
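To illustrate the general idea of turning a constrained parameter search into an unconstrained one via a smooth reparameterization, here is a minimal sketch using a sigmoid-based bound mapping; the paper's actual contraction mapping is not specified in the abstract, so this stand-in is an assumption.

```python
import numpy as np

def to_unconstrained(x, lo, hi, eps=1e-9):
    """Map a bounded parameter x in (lo, hi) to an unconstrained real variable."""
    t = np.clip((x - lo) / (hi - lo), eps, 1.0 - eps)
    return np.log(t / (1.0 - t))                 # logit

def to_constrained(u, lo, hi):
    """Inverse map: any real-valued u is pulled back into (lo, hi) via a sigmoid,
    so a generic unconstrained optimizer can update u freely."""
    return lo + (hi - lo) / (1.0 + np.exp(-u))
```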
{"title":"Computational Caustic Design for Surface Light Source.","authors":"Sizhuo Zhou, Yuou Sun, Bailin Deng, Juyong Zhang","doi":"10.1109/TVCG.2025.3633081","DOIUrl":"10.1109/TVCG.2025.3633081","url":null,"abstract":"<p><p>Designing freeform surfaces to control light based on real-world illumination patterns is challenging, as existing caustic lens designs often assume oversimplified point or parallel light sources. We propose representing surface light sources using an optimized set of point sources, whose parameters are fitted to the real light source's illumination using a novel differentiable rendering framework. Our physically-based rendering approach simulates light transmission using flux, without requiring prior knowledge of the light source's intensity distribution. To efficiently explore the light source parameter space during optimization, we apply a contraction mapping that converts the constrained problem into an unconstrained one. Using the optimized light source model, we then design the freeform lens shape considering flux consistency and normal integrability. Simulations and physical experiments show our method more accurately represents real surface light sources compared to point-source approximations, yielding caustic lenses that produce images closely matching the target light distributions.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1911-1927"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145544752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SigTime: Learning and Visually Explaining Time Series Signatures
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3644956 | Pages: 2099-2113
Yu-Chia Huang, Juntong Chen, Dongyu Liu, Kwan-Liu Ma
Understanding and distinguishing temporal patterns in time series data is essential for scientific discovery and decision-making. For example, in biomedical research, uncovering meaningful patterns in physiological signals can improve diagnosis, risk assessment, and patient outcomes. However, existing methods for time series pattern discovery face major challenges, including high computational complexity, limited interpretability, and difficulty in capturing meaningful temporal structures. To address these gaps, we introduce a novel learning framework that jointly trains two Transformer models using complementary time series representations: shapelet-based representations to capture localized temporal structures and traditional feature engineering to encode statistical properties. The learned shapelets serve as interpretable signatures that differentiate time series across classification labels. Additionally, we develop a visual analytics system, SigTime, with coordinated views to facilitate exploration of time series signatures from multiple perspectives, aiding the generation of useful insights. We quantitatively evaluate our learning framework on eight publicly available datasets and one proprietary clinical dataset. Additionally, we demonstrate the effectiveness of our system through two usage scenarios conducted with domain experts: one involving public ECG data and the other focused on preterm labor analysis.
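For readers unfamiliar with shapelets, the quantity that makes them interpretable signatures is the standard shapelet-to-series distance: the minimum distance between the shapelet and any same-length window of the series. A minimal sketch (not the paper's training code) follows.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between `shapelet` and any same-length
    sliding window of `series` (both 1-D arrays)."""
    n, m = len(series), len(shapelet)
    if m > n:
        raise ValueError("shapelet longer than series")
    best = np.inf
    for start in range(n - m + 1):
        window = series[start:start + m]
        best = min(best, float(np.linalg.norm(window - shapelet)))
    return best
```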
{"title":"SigTime: Learning and Visually Explaining Time Series Signatures.","authors":"Yu-Chia Huang, Juntong Chen, Dongyu Liu, Kwan-Liu Ma","doi":"10.1109/TVCG.2025.3644956","DOIUrl":"10.1109/TVCG.2025.3644956","url":null,"abstract":"<p><p>Understanding and distinguishing temporal patterns in time series data is essential for scientific discovery and decision-making. For example, in biomedical research, uncovering meaningful patterns in physiological signals can improve diagnosis, risk assessment, and patient outcomes. However, existing methods for time series pattern discovery face major challenges, including high computational complexity, limited interpretability, and difficulty in capturing meaningful temporal structures. To address these gaps, we introduce a novel learning framework that jointly trains two Transformer models using complementary time series representations: shapelet-based representations to capture localized temporal structures and traditional feature engineering to encode statistical properties. The learned shapelets serve as interpretable signatures that differentiate time series across classification labels. Additionally, we develop a visual analytics system-SigTime-with coordinated views to facilitate exploration of time series signatures from multiple perspectives, aiding in useful insights generation. We quantitatively evaluate our learning framework on eight publicly available datasets and one proprietary clinical dataset. Additionally, we demonstrate the effectiveness of our system through two usage scenarios along with the domain experts: one involving public ECG data and the other focused on preterm labor analysis.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"2099-2113"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145776814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
InterMamba: Efficient Human-Human Interaction Generation With Adaptive Spatio-Temporal Mamba
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3635116 | Pages: 1928-1940
Zizhao Wu, Yingying Sun, Yiming Chen, Xiaoling Gu, Ruyu Liu, Jiazhou Chen
Human-human interaction generation has garnered significant attention in motion synthesis due to its vital role in understanding humans as social beings. However, existing methods typically rely on transformer-based architectures, which often face challenges related to scalability and efficiency. To address these challenges, we propose InterMamba, a novel and efficient human-human interaction generation method built on the Mamba framework, designed to capture long-sequence dependencies effectively while enabling real-time feedback. Specifically, we introduce an adaptive spatio-temporal Mamba framework that utilizes two parallel SSM branches with an adaptive mechanism to integrate the spatial and temporal features of motion sequences. To further enhance the model's ability to capture dependencies within individual motion sequences and the interactions between different individual sequences, we develop two key modules: the self-adaptive spatio-temporal Mamba module and the cross-adaptive spatio-temporal Mamba module, enabling efficient feature learning. Extensive experiments demonstrate that our method achieves state-of-the-art results on two interaction datasets with remarkable quality and efficiency. Compared to the baseline method InterGen, our approach not only improves accuracy but also reduces the parameter size to just 66 M (36% of InterGen's), while achieving an average inference speed of 0.57 seconds, which is 46% of InterGen's execution time.
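As a generic illustration of adaptively mixing two parallel branches (here, spatial and temporal features) with a learned gate, consider the minimal NumPy sketch below; the gate parameterization is an assumed common pattern, not InterMamba's actual SSM blocks.

```python
import numpy as np

def adaptive_fusion(spatial_feat, temporal_feat, w_gate, b_gate):
    """Mix two parallel branch outputs with a per-feature gate in (0, 1) predicted
    from their concatenation. spatial_feat, temporal_feat: (T, D);
    w_gate: (2D, D); b_gate: (D,). All parameters are hypothetical."""
    gate_logits = np.concatenate([spatial_feat, temporal_feat], axis=-1) @ w_gate + b_gate
    gate = 1.0 / (1.0 + np.exp(-gate_logits))        # sigmoid gate
    return gate * spatial_feat + (1.0 - gate) * temporal_feat
```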
{"title":"InterMamba: Efficient Human-Human Interaction Generation With Adaptive Spatio-Temporal Mamba.","authors":"Zizhao Wu, Yingying Sun, Yiming Chen, Xiaoling Gu, Ruyu Liu, Jiazhou Chen","doi":"10.1109/TVCG.2025.3635116","DOIUrl":"10.1109/TVCG.2025.3635116","url":null,"abstract":"<p><p>Human-human interaction generation has garnered significant attention in motion synthesis due to its vital role in understanding humans as social beings. However, existing methods typically rely on transformer-based architectures, which often face challenges related to scalability and efficiency. To address these challenges, we propose InterMamba, a novel and efficient human-human interaction generation method built on the Mamba framework, designed to capture long-sequence dependencies effectively while enabling real-time feedback. Specifically, we introduce an adaptive spatio-temporal Mamba framework that utilizes two parallel SSM branches with an adaptive mechanism to integrate the spatial and temporal features of motion sequences. To further enhance the model's ability to capture dependencies within individual motion sequences and the interactions between different individual sequences, we develop two key modules: the self adaptive spatio-temporal Mamba module and the cross adaptive spatio-temporal Mamba module, enabling efficient feature learning. Extensive experiments demonstrate that our method achieves the state-of-the-art results on both two interaction datasets with remarkable quality and efficiency. Compared to the baseline method InterGen, our approach not only improves accuracy but also reduces the parameter size to just 66 M (36% of InterGen's), while achieving an average inference speed of 0.57 seconds, which is 46% of InterGen's execution time.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1928-1940"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145574686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
2DGH: 2D Gaussian-Hermite Splatting for High-Quality Rendering and Better Geometry Features
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3622157 | Pages: 1513-1524
Ruihan Yu, Tianyu Huang, Jingwang Ling, Feng Xu
2D Gaussian Splatting has recently emerged as a significant method in 3D reconstruction, enabling novel view synthesis and geometry reconstruction simultaneously. While the well-known Gaussian kernel is broadly used, its lack of anisotropy and deformation ability leads to dim and vague edges at object silhouettes, limiting the reconstruction quality of current Gaussian splatting methods. To enhance the representation power, we draw inspiration from quantum physics and propose to use the Gaussian-Hermite kernel as the new primitive in Gaussian splatting. The new kernel takes a unified mathematical form and extends the Gaussian function, which serves as the zero-rank special case in the updated general formulation. Our experiments demonstrate that the proposed Gaussian-Hermite kernel achieves improved performance over traditional Gaussian Splatting kernels on both geometry reconstruction and novel-view synthesis tasks. Specifically, on the DTU dataset, our method yields more accurate geometry reconstruction, while on datasets such as MipNeRF360 and our customized Detail dataset, it achieves better results in novel-view synthesis. These results highlight the potential of the Gaussian-Hermite kernel for high-quality 3D reconstruction and rendering.
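For reference, the one-dimensional Gaussian-Hermite functions below show how the Gaussian arises as the zero-order special case while higher orders add sign-changing lobes that can sharpen silhouettes; the paper's 2D splatting kernel builds on this family, and its exact 2D form is not reproduced here.

```latex
% 1-D Gaussian-Hermite basis (physicists' Hermite polynomials H_n);
% n = 0 recovers the plain Gaussian, higher orders add oscillating lobes.
\psi_n(x) = \frac{1}{\sqrt{2^n\, n!\, \sqrt{\pi}}}\, H_n(x)\, e^{-x^2/2},
\qquad H_0(x) = 1,\quad H_1(x) = 2x,\quad H_2(x) = 4x^2 - 2.
```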
{"title":"2DGH: 2D Gaussian-Hermite Splatting for High-Quality Rendering and Better Geometry Features.","authors":"Ruihan Yu, Tianyu Huang, Jingwang Ling, Feng Xu","doi":"10.1109/TVCG.2025.3622157","DOIUrl":"10.1109/TVCG.2025.3622157","url":null,"abstract":"<p><p>2D Gaussian Splatting has recently emerged as a significant method in 3D reconstruction, enabling novel view synthesis and geometry reconstruction simultaneously. While the well-known Gaussian kernel is broadly used, its lack of anisotropy and deformation ability leads to dim and vague edges at object silhouettes, limiting the reconstruction quality of current Gaussian splatting methods. To enhance the representation power, we draw inspiration from quantum physics and propose to use the Gaussian-Hermite kernel as the new primitive in Gaussian splatting. The new kernel takes a unified mathematical form and extends the Gaussian function, which serves as the zero-rank special case in the updated general formulation. Our experiments demonstrate that the proposed Gaussian-Hermite kernel achieves improved performance over traditional Gaussian Splatting kernels on both geometry reconstruction and novel-view synthesis tasks. Specifically, on the DTU dataset, our method yields more accurate geometry reconstruction, while on datasets such as MipNeRF360 and our customized Detail dataset, it achieves better results in novel-view synthesis. These results highlight the potential of the Gaussian-Hermite kernel for high-quality 3D reconstruction and rendering.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1513-1524"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145305095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}