DiFusion: Flexible Stylized Motion Generation Using Digest-and-Fusion Scheme
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3620400
Yatian Wang, Haoran Mo, Chengying Gao
To address the limited style expressiveness of existing text-driven human motion synthesis methods, we propose DiFusion, a framework for diversely stylized motion generation. It offers flexible control of content through text and of style via multiple modalities, i.e., textual labels or motion sequences. Our approach employs a dual-condition motion latent diffusion model, enabling independent control of content and style through flexible input modalities. To tackle the imbalanced complexity between text-motion and style-motion datasets, we propose the Digest-and-Fusion training scheme, which digests domain-specific knowledge from both datasets and then adaptively fuses them in a compatible manner. Comprehensive evaluations demonstrate the effectiveness of our method and its superiority over existing approaches in terms of content alignment, style expressiveness, realism, and diversity. Additionally, our approach can be extended to practical applications such as motion style interpolation.
{"title":"DiFusion: Flexible Stylized Motion Generation Using Digest-and-Fusion Scheme.","authors":"Yatian Wang, Haoran Mo, Chengying Gao","doi":"10.1109/TVCG.2025.3620400","DOIUrl":"10.1109/TVCG.2025.3620400","url":null,"abstract":"<p><p>To address the issue of style expression in existing text-driven human motion synthesis methods, we propose DiFusion, a framework for diversely stylized motion generation. It offers flexible control of content through texts and style via multiple modalities, i.e., textual labels or motion sequences. Our approach employs a dual-condition motion latent diffusion model, enabling independent control of content and style through flexible input modalities. To tackle the issue of imbalanced complexity between the text-motion and style-motion datasets, we propose the Digest-and-Fusion training scheme, which digests domain-specific knowledge from both datasets and then adaptively fuses them into a compatible manner. Comprehensive evaluations demonstrate the effectiveness of our method and its superiority over existing approaches in terms of content alignment, style expressiveness, realism, and diversity. Additionally, our approach can be extended to practical applications, such as motion style interpolation.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1593-1604"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145287981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EnVisionVR: A Scene Interpretation Tool for Visual Accessibility in Virtual Reality
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3617147
Junlong Chen, Rosella P Galindo Esparza, Vanja Garaj, Per Ola Kristensson, John Dudley
Effective visual accessibility in Virtual Reality (VR) is crucial for Blind and Low Vision (BLV) users. However, designing visual accessibility systems is challenging due to the complexity of 3D VR environments and the need for techniques that can be easily retrofitted into existing applications. While prior work has studied how to enhance or translate visual information, recent advances in Vision Language Models (VLMs) offer an exciting opportunity to improve the scene interpretation capability of current systems. This paper presents EnVisionVR, an accessibility tool for VR scene interpretation. Through a formative study of usability barriers, we confirmed the lack of visual accessibility features as a key barrier for BLV users of VR content and applications. These findings informed the design and development of EnVisionVR, a novel visual accessibility system that leverages a VLM, voice input, and multimodal feedback for scene interpretation and virtual object interaction in VR. An evaluation with 12 BLV users demonstrated that EnVisionVR significantly improved their ability to locate virtual objects, effectively supporting scene understanding and object interaction.
{"title":"EnVisionVR: A Scene Interpretation Tool for Visual Accessibility in Virtual Reality.","authors":"Junlong Chen, Rosella P Galindo Esparza, Vanja Garaj, Per Ola Kristensson, John Dudley","doi":"10.1109/TVCG.2025.3617147","DOIUrl":"10.1109/TVCG.2025.3617147","url":null,"abstract":"<p><p>Effective visual accessibility in Virtual Reality (VR) is crucial for Blind and Low Vision (BLV) users. However, designing visual accessibility systems is challenging due to the complexity of 3D VR environments and the need for techniques that can be easily retrofitted into existing applications. While prior work has studied how to enhance or translate visual information, the advancement of Vision Language Models (VLMs) provides an exciting opportunity to advance the scene interpretation capability of current systems. This paper presents EnVisionVR, an accessibility tool for VR scene interpretation. Through a formative study of usability barriers, we confirmed the lack of visual accessibility features as a key barrier for BLV users of VR content and applications. In response, we used our findings from the formative study to inform the design and development of EnVisionVR, a novel visual accessibility system leveraging a VLM, voice input and multimodal feedback for scene interpretation and virtual object interaction in VR. An evaluation with 12 BLV users demonstrated that EnVisionVR significantly improved their ability to locate virtual objects, effectively supporting scene understanding and object interaction.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"2007-2019"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145240653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Volume Feature Aware View-Epipolar Transformers for Generalizable NeRF
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3621585
Yilei Chen, Ping An, Xinpeng Huang, Qiang Wu
Generalizable NeRF synthesizes novel views of unseen scenes without per-scene training. The view-epipolar transformer has become popular in this field for its ability to produce high-quality views. Existing methods with this architecture rely on the assumption that texture consistency across views can identify object surfaces, an identification that is crucial for determining where to reconstruct texture. However, this assumption is not always valid: different surface positions may share similar texture features, creating ambiguity in surface identification. To handle this ambiguity, this paper introduces 3D volume features into the view-epipolar transformer. These features carry geometric information that supplements the texture features. By incorporating both texture and geometric cues in the consistency measurement, our method mitigates the ambiguity in surface detection, leading to more accurate surfaces and thus better novel view synthesis. Additionally, we propose a decoupled decoder in which volume and texture features are used for density and color prediction, respectively, so that the two properties can be predicted without mutual interference. Experiments show improved results over existing transformer-based methods on both real-world and synthetic datasets.
{"title":"Volume Feature Aware View-Epipolar Transformers for Generalizable NeRF.","authors":"Yilei Chen, Ping An, Xinpeng Huang, Qiang Wu","doi":"10.1109/TVCG.2025.3621585","DOIUrl":"10.1109/TVCG.2025.3621585","url":null,"abstract":"<p><p>Generalizable NeRF synthesizes novel views of unseen scenes without per-scene training. The view-epipolar transformer has become popular in this field for its ability to produce high-quality views. Existing methods with this architecture rely on the assumption that texture consistency across views can identify object surfaces, with such identification crucial for determining where to reconstruct texture. However, this assumption is not always valid, as different surface positions may share similar texture features, creating ambiguity in surface identification. To handle this ambiguity, this paper introduces 3D volume features into the view-epipolar transformer. These features contain geometric information, which will be a supplement to texture features. By incorporating both texture and geometric cues in consistency measurement, our method mitigates the ambiguity in surface detection. This leads to more accurate surfaces and thus better novel view synthesis. Additionally, we propose a decoupled decoder where volume and texture features are used for density and color prediction respectively. In this way, the two properties can be better predicted without mutual interference. Experiments show improved results over existing transformer-based methods on both real-world and synthetic datasets.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"2049-2060"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145294710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DiffPortraitVideo: Diffusion-Based Expression-Consistent Zero-Shot Portrait Video Translation
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3642300
Shaoxu Li, Chuhang Ma, Ye Pan
Zero-shot text-to-video diffusion models extend pre-trained image diffusion models to the video domain without additional training. Prevailing techniques commonly rely on existing shapes as constraints and introduce inter-frame attention to ensure texture consistency. However, such shape constraints tend to restrict the stylized geometric deformation of videos and inadvertently neglect the original texture characteristics. Furthermore, existing methods suffer from flickering and inconsistent facial expressions. In this paper, we present DiffPortraitVideo. The framework employs a diffusion-model-based feature and attention injection mechanism to generate key frames, with cross-frame constraints to enforce coherence and adaptive feature fusion to ensure expression consistency. Our approach achieves high spatio-temporal and expression consistency while retaining the textual and original image properties. Extensive experiments validate the efficacy of the proposed framework in generating personalized, high-quality, and coherent videos, showcasing its superiority over existing approaches and paving the way for further research on text-to-video generation with enhanced personalization and quality.
{"title":"DiffPortraitVideo: Diffusion-Based Expression-Consistent Zero-Shot Portrait Video Translation.","authors":"Shaoxu Li, Chuhang Ma, Ye Pan","doi":"10.1109/TVCG.2025.3642300","DOIUrl":"10.1109/TVCG.2025.3642300","url":null,"abstract":"<p><p>Zero-shot text-to-video diffusion models are crafted to expand pre-trained image diffusion models to the video domain without additional training. In recent times, prevailing techniques commonly rely on existing shapes as constraints and introduce inter-frame attention to ensure texture consistency. However, such shape constraints tend to restrict the stylized geometric deformation of videos and inadvertently neglect the original texture characteristics. Furthermore, existing methods suffer from flickering and inconsistent facial expressions. In this paper, we present DiffPortraitVideo. The framework employs a diffusion model-based feature and attention injection mechanism to generate key frames, with cross-frame constraints to enforce coherence and adaptive feature fusion to ensure expression consistency. Our approach achieves high spatio-temporal and expression consistency while retaining the textual and original image properties. Extensive and comprehensive experiments are conducted to validate the efficacy of our proposed framework in generating personalized, high-quality, and coherent videos. This not only showcases the superiority of our method over existing approaches but also paves the way for further research and development in the field of text-to-video generation with enhanced personalization and quality.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1656-1667"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145727816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Visual Stenography: Feature Recreation and Preservation in Sketches of Noisy Line Charts
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3626128
Rifat Ara Proma, Michael Correll, Ghulam Jilani Quadri, Paul Rosen
Line charts surface many features in time series data, from trends to periodicity to peaks & valleys. However, not every potentially important feature in the data may correspond to a visual feature that readers can detect or prioritize. In this study, we conducted a visual stenography task, where participants re-drew line charts to solicit information about the visual features they believed to be important. We systematically varied noise levels (SNR ≈ 5-30 dB) across line charts to observe how visual clutter influences which features people prioritize in their sketches. We identified three key strategies that correlated with the noise present in the stimuli: the Replicator attempted to retain all major features of the line chart including noise; the Trend Keeper prioritized trends, disregarding periodicity and peaks; and the De-noiser filtered out noise while preserving other features. Further, we found that participants tended to faithfully retain trends and peaks & valleys when these features were present, whereas periodicity and noise were represented in more qualitative or gestural ways: semantically rather than accurately. These results suggest a need to consider more flexible and human-centric ways of presenting, summarizing, preprocessing, or clustering time series data.
{"title":"Visual Stenography: Feature Recreation and Preservation in Sketches of Noisy Line Charts.","authors":"Rifat Ara Proma, Michael Correll, Ghulam Jilani Quadri, Paul Rosen","doi":"10.1109/TVCG.2025.3626128","DOIUrl":"10.1109/TVCG.2025.3626128","url":null,"abstract":"<p><p>Line charts surface many features in time series data, from trends to periodicity to peaks & valleys. However, not every potentially important feature in the data may correspond to a visual feature that readers can detect or prioritize. In this study, we conducted a visual stenography task, where participants re-drew line charts to solicit information about the visual features they believed to be important. We systematically varied noise levels (SNR $approx$≈ 5-30 dB) across line charts to observe how visual clutter influences which features people prioritize in their sketches. We identified three key strategies that correlated with the noise present in the stimuli: the $color{green}{textit{Replicator}}$greenReplicator attempted to retain all major features of the line chart including noise; the $color{yellow}{textit{Trend Keeper}}$yellowTrendKeeper prioritized trends disregarding periodicity and peaks; and the $color{pink}{textit{De-noiser}}$pinkDe-noiser filtered out noise while preserving other features. Further, we found that participants tended to faithfully retain trends and peaks & valleys when these features were present, whereas periodicity and noise were represented in more qualitative or gestural ways: semantically rather than accurately. These results suggest a need to consider more flexible and human-centric ways of presenting, summarizing, preprocessing, or clustering time series data.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1879-1894"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145380514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DenseSplat: Densifying Gaussian Splatting SLAM With Neural Radiance Prior
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3617961
Mingrui Li, Shuhong Liu, Tianchen Deng, Hongyu Wang
Gaussian SLAM systems excel in real-time rendering and fine-grained reconstruction compared to NeRF-based systems. However, their reliance on extensive keyframes is impractical for deployment in real-world robotic systems, which typically operate under sparse-view conditions that can result in substantial holes in the map. To address these challenges, we introduce DenseSplat, the first SLAM system that effectively combines the advantages of NeRF and 3DGS. DenseSplat utilizes sparse keyframes and NeRF priors for initializing primitives that densely populate maps and seamlessly fill gaps. It also implements geometry-aware primitive sampling and pruning strategies to manage granularity and enhance rendering efficiency. Moreover, DenseSplat integrates loop closure and bundle adjustment, significantly enhancing frame-to-frame tracking accuracy. Extensive experiments on multiple large-scale datasets demonstrate that DenseSplat achieves superior performance in tracking and mapping compared to current state-of-the-art methods.
{"title":"DenseSplat: Densifying Gaussian Splatting SLAM With Neural Radiance Prior.","authors":"Mingrui Li, Shuhong Liu, Tianchen Deng, Hongyu Wang","doi":"10.1109/TVCG.2025.3617961","DOIUrl":"10.1109/TVCG.2025.3617961","url":null,"abstract":"<p><p>Gaussian SLAM systems excel in real-time rendering and fine-grained reconstruction compared to NeRF-based systems. However, their reliance on extensive keyframes is impractical for deployment in real-world robotic systems, which typically operate under sparse-view conditions that can result in substantial holes in the map. To address these challenges, we introduce DenseSplat, the first SLAM system that effectively combines the advantages of NeRF and 3DGS. DenseSplat utilizes sparse keyframes and NeRF priors for initializing primitives that densely populate maps and seamlessly fill gaps. It also implements geometry-aware primitive sampling and pruning strategies to manage granularity and enhance rendering efficiency. Moreover, DenseSplat integrates loop closure and bundle adjustment, significantly enhancing frame-to-frame tracking accuracy. Extensive experiments on multiple large-scale datasets demonstrate that DenseSplat achieves superior performance in tracking and mapping compared to current state-of-the-art methods.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1993-2006"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145240672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RP-SLAM: Real-Time Photorealistic SLAM With Efficient 3D Gaussian Splatting
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3616173
Lizhi Bai, Chunqi Tian, Jun Yang, Siyu Zhang, Masanori Suganuma, Takayuki Okatani
3D Gaussian Splatting (3DGS) has emerged as a promising technique for high-quality 3D rendering, leading to increasing interest in integrating 3DGS into photorealistic SLAM systems. However, existing methods face challenges such as redundant Gaussian primitives, the forgetting problem during continuous optimization, and difficulty initializing primitives in the monocular case due to the lack of depth information. To achieve efficient and photorealistic mapping, we propose RP-SLAM, a 3D Gaussian splatting-based visual SLAM method for monocular and RGB-D cameras. RP-SLAM decouples camera pose estimation from Gaussian primitive optimization and consists of three key components. First, we propose an efficient incremental mapping approach that achieves a compact and accurate representation of the scene through adaptive sampling and Gaussian primitive filtering. Second, a dynamic window optimization method is proposed to mitigate the forgetting problem and improve map consistency. Finally, for the monocular case, a monocular keyframe initialization method based on a sparse point cloud is proposed to improve the initialization accuracy of the Gaussian primitives, providing a geometric basis for subsequent optimization. Extensive experiments demonstrate that RP-SLAM achieves state-of-the-art map rendering accuracy while ensuring real-time performance and model compactness.
IEEE Transactions on Visualization and Computer Graphics, vol. PP, pp. 1452-1466.
HuGDiffusion: Generalizable Single-Image Human Rendering via 3D Gaussian Diffusion
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3625230
Yingzhi Tang, Qijian Zhang, Junhui Hou
We present HuGDiffusion, a generalizable 3D Gaussian splatting (3DGS) learning pipeline for novel view synthesis (NVS) of human characters from single-view input images. Existing approaches typically require monocular videos or calibrated multi-view images as input, which limits their applicability in real-world scenarios with arbitrary and/or unknown camera poses. In this paper, we aim to generate the set of 3DGS attributes via a diffusion-based framework conditioned on human priors extracted from a single image. Specifically, we begin with carefully integrated human-centric feature extraction procedures to derive informative conditioning signals. Based on our empirical observation that jointly learning all 3DGS attributes is difficult to optimize, we design a multi-stage generation strategy to obtain the different types of 3DGS attributes. To facilitate the training process, we investigate constructing proxy ground-truth 3D Gaussian attributes as high-quality attribute-level supervision signals. Extensive experiments show that HuGDiffusion achieves significant performance improvements over state-of-the-art methods.
{"title":"HuGDiffusion: Generalizable Single-Image Human Rendering via 3D Gaussian Diffusion.","authors":"Yingzhi Tang, Qijian Zhang, Junhui Hou","doi":"10.1109/TVCG.2025.3625230","DOIUrl":"10.1109/TVCG.2025.3625230","url":null,"abstract":"<p><p>We present HuGDiffusion, a generalizable 3D Gaussian splatting (3DGS) learning pipeline to achieve novel view synthesis (NVS) of human characters from single-view input images. Existing approaches typically require monocular videos or calibrated multi-view images as inputs, whose applicability could be weakened in real-world scenarios with arbitrary and/or unknown camera poses. In this paper, we aim to generate the set of 3DGS attributes via a diffusion-based framework conditioned on human priors extracted from a single image. Specifically, we begin with carefully integrated human-centric feature extraction procedures to deduce informative conditioning signals. Based on our empirical observations that jointly learning the whole 3DGS attributes is challenging to optimize, we design a multi-stage generation strategy to obtain different types of 3DGS attributes. To facilitate the training process, we investigate constructing proxy ground-truth 3D Gaussian attributes as high-quality attribute-level supervision signals. Through extensive experiments, our HuGDiffusion shows significant performance improvements over the state-of-the-art methods.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"2061-2074"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145369319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Topological Autoencoders++: Fast and Accurate Cycle-Aware Dimensionality Reduction
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3644671
Matteo Clemot, Julie Digne, Julien Tierny
This paper presents a novel topology-aware dimensionality reduction approach aimed at accurately visualizing the cyclic patterns present in high-dimensional data. To that end, we build on the Topological Autoencoders (TopoAE) (Moor et al., 2020) formulation. First, we provide a novel theoretical analysis of its associated loss and show that a zero loss indeed induces identical persistence pairs (in high and low dimensions) for the 0-dimensional persistent homology ($\text{PH}^0$) of the Rips filtration. We also provide a counterexample showing that this property no longer holds for a naive extension of TopoAE to $\text{PH}^d$ for $d \geq 1$. Based on this observation, we introduce a novel generalization of TopoAE to 1-dimensional persistent homology ($\text{PH}^1$), called TopoAE++, for the accurate generation of cycle-aware planar embeddings, addressing the above failure case. This generalization is based on the notion of cascade distortion, a new penalty term favoring an isometric embedding of the 2-chains filling persistent 1-cycles, hence resulting in more faithful geometric reconstructions of the 1-cycles in the plane. We further introduce a novel, fast algorithm for the exact computation of $\text{PH}^1$ for Rips filtrations in the plane, yielding improved runtimes over previously documented topology-aware methods. Our method also achieves a better balance between topological accuracy, as measured by the Wasserstein distance, and the visual preservation of cycles in low dimensions.
{"title":"Topological Autoencoders++: Fast and Accurate Cycle-Aware Dimensionality Reduction.","authors":"Matteo Clemot, Julie Digne, Julien Tierny","doi":"10.1109/TVCG.2025.3644671","DOIUrl":"10.1109/TVCG.2025.3644671","url":null,"abstract":"<p><p>This paper presents a novel topology-aware dimensionality reduction approach aiming at accurately visualizing the cyclic patterns present in high dimensional data. To that end, we build on the Topological Autoencoders (TopoAE) (Moor et al., 2020) formulation. First, we provide a novel theoretical analysis of its associated loss and show that a zero loss indeed induces identical persistence pairs (in high and low dimensions) for the 0-dimensional persistent homology ($text{PH}^{0}$) of the Rips filtration. We also provide a counter example showing that this property no longer holds for a naive extension of TopoAE to $text{PH}^{d}$ for $dgeq 1$. Based on this observation, we introduce a novel generalization of TopoAE to 1-dimensional persistent homology ($text{PH}^{1}$), called TopoAE++, for the accurate generation of cycle-aware planar embeddings, addressing the above failure case. This generalization is based on the notion of cascade distortion, a new penalty term favoring an isometric embedding of the 2-chains filling persistent 1-cycles, hence resulting in more faithful geometrical reconstructions of the 1-cycles in the plane. We further introduce a novel, fast algorithm for the exact computation of $text{PH}^{}$ for Rips filtrations in the plane, yielding improved runtimes over previously documented topology-aware methods. Our method also achieves a better balance between the topological accuracy, as measured by the Wasserstein distance, and the visual preservation of the cycles in low dimensions.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1622-1639"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145764790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GenFODrawing: Supporting Creative Found Object Drawing With Generative AI
Pub Date: 2026-02-01 | DOI: 10.1109/TVCG.2025.3626754
Jiaye Leng, Hui Ye, Pengfei Xu, Miu-Ling Lam, Hongbo Fu
Found object drawing is a creative art form that incorporates everyday objects into imaginative images, offering a refreshing and unique way to express ideas. However, for many people, creating this type of work can be challenging due to difficulties in generating creative ideas and finding suitable reference images to help translate those ideas onto paper. Based on the findings of a formative study, we propose GenFODrawing, a creativity support tool that helps users create diverse found object drawings. Our system provides AI-driven textual and visual inspirations and enhances controllability through sketch-based and box-conditioned image generation, enabling users to create personalized outputs. We conducted a user study with twelve participants to compare GenFODrawing to a baseline condition in which participants completed the creative tasks using their own desired approaches without access to our system. The study demonstrated that GenFODrawing enabled easier exploration of diverse ideas, greater agency and control throughout the creative process, and higher creativity support compared to the baseline. A further open-ended study demonstrated the system's usability and expressiveness, and all participants found the creative process engaging.
{"title":"GenFODrawing: Supporting Creative Found Object Drawing With Generative AI.","authors":"Jiaye Leng, Hui Ye, Pengfei Xu, Miu-Ling Lam, Hongbo Fu","doi":"10.1109/TVCG.2025.3626754","DOIUrl":"10.1109/TVCG.2025.3626754","url":null,"abstract":"<p><p>Found object drawing is a creative art form incorporating everyday objects into imaginative images, offering a refreshing and unique way to express ideas. However, for many people, creating this type of work can be challenging due to difficulties in generating creative ideas and finding suitable reference images to help translate their ideas onto paper. Based on the findings of a formative study, we propose GenFODrawing, a creativity support tool to help users create diverse found object drawings. Our system provides AI-driven textual and visual inspirations, and enhances controllability through sketch-based and box-conditioned image generation, enabling users to create personalized outputs. We conducted a user study with twelve participants to compare GenFODrawing, to a baseline condition where the participants completed the creative tasks using their own desired approaches without access to our system. The study demonstrated that GenFODrawing, enabled easier exploration of diverse ideas, greater agency and control through the creative process, and higher creativity support compared to the baseline. A further open-ended study demonstrated the system's usability and expressiveness, and all participants found the creative process engaging.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":"1978-1992"},"PeriodicalIF":6.5,"publicationDate":"2026-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145411428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}