
Latest publications in IEEE Transactions on Visualization and Computer Graphics

ZigzagNetVis: Suggesting temporal resolutions for graph visualization using zigzag persistence.
Pub Date : 2025-01-17 DOI: 10.1109/TVCG.2025.3528197
Raphael Tinarrage, Jean R Ponciano, Claudio D G Linhares, Agma J M Traina, Jorge Poco

Temporal graphs are commonly used to represent complex systems and track the evolution of their constituents over time. Visualizing these graphs is crucial as it allows one to quickly identify anomalies, trends, patterns, and other properties that facilitate better decision-making. In this context, selecting an appropriate temporal resolution is essential for constructing and visually analyzing the layout. The choice of resolution is particularly important when dealing with temporally sparse graphs. In such cases, changing the temporal resolution by grouping events (i.e., edges) from consecutive timestamps - a technique known as timeslicing - can aid in the analysis and reveal patterns that might not be discernible otherwise. However, selecting an appropriate temporal resolution is a challenging task. In this paper, we propose ZigzagNetVis, a methodology that suggests temporal resolutions potentially relevant for analyzing a given graph, i.e., resolutions that lead to substantial topological changes in the graph structure. ZigzagNetVis achieves this by leveraging zigzag persistent homology, a well-established technique from Topological Data Analysis (TDA). To improve visual graph analysis, ZigzagNetVis incorporates the colored barcode, a novel timeline-based visualization inspired by persistence barcodes commonly used in TDA. We also contribute a web-based system prototype that implements the suggestion methodology and visualization tools. Finally, we demonstrate the usefulness and effectiveness of ZigzagNetVis through a usage scenario, a user study with 27 participants, and a detailed quantitative evaluation.
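
The timeslicing technique described above is easy to illustrate in isolation. The following is a minimal sketch (not the authors' implementation), assuming temporal edges are given as (u, v, t) tuples and slices have a fixed width `resolution`; at a coarser resolution, events from a sparse stream merge into denser, more readable snapshots.

```python
from collections import defaultdict

def timeslice(edges, t_start, resolution):
    """Group temporal edges (u, v, t) into consecutive timeslices of a
    fixed width `resolution`, returning one edge set per slice."""
    slices = defaultdict(set)
    for u, v, t in edges:
        slices[(t - t_start) // resolution].add((u, v))
    n_slices = max(slices) + 1 if slices else 0
    return [slices[i] for i in range(n_slices)]

# Toy temporal graph: at resolution 1 most slices are near-empty (sparse),
# while resolution 3 merges events into denser snapshots.
edges = [(0, 1, 0), (1, 2, 2), (0, 2, 3), (2, 3, 4), (0, 3, 8)]
print(timeslice(edges, t_start=0, resolution=1))
print(timeslice(edges, t_start=0, resolution=3))
```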

{"title":"ZigzagNetVis: Suggesting temporal resolutions for graph visualization using zigzag persistence.","authors":"Raphael Tinarrage, Jean R Ponciano, Claudio D G Linhares, Agma J M Traina, Jorge Poco","doi":"10.1109/TVCG.2025.3528197","DOIUrl":"https://doi.org/10.1109/TVCG.2025.3528197","url":null,"abstract":"<p><p>Temporal graphs are commonly used to represent complex systems and track the evolution of their constituents over time. Visualizing these graphs is crucial as it allows one to quickly identify anomalies, trends, patterns, and other properties that facilitate better decision-making. In this context, selecting an appropriate temporal resolution is essential for constructing and visually analyzing the layout. The choice of resolution is particularly important, especially when dealing with temporally sparse graphs. In such cases, changing the temporal resolution by grouping events (i.e., edges) from consecutive timestamps - a technique known as timeslicing - can aid in the analysis and reveal patterns that might not be discernible otherwise. However, selecting an appropriate temporal resolution is a challenging task. In this paper, we propose ZigzagNetVis, a methodology that suggests temporal resolutions potentially relevant for analyzing a given graph, i.e., resolutions that lead to substantial topological changes in the graph structure. ZigzagNetVis achieves this by leveraging zigzag persistent homology, a well-established technique from Topological Data Analysis (TDA). To improve visual graph analysis, ZigzagNetVis incorporates the colored barcode, a novel timeline-based visualization inspired by persistence barcodes commonly used in TDA. We also contribute with a web-based system prototype that implements suggestion methodology and visualization tools. Finally, we demonstrate the usefulness and effectiveness of ZigzagNetVis through a usage scenario, a user study with 27 participants, and a detailed quantitative evaluation.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143545337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Towards Voronoi Diagrams of Surface Patches.
Pub Date : 2025-01-17 DOI: 10.1109/TVCG.2025.3531445
Pengfei Wang, Jiantao Song, Lei Wang, Shiqing Xin, Dong-Ming Yan, Shuangmin Chen, Changhe Tu, Wenping Wang

Extraction of a high-fidelity 3D medial axis is a crucial operation in CAD. When dealing with a polygonal model as input, ensuring accuracy and tidiness becomes challenging due to discretization errors inherent in the mesh surface. Commonly, existing approaches yield medial-axis surfaces with various artifacts, including zigzag boundaries, bumpy surfaces, unwanted spikes, and non-smooth stitching curves. Considering that the surface of a CAD model can be easily decomposed into a collection of surface patches, its 3D medial axis can be extracted by computing the Voronoi diagram of these surface patches, where each surface patch serves as a generator. However, no solver currently exists for accurately computing such an extended Voronoi diagram. Under the assumption that each generator defines a linear distance field over a sufficiently small range, our approach operates by tetrahedralizing the region of interest and computing the medial axis within each tetrahedral element. Just as SurfaceVoronoi computes surface-based Voronoi diagrams by cutting a 3D prism with 3D planes (each plane encodes a linear field in a triangle), the key operation in this paper is to conduct the hyperplane cutting process in 4D, where each hyperplane encodes a linear field in a tetrahedron. In comparison with the state-of-the-art, our algorithm produces better outcomes. Furthermore, it can also be used to compute the offset surface.
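
For intuition about what a Voronoi diagram of surface patches means, the sketch below is a brute-force, sample-based approximation under the assumption that each patch is represented by a dense point sample: every query point is simply labeled with its nearest patch. It is illustrative only and bears no relation to the paper's 4D hyperplane-cutting algorithm; the boundary between labels (here, the plane z = 0.5 between two parallel patches) is where the medial axis lies.

```python
import numpy as np

def nearest_patch_labels(query_points, patches):
    """Brute-force approximation of the Voronoi diagram of surface patches:
    each query point is labeled with the index of the patch whose sampled
    points lie closest to it (patches given as (m_i, 3) point arrays)."""
    labels = np.empty(len(query_points), dtype=int)
    for i, q in enumerate(query_points):
        dists = [np.min(np.linalg.norm(p - q, axis=1)) for p in patches]
        labels[i] = int(np.argmin(dists))
    return labels

# Two toy "patches": point samples of the planes z = 0 and z = 1.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 1, 10)), -1).reshape(-1, 2)
patch_a = np.c_[grid, np.zeros(len(grid))]
patch_b = np.c_[grid, np.ones(len(grid))]
queries = np.array([[0.5, 0.5, 0.2], [0.5, 0.5, 0.9], [0.5, 0.5, 0.5]])
print(nearest_patch_labels(queries, [patch_a, patch_b]))  # [0 1 0]; ties go to the first patch
```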

{"title":"Towards Voronoi Diagrams of Surface Patches.","authors":"Pengfei Wang, Jiantao Song, Lei Wang, Shiqing Xin, Dong-Ming Yan, Shuangmin Chen, Changhe Tu, Wenping Wang","doi":"10.1109/TVCG.2025.3531445","DOIUrl":"https://doi.org/10.1109/TVCG.2025.3531445","url":null,"abstract":"<p><p>Extraction of a high-fidelity 3D medial axis is a crucial operation in CAD. When dealing with a polygonal model as input, ensuring accuracy and tidiness becomes challenging due to discretization errors inherent in the mesh surface. Commonly, existing approaches yield medial-axis surfaces with various artifacts, including zigzag boundaries, bumpy surfaces, unwanted spikes, and non-smooth stitching curves. Considering that the surface of a CAD model can be easily decomposed into a collection of surface patches, its 3D medial axis can be extracted by computing the Voronoi diagram of these surface patches, where each surface patch serves as a generator. However, no solver currently exists for accurately computing such an extended Voronoi diagram. Under the assumption that each generator defines a linear distance field over a sufficiently small range, our approach operates by tetrahedralizing the region of interest and computing the medial axis within each tetrahedral element. Just as SurfaceVoronoi computes surface-based Voronoi diagrams by cutting a 3D prism with 3D planes (each plane encodes a linear field in a triangle), the key operation in this paper is to conduct the hyperplane cutting process in 4D, where each hyperplane encodes a linear field in a tetrahedron. In comparison with the state-of-the-art, our algorithm produces better outcomes. Furthermore, it can also be used to compute the offset surface.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143545425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Perceptually-Aligned Dynamic Facial Projection Mapping by High-Speed Face-Tracking Method and Lens-Shift Co-Axial Setup.
Pub Date : 2025-01-17 DOI: 10.1109/TVCG.2025.3527203
Hao-Lun Peng, Kengo Sato, Soran Nakagawa, Yoshihiro Watanabe

Dynamic Facial Projection Mapping (DFPM) overlays computer-generated images onto human faces to create immersive experiences that have been used in the makeup and entertainment industries. In this study, we propose two concepts to reduce the misalignment artifacts between projected images and target faces, which is a persistent challenge for DFPM. Our first concept is a high-speed face-tracking method that exploits temporal information. We first introduce a cropped-area-limited inter/extrapolation-based face detection framework, which allows parallel execution with facial landmark detection. We then propose a novel hybrid facial landmark detection method that combines fast Ensemble of Regression Trees (ERT)-based detections and an auxiliary detection. ERT-based detections rapidly produce results in 0.107 ms using temporal information, while the auxiliary detection supports recovery from detection errors. To train the facial landmark detection method, we propose an innovative method for simulating high-frame-rate video annotations to address the lack of publicly available high-frame-rate annotated datasets. Our second concept is a lens-shift co-axial projector-camera setup that maintains a high optical alignment with only a 1.274-pixel error between 1 m and 2 m depth. This setup reduces misalignment by applying the same optical design to both the projector and the camera, avoiding the large misalignment introduced by conventional setups. Based on these concepts, we developed a novel high-speed DFPM system that achieves nearly perfect alignment with human visual perception.
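
The cropped-area-limited inter/extrapolation idea can be pictured with a constant-velocity bounding-box predictor. The snippet below is a hypothetical sketch, not the authors' framework: it extrapolates the next face bounding box from the last two detections and pads it with a margin so that the detector only searches a small crop of the next high-speed frame.

```python
def extrapolate_crop(prev_boxes, margin=0.2):
    """Linearly extrapolate the next face bounding box from the two most
    recent detections (x, y, w, h) and enlarge it by a margin, so the
    detector only has to search a small cropped region of the next frame."""
    (x0, y0, w0, h0), (x1, y1, w1, h1) = prev_boxes[-2], prev_boxes[-1]
    # Constant-velocity prediction of position and size.
    x, y = 2 * x1 - x0, 2 * y1 - y0
    w, h = 2 * w1 - w0, 2 * h1 - h0
    # Pad the predicted box so small prediction errors stay inside the crop.
    return (x - margin * w, y - margin * h, (1 + 2 * margin) * w, (1 + 2 * margin) * h)

# Face moving right at ~5 px/frame: the crop is centered on the predicted position.
print(extrapolate_crop([(100, 50, 80, 80), (105, 50, 80, 80)]))
```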

{"title":"Perceptually-Aligned Dynamic Facial Projection Mapping by High-Speed Face-Tracking Method and Lens-Shift Co-Axial Setup.","authors":"Hao-Lun Peng, Kengo Sato, Soran Nakagawa, Yoshihiro Watanabe","doi":"10.1109/TVCG.2025.3527203","DOIUrl":"https://doi.org/10.1109/TVCG.2025.3527203","url":null,"abstract":"<p><p>Dynamic Facial Projection Mapping (DFPM) overlays computer-generated images onto human faces to create immersive experiences that have been used in the makeup and entertainment industries. In this study, we propose two concepts to reduce the misalignment artifacts between projected images and target faces, which is a persistent challenge for DFPM. Our first concept is a high-speed face-tracking method that exploits temporal information. We first introduce a cropped-area-limited inter/extrapolation-based face detection framework, which allows parallel execution with facial landmark detection. We then propose a novel hybrid facial landmark detection method that combines fast Ensemble of Regression Trees (ERT)-based detections and an auxiliary detection. ERT-based detections rapidly produce results in 0.107 ms using temporal information with the support of auxiliary detection to recover from detection errors. To train the facial landmark detection method, we propose an innovative method for simulating high-frame-rate video annotations to address the lack of publicly available high-frame-rate annotated datasets. Our second concept is a lens-shift co-axial projector-camera setup that maintains a high optical alignment with only a 1.274-pixel error between 1 m and 2 m depth. This setup reduces misalignment by applying the same optical designs to the projector and camera without causing large misalignment as in conventional methods. Based on these concepts, we developed a novel high-speed DFPM system that achieves nearly perfect alignment with human visual perception.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143545292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Scalable and High-Quality Neural Implicit Representation for 3D Reconstruction.
Pub Date : 2025-01-15 DOI: 10.1109/TVCG.2025.3530452
Leyuan Yang, Bailin Deng, Juyong Zhang

Various SDF-based neural implicit surface reconstruction methods have been proposed recently, and have demonstrated remarkable modeling capabilities. However, due to the global nature and limited representation ability of a single network, existing methods still suffer from many drawbacks, such as limited accuracy and scale of the reconstruction. In this paper, we propose a versatile, scalable and high-quality neural implicit representation to address these issues. We integrate a divide-and-conquer approach into the neural SDF-based reconstruction. Specifically, we model the object or scene as a fusion of multiple independent local neural SDFs with overlapping regions. The construction of our representation involves three key steps: (1) constructing the distribution and overlap relationship of the local radiance fields based on object structure or data distribution, (2) relative pose registration for adjacent local SDFs, and (3) SDF blending. Thanks to the independent representation of each local region, our approach can not only achieve high-fidelity surface reconstruction, but also enable scalable scene reconstruction. Extensive experimental results demonstrate the effectiveness and practicality of our proposed method.
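
Step (3), SDF blending, can be illustrated with a generic partition-of-unity blend; the sketch below assumes each local SDF comes with a nonnegative weight function that vanishes outside its region of validity, and is not necessarily the blending used in the paper.

```python
import numpy as np

def blended_sdf(x, local_sdfs, weight_fns, eps=1e-8):
    """Blend overlapping local SDFs into one field via a partition of unity:
    each local field contributes proportionally to its (nonnegative) weight,
    and weights are normalized so they sum to one wherever any field is valid."""
    values = np.array([f(x) for f in local_sdfs])
    weights = np.array([w(x) for w in weight_fns])
    return float(np.sum(weights * values) / (np.sum(weights) + eps))

# Two toy local "SDFs" of the same unit circle, each trusted in one half-space,
# with a smooth overlap band of width 0.5 around x = 0.
circle = lambda p: np.linalg.norm(p) - 1.0
left, right = circle, circle
w_left = lambda p: np.clip(0.5 - p[0] / 0.5, 0.0, 1.0)
w_right = lambda p: np.clip(0.5 + p[0] / 0.5, 0.0, 1.0)
for p in [np.array([-1.5, 0.0]), np.array([0.0, 0.5]), np.array([2.0, 0.0])]:
    print(p, blended_sdf(p, [left, right], [w_left, w_right]))
```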

{"title":"Scalable and High-Quality Neural Implicit Representation for 3D Reconstruction.","authors":"Leyuan Yang, Bailin Deng, Juyong Zhang","doi":"10.1109/TVCG.2025.3530452","DOIUrl":"https://doi.org/10.1109/TVCG.2025.3530452","url":null,"abstract":"<p><p>Various SDF-based neural implicit surface reconstruction methods have been proposed recently, and have demonstrated remarkable modeling capabilities. However, due to the global nature and limited representation ability of a single network, existing methods still suffer from many drawbacks, such as limited accuracy and scale of the reconstruction. In this paper, we propose a versatile, scalable and high-quality neural implicit representation to address these issues. We integrate a divide-and-conquer approach into the neural SDF-based reconstruction. Specifically, we model the object or scene as a fusion of multiple independent local neural SDFs with overlapping regions. The construction of our representation involves three key steps: (1) constructing the distribution and overlap relationship of the local radiance fields based on object structure or data distribution, (2) relative pose registration for adjacent local SDFs, and (3) SDF blending. Thanks to the independent representation of each local region, our approach can not only achieve high-fidelity surface reconstruction, but also enable scalable scene reconstruction. Extensive experimental results demonstrate the effectiveness and practicality of our proposed method.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143545419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DDF-ISM: Internal Structure Modeling of Human Head Using Probabilistic Directed Distance Field.
Pub Date : 2025-01-15 DOI: 10.1109/TVCG.2025.3530484
Zhuoman Liu, Yan Luximon, Wei Lin Ng, Eric Chung

The increasing interest surrounding 3D human heads for digital avatars and simulations has highlighted the need for accurate internal modeling rather than solely focusing on external approximations. Existing approaches rely on traditional optimization techniques applied to explicit 3D representations like point clouds and meshes, leading to computational inefficiencies and challenges in capturing local geometric features. To tackle these problems, we propose a novel modeling method called DDF-ISM. It leverages a probabilistic Directed Distance Field for Internal Structure Modeling, facilitating efficient and anatomically accurate deformation of different parts of the human head. DDF-ISM comprises two key components: 1) a probabilistic DDF network for implicit representation of the target model to provide crucial local geometric information, and 2) a conditioned deformation network guided by the local geometry. Additionally, we introduce a large-scale dataset of human heads with internal structures derived from high-quality Computed Tomography (CT) scans, along with well-designed template models encompassing the skull, mandible, brain, and head surface. Evaluation on this dataset shows that our approach outperforms existing methods in both modeling quality and efficiency.
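
To make "directed distance field" concrete: unlike an SDF, a DDF answers how far a surface lies from a point along a given direction. The sketch below evaluates this analytically for a sphere; it is purely illustrative, since the paper's probabilistic DDF is a learned network that additionally predicts hit probabilities for arbitrary head anatomy.

```python
import numpy as np

def sphere_ddf(origin, direction, center=np.zeros(3), radius=1.0):
    """Analytic directed distance field of a sphere: for a point and a unit
    direction, return (hit, distance), i.e. whether the ray hits the surface
    and how far along the direction the first intersection lies."""
    oc = origin - center
    b = np.dot(oc, direction)
    disc = b * b - (np.dot(oc, oc) - radius * radius)
    if disc < 0:
        return False, float("inf")    # the ray misses the sphere
    t = -b - np.sqrt(disc)            # nearest intersection parameter
    if t < 0:
        t = -b + np.sqrt(disc)        # origin is inside: take the exit point
    return bool(t >= 0), float(t)

print(sphere_ddf(np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0])))  # (True, 2.0)
print(sphere_ddf(np.array([0.0, 2.0, -3.0]), np.array([0.0, 0.0, 1.0])))  # (False, inf)
```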

{"title":"DDF-ISM: Internal Structure Modeling of Human Head Using Probabilistic Directed Distance Field.","authors":"Zhuoman Liu, Yan Luximon, Wei Lin Ng, Eric Chung","doi":"10.1109/TVCG.2025.3530484","DOIUrl":"https://doi.org/10.1109/TVCG.2025.3530484","url":null,"abstract":"<p><p>The increasing interest surrounding 3D human heads for digital avatars and simulations has highlighted the need for accurate internal modeling rather than solely focusing on external approximations. Existing approaches rely on traditional optimization techniques applied to explicit 3D representations like point clouds and meshes, leading to computational inefficiencies and challenges in capturing local geometric features. To tackle these problems, we propose a novel modeling method called DDF-ISM. It leverages a probabilistic Directed Distance Field for Internal Structure Modeling, facilitating efficient and anatomically accurate deformation of different parts of the human head. DDF-ISM comprises two key components: 1) a probabilistic DDF network for implicit representation of the target model to provide crucial local geometric information, and 2) a conditioned deformation network guided by the local geometry. Additionally, we introduce a large-scale dataset of human heads with internal structures derived from high-quality Computed Tomography (CT) scans, along with well-designed template models encompassing skull, mandible, brain, and head surface. Evaluation on this dataset showcases the superiority of our approach over existing methods, exhibiting superior performance in both modeling quality and efficiency.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143545404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Towards Photorealistic Portrait Style Transfer in Unconstrained Conditions.
Pub Date : 2025-01-15 DOI: 10.1109/TVCG.2025.3529751
Xinbo Wang, Qing Zhang, Yongwei Nie, Wei-Shi Zheng

We present a photorealistic portrait style transfer approach that produces high-quality results in previously challenging unconstrained conditions, e.g., large facial perspective differences between portraits and faces with complex illumination (e.g., shadow and highlight) or occlusion, and it runs at test time without portrait parsing masks. We achieve this by developing a framework to learn robust dense correspondence across portraits for semantically aligned style transfer, where a regional style contrastive learning strategy is devised to boost the effectiveness of semantic-aware style transfer while enhancing the robustness to complex illumination. Extensive experiments demonstrate the superiority of our method. Our code is available at https://github.com/wangxb29/PPST.
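
The abstract does not detail the regional style contrastive learning strategy, but its contrastive backbone can be sketched with a generic InfoNCE loss, where the positive would be the style feature of the corresponding portrait region and the negatives the features of other regions. The snippet below is such a generic sketch, with random feature vectors standing in for real region features.

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """Generic InfoNCE-style contrastive loss on L2-normalized feature vectors:
    the anchor is pulled toward its positive and pushed away from negatives."""
    feats = np.vstack([positive[None], negatives])
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    a = anchor / np.linalg.norm(anchor)
    logits = feats @ a / tau                            # cosine similarities over temperature
    return -logits[0] + np.log(np.sum(np.exp(logits)))  # negative log-softmax of the positive

rng = np.random.default_rng(0)
anchor = rng.normal(size=64)
positive = anchor + 0.1 * rng.normal(size=64)      # e.g. style feature of the matching region
negatives = rng.normal(size=(8, 64))               # style features of other regions
print(info_nce(anchor, positive, negatives))       # small loss: the positive is closest
```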

{"title":"Towards Photorealistic Portrait Style Transfer in Unconstrained Conditions.","authors":"Xinbo Wang, Qing Zhang, Yongwei Nie, Wei-Shi Zheng","doi":"10.1109/TVCG.2025.3529751","DOIUrl":"https://doi.org/10.1109/TVCG.2025.3529751","url":null,"abstract":"<p><p>We present a photorealistic portrait style transfer approach that allows for producing high-quality results in previously challenging unconstrained conditions, e.g., large facial perspective difference between portraits, faces with complex illumination (e.g., shadow and highlight) and occlusion, and can test without portrait parsing masks. We achieve this by developing a framework to learn robust dense correspondence across portraits for semantically aligned style transfer, where a regional style contrastive learning strategy is devised to boost the effectiveness of semantic-aware style transfer while enhancing the robustness to complex illumination. Extensive experiments demonstrate the superiority of our method. Our code is available at https://github.com/wangxb29/PPST.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143545424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Narrative Player: Reviving Data Narratives with Visuals.
Pub Date : 2025-01-15 DOI: 10.1109/TVCG.2025.3530512
Zekai Shao, Leixian Shen, Haotian Li, Yi Shan, Huamin Qu, Yun Wang, Siming Chen

Data-rich documents are commonly found across various fields such as business, finance, and science. However, a general limitation of these documents is their reliance on text alone to convey data and facts. Representing this text visually improves both comprehension and engagement. However, existing work emphasizes presenting insights within phrases or sentences rather than conveying the data stories of whole paragraphs in a way that engages readers. To provide readers with satisfactory data stories, this paper presents Narrative Player, a novel method that automatically revives data narratives with consistent and contextualized visuals. Specifically, it accepts a paragraph and a corresponding data table as input and leverages LLMs to characterize the clauses and extract contextualized data facts. Subsequently, the facts are transformed into a coherent visualization sequence with a carefully designed optimization-based approach. Animations are also assigned between adjacent visualizations to enable seamless transitions. Finally, the visualization sequence, transition animations, and audio narration generated by text-to-speech technologies are rendered into a data video. The evaluation results showed that the automatically generated data videos were well received by participants and experts for enhancing reading.
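
The optimization-based approach is not spelled out in the abstract; one common way to obtain a coherent sequence is dynamic programming over per-clause candidate charts with a transition cost that penalizes abrupt changes. The sketch below illustrates that generic formulation with hypothetical fit and transition costs, not the paper's actual objective.

```python
def coherent_sequence(candidates, fit_cost, transition_cost):
    """Pick one visualization per clause so that per-clause fit cost plus
    transition cost between consecutive choices is minimized (Viterbi-style
    dynamic programming over the clause sequence)."""
    # best[i][c] = (cost of the best prefix ending with candidate c at clause i, backpointer)
    best = [{c: (fit_cost(0, c), None) for c in candidates[0]}]
    for i in range(1, len(candidates)):
        layer = {}
        for c in candidates[i]:
            prev, pc = min(
                ((best[i - 1][p][0] + transition_cost(p, c), p) for p in candidates[i - 1]),
                key=lambda x: x[0])
            layer[c] = (prev + fit_cost(i, c), pc)
        best.append(layer)
    # Trace back the cheapest full sequence.
    c = min(best[-1], key=lambda c: best[-1][c][0])
    seq = [c]
    for i in range(len(candidates) - 1, 0, -1):
        c = best[i][c][1]
        seq.append(c)
    return seq[::-1]

# Toy example: prefer keeping the same chart type between consecutive clauses.
candidates = [["bar", "line"], ["bar", "line"], ["line", "scatter"]]
fit = lambda i, c: 0.0
trans = lambda a, b: 0.0 if a == b else 1.0
print(coherent_sequence(candidates, fit, trans))  # ['line', 'line', 'line']
```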

{"title":"Narrative Player: Reviving Data Narratives with Visuals.","authors":"Zekai Shao, Leixian Shen, Haotian Li, Yi Shan, Huamin Qu, Yun Wang, Siming Chen","doi":"10.1109/TVCG.2025.3530512","DOIUrl":"https://doi.org/10.1109/TVCG.2025.3530512","url":null,"abstract":"<p><p>Data-rich documents are commonly found across various fields such as business, finance, and science. However, a general limitation of these documents for reading is their reliance on text to convey data and facts. Visual representation of text aids in providing a satisfactory reading experience in comprehension and engagement. However, existing work emphasizes presenting the insights within phrases or sentences, rather than fully conveying data stories within the whole paragraphs and engaging readers. To provide readers with satisfactory data stories, this paper presents Narrative Player, a novel method that automatically revives data narratives with consistent and contextualized visuals. Specifically, it accepts a paragraph and corresponding data table as input and leverages LLMs to characterize the clauses and extract contextualized data facts. Subsequently, the facts are transformed into a coherent visualization sequence with a carefully designed optimization-based approach. Animations are also assigned between adjacent visualizations to enable seamless transitions. Finally, the visualization sequence, transition animations, and audio narration generated by text-to-speech technologies are rendered into a data video. The evaluation results showed that the automatic-generated data videos were well-received by participants and experts for enhancing reading.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143545245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SpeechAct: Towards Generating Whole-Body Motion From Speech.
Pub Date : 2025-01-13 DOI: 10.1109/TVCG.2025.3529611
Jinsong Zhang, Minjie Zhu, Yuxiang Zhang, Zerong Zheng, Yebin Liu, Kun Li

Whole-body motion generation from speech audio is crucial for computer graphics and immersive VR/AR. Prior methods struggle to produce natural and diverse whole-body motions from speech. In this paper, we introduce a novel method, named SpeechAct, based on a hybrid point representation and contrastive motion learning to boost realism and diversity in motion generation. Our hybrid point representation leverages the advantages of keypoint representation and the surface points of a 3D body model, which is easy to learn and helps to achieve smooth and natural motion generation from speech audio. We design a VQ-VAE to learn a motion codebook using our hybrid representation, and then regress the motion from the input audio using a translation model. To boost diversity in motion generation, we propose a contrastive motion learning method based on the intuitive idea that the generated motion should be different from the motions of other audio clips and other speakers. We collect negative samples from other audio inputs and other speakers using our translation model. With these negative samples, we pull the current motion away from them using a contrastive loss to produce more distinctive representations. In addition, we include a face generator to produce deterministic face motion, given the strong connection between face movements and the speech audio. Experimental results validate the superior performance of our model. The code will be available for research purposes.
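
The codebook lookup at the heart of any VQ-VAE is simple to show in isolation. The sketch below performs the standard nearest-neighbor quantization step on random vectors; the 512-entry, 32-dimensional codebook is an arbitrary stand-in, and training details such as the commitment loss and straight-through estimator of a standard VQ-VAE, as well as the paper's hybrid point representation, are omitted.

```python
import numpy as np

def quantize(latents, codebook):
    """VQ-VAE-style quantization: replace each latent vector by its nearest
    codebook entry (Euclidean distance) and return both the code indices and
    the quantized vectors."""
    # (n, 1, d) - (1, k, d) -> pairwise squared distances of shape (n, k)
    d2 = np.sum((latents[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)
    idx = np.argmin(d2, axis=1)
    return idx, codebook[idx]

rng = np.random.default_rng(1)
codebook = rng.normal(size=(512, 32))   # stand-in for a learned motion codebook
latents = rng.normal(size=(4, 32))      # stand-in encoder outputs for 4 motion clips
idx, quantized = quantize(latents, codebook)
print(idx, quantized.shape)             # 4 code indices, (4, 32) quantized latents
```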

{"title":"SpeechAct: Towards Generating Whole-Body Motion From Speech.","authors":"Jinsong Zhang, Minjie Zhu, Yuxiang Zhang, Zerong Zheng, Yebin Liu, Kun Li","doi":"10.1109/TVCG.2025.3529611","DOIUrl":"https://doi.org/10.1109/TVCG.2025.3529611","url":null,"abstract":"<p><p>Whole-body motion generation from speech audio is crucial for computer graphics and immersive VR/AR. Prior methods struggle to produce natural and diverse whole-body motions from speech. In this paper, we introduce a novel method, named SpeechAct, based on a hybrid point representation and contrastive motion learning to boost realism and diversity in motion generation. Our hybrid point representation leverages the advantages of keypoint representation and surface points of 3D body model, which is easy to learn and helps to achieve smooth and natural motion generation from speech audio. We design a VQ-VAE to learn a motion codebook using our hybrid presentation, and then regress the motion from the input audio using a translation model. To boost diversity in motion generation, we propose a contrastive motion learning method according to the intuitive idea that the generated motion should be different from the motions of other audios and other speakers. We collect negative samples from other audio inputs and other speakers using our translation model. With these negative samples, we pull the current motion away from them using a contrastive loss to produce more distinctive representations. In addition, we compose a face generator to generate deterministic face motion due to the strong connection between the face movements and the speech audio. Experimental results validate the superior performance of our model. The code will be available for research purposes.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143545421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
VAT: Visibility Aware Transformer for Fine-Grained Clothed Human Reconstruction.
Pub Date : 2025-01-10 DOI: 10.1109/TVCG.2025.3528021
Xiaoyan Zhang, Zibin Zhu, Hong Xie, Sisi Ren, Jianmin Jiang

In order to reconstruct 3D clothed humans with accurate fine-grained details from sparse views, we propose a deeply cooperating two-level global-to-fine-grained reconstruction framework that constructs robust global geometry to guide fine-grained geometry learning. The core of the framework is a novel visibility-aware Transformer, VAT, which bridges the two-level reconstruction architecture by connecting its global encoder and fine-grained decoder with two pixel-aligned implicit functions, respectively. The global encoder fuses semantic features of multiple views to integrate global geometric features. In the fine-grained decoder, a visibility-aware attention mechanism is designed to efficiently fuse multi-view and multi-scale features for mining fine-grained geometric features. The global encoder and fine-grained decoder are connected by a global embedding module to form a deep cooperation in the two-level framework, which provides a global geometric embedding as query guidance for computing visibility-aware attention in the fine-grained decoder. In addition, to extract highly aligned multi-scale features for the two-level reconstruction architecture, we design an image feature extractor, MSUNet, which establishes strong semantic connections between different scales at minimal cost. Our proposed framework is end-to-end trainable, with all modules jointly optimized. We validate the effectiveness of our framework on public benchmarks, and experimental results demonstrate that our method has significant advantages over state-of-the-art methods in terms of both fine-grained performance and generalization.
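
Visibility-aware attention can be read as ordinary scaled dot-product attention in which views that cannot see the query point are masked out before the softmax. The sketch below shows that generic masked form with random features; it illustrates the idea, not the paper's exact mechanism.

```python
import numpy as np

def visibility_aware_attention(query, keys, values, visible):
    """Scaled dot-product attention over per-view features in which views
    marked invisible for the query point are masked out before the softmax."""
    d = query.shape[-1]
    logits = keys @ query / np.sqrt(d)                  # (n_views,)
    logits = np.where(visible, logits, -np.inf)         # drop occluded views
    weights = np.exp(logits - np.max(logits))
    weights = weights / weights.sum()
    return weights @ values                             # fused (feature_dim,) vector

rng = np.random.default_rng(2)
q = rng.normal(size=16)                                 # query, e.g. from a global embedding
keys = rng.normal(size=(4, 16))                         # one key per input view
values = rng.normal(size=(4, 32))                       # one feature per input view
visible = np.array([True, True, False, True])           # view 2 does not see the point
print(visibility_aware_attention(q, keys, values, visible).shape)  # (32,)
```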

{"title":"VAT: Visibility Aware Transformer for Fine-Grained Clothed Human Reconstruction.","authors":"Xiaoyan Zhang, Zibin Zhu, Hong Xie, Sisi Ren, Jianmin Jiang","doi":"10.1109/TVCG.2025.3528021","DOIUrl":"https://doi.org/10.1109/TVCG.2025.3528021","url":null,"abstract":"<p><p>In order to reconstruct 3D clothed human with accurate fine-grained details from sparse views, we propose a deep cooperating two-level global to fine-grained reconstruction framework that constructs robust global geometry to guide fine-grained geometry learning. The core of the framework is a novel visibility aware Transformer VAT, which bridges the two-level reconstruction architecture by connecting its global encoder and fine-grained decoder with two pixel-aligned implicit functions, respectively. The global encoder fuses semantic features of multiple views to integrate global geometric features. In the fine-grained decoder, visibility aware attention mechanism is designed to efficiently fuse multi-view and multi-scale features for mining fine-grained geometric features. The global encoder and fine-grained decoder are connected by a global embeding module to form a deep cooperation in the two-level framework, which provides global geometric embedding as a query guidance for calculating visibility aware attention in the fine-grained decoder. In addition, to extract highly aligned multi-scale features for the two-level reconstruction architecture, we design an image feature extractor MSUNet, which establishes strong semantic connections between different scales at minimal cost. Our proposed framework is end-to-end trainable, with all modules jointly optimized. We validate the effectiveness of our framework on public benchmarks, and experimental results demonstrate that our method has significant advantages over state-of-the-art methods in terms of both fine-grained performance and generalization.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143545329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Multi-Criteria Decision Analysis for Aiding Glyph Design.
Pub Date : 2025-01-08 DOI: 10.1109/TVCG.2025.3526918
Hong-Po Hsieh, Amy Zavatsky, Min Chen

Glyph-based visualization is one of the main techniques for visualizing complex multivariate data. With small glyphs, data variables are typically encoded with relatively low visual and perceptual precision. Glyph designers have to contemplate the trade-offs in allocating visual channels when there is a large number of data variables. While there are many successful glyph designs in the literature, there is not yet a systematic method for assisting visualization designers to evaluate different design options that feature different types of trade-offs. In this paper, we present an evaluation scheme based on the multi-criteria decision analysis (MCDA) methodology. The scheme provides designers with a structured way to consider their glyph designs from a range of perspectives, while rendering a semi-quantitative template for evaluating different design options. In addition, this work provides guideposts for future empirical research to obtain more quantitative measurements that can be used in MCDA-aided glyph design processes.
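
As a concrete, if simplified, example of how MCDA turns such considerations into comparable numbers, the snippet below scores hypothetical glyph designs with a plain weighted sum over criteria; the paper's scheme is richer and semi-quantitative, so both the criteria and the weights here are made-up placeholders.

```python
def mcda_scores(options, weights):
    """Weighted-sum MCDA: score each design option by the weighted sum of its
    per-criterion ratings (all criteria on a common scale, e.g. 1-5)."""
    return {name: sum(weights[c] * r for c, r in ratings.items())
            for name, ratings in options.items()}

# Hypothetical glyph designs rated on three hypothetical criteria (1 = poor, 5 = excellent).
options = {
    "star glyph":   {"accuracy": 3, "compactness": 5, "learnability": 4},
    "bar glyph":    {"accuracy": 5, "compactness": 2, "learnability": 5},
    "color matrix": {"accuracy": 2, "compactness": 5, "learnability": 3},
}
weights = {"accuracy": 0.5, "compactness": 0.3, "learnability": 0.2}
print(mcda_scores(options, weights))  # the bar glyph wins under these weights
```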

{"title":"Multi-Criteria Decision Analysis for Aiding Glyph Design.","authors":"Hong-Po Hsieh, Amy Zavatsky, Min Chen","doi":"10.1109/TVCG.2025.3526918","DOIUrl":"https://doi.org/10.1109/TVCG.2025.3526918","url":null,"abstract":"<p><p>Glyph-based visualization is one of the main techniques for visualizing complex multivariate data. With small glyphs, data variables are typically encoded with relatively low visual and perceptual precision. Glyph designers have to contemplate the trade-offs in allocating visual channels when there is a large number of data variables. While there are many successful glyph designs in the literature, there is not yet a systematic method for assisting visualization designers to evaluate different design options that feature different types of trade-offs. In this paper, we present an evaluation scheme based on the multi-criteria decision analysis (MCDA) methodology. The scheme provides designers with a structured way to consider their glyph designs from a range of perspectives, while rendering a semi-quantitative template for evaluating different design options. In addition, this work provides guideposts for future empirical research to obtain more quantitative measurements that can be used in MCDA-aided glyph design processes.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143545239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0