A Comprehensive Evaluation of Arbitrary Image Style Transfer Methods
Pub Date: 2024-09-25 | DOI: 10.1109/TVCG.2024.3466964
Zijun Zhou, Fan Tang, Yuxin Zhang, Oliver Deussen, Juan Cao, Weiming Dong, Xiangtao Li, Tong-Yee Lee
Despite the remarkable progress in the field of arbitrary image style transfer (AST), inconsistent evaluation continues to plague style transfer research. Existing methods often suffer from limited objective evaluation and inconsistent subjective feedback, hindering reliable comparisons among AST variants. In this study, we propose a multi-granularity assessment system that combines standardized objective and subjective evaluations. We collect a fine-grained dataset covering a range of image contexts, such as different scenes, object complexities, and rich parsing information, from multiple sources. Objective and subjective studies are conducted using the collected dataset. Specifically, we innovate on traditional subjective studies by developing an online evaluation system that combines point-wise, pair-wise, and group-wise questionnaires. We then bridge the gap between objective and subjective evaluations by examining the consistency between the results of the two studies. We experimentally evaluate CNN-based, flow-based, transformer-based, and diffusion-based AST methods with the proposed multi-granularity assessment system, laying the foundation for reliable and robust evaluation. Providing standardized measures, objective data, and detailed subjective feedback empowers researchers to make informed comparisons and drive innovation in this rapidly evolving field. The collected dataset and our online evaluation system are available at http://ivc.ia.ac.cn.
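The consistency check between the objective and subjective studies can be illustrated with a rank-correlation test. The sketch below is not the authors' code; the method names and scores are hypothetical placeholders, and Spearman correlation stands in for whatever consistency measure the paper uses.

# Minimal sketch: rank AST method families by an objective metric and by
# mean subjective rating, then measure rank agreement. All scores are
# hypothetical placeholders, not results from the paper.
from scipy.stats import spearmanr

methods = ["cnn_based", "flow_based", "transformer_based", "diffusion_based"]
objective_scores = [0.62, 0.58, 0.71, 0.69]   # e.g., a style-similarity metric
subjective_scores = [3.4, 3.1, 4.0, 4.2]      # e.g., mean questionnaire ratings

rho, p_value = spearmanr(objective_scores, subjective_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
for m, o, s in sorted(zip(methods, objective_scores, subjective_scores),
                      key=lambda t: -t[2]):
    print(f"{m:>18}: objective {o:.2f}, subjective {s:.1f}")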
{"title":"A Comprehensive Evaluation of Arbitrary Image Style Transfer Methods.","authors":"Zijun Zhou, Fan Tang, Yuxin Zhang, Oliver Deussen, Juan Cao, Weiming Dong, Xiangtao Li, Tong-Yee Lee","doi":"10.1109/TVCG.2024.3466964","DOIUrl":"https://doi.org/10.1109/TVCG.2024.3466964","url":null,"abstract":"<p><p>Despite the remarkable process in the field of arbitrary image style transfer (AST), inconsistent evaluation continues to plague style transfer research. Existing methods often suffer from limited objective evaluation and inconsistent subjective feedback, hindering reliable comparisons among AST variants. In this study, we propose a multi-granularity assessment system that combines standardized objective and subjective evaluations. We collect a fine-grained dataset considering a range of image contexts such as different scenes, object complexities, and rich parsing information from multiple sources. Objective and subjective studies are conducted using the collected dataset. Specifically, we innovate on traditional subjective studies by developing an online evaluation system utilizing a combination of point-wise, pair-wise, and group-wise questionnaires. Finally, we bridge the gap between objective and subjective evaluations by examining the consistency between the results from the two studies. We experimentally evaluate CNN-based, flow-based, transformer-based, and diffusion-based AST methods by the proposed multi-granularity assessment system, which lays the foundation for a reliable and robust evaluation. Providing standardized measures, objective data, and detailed subjective feedback empowers researchers to make informed comparisons and drive innovation in this rapidly evolving field. Finally, for the collected dataset and our online evaluation system, please see http://ivc.ia.ac.cn.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142335182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets
Pub Date: 2024-09-24 | DOI: 10.1109/TVCG.2024.3456215
Jaeyoung Kim, Sihyeon Lee, Hyeon Jeon, Keon-Joo Lee, Hee-Joon Bae, Bohyoung Kim, Jinwook Seo
Acute stroke demands prompt diagnosis and treatment to achieve optimal patient outcomes. However, the intricate and irregular nature of clinical data associated with acute stroke, particularly blood pressure (BP) measurements, presents substantial obstacles to effective visual analytics and decision-making. Through a year-long collaboration with experienced neurologists, we developed PhenoFlow, a visual analytics system that leverages collaboration between humans and Large Language Models (LLMs) to analyze the extensive and complex data of acute ischemic stroke patients. PhenoFlow pioneers an innovative workflow in which the LLM serves as a data wrangler while neurologists explore and supervise the output using visualizations and natural language interactions. This approach enables neurologists to focus more on decision-making with reduced cognitive load. To protect sensitive patient information, PhenoFlow uses only metadata to make inferences and synthesize executable code, without accessing raw patient data. This ensures that the results are both reproducible and interpretable while maintaining patient privacy. The system incorporates a slice-and-wrap design that employs temporal folding to create an overlaid circular visualization. Combined with a linear bar graph, this design aids in exploring meaningful patterns within irregularly measured BP data. Through case studies, PhenoFlow has demonstrated its capability to support iterative analysis of extensive clinical datasets, reducing cognitive load and enabling neurologists to make well-informed decisions. Grounded in long-term collaboration with domain experts, our research demonstrates the potential of utilizing LLMs to tackle current challenges in data-driven clinical decision-making for acute ischemic stroke patients.
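The slice-and-wrap idea of temporal folding can be sketched as a simple coordinate transform: wrap each reading's time of day onto a circle so measurements from different days overlay. This is our illustration under stated assumptions, not PhenoFlow's implementation, and the BP readings are hypothetical.

# Minimal sketch: fold irregularly timestamped BP readings onto a 24-hour
# circle (angle = time of day, radius = reading), the basic idea behind an
# overlaid circular view. Data below are hypothetical.
import math
from datetime import datetime

readings = [  # (timestamp, systolic BP in mmHg) -- hypothetical values
    (datetime(2024, 3, 1, 8, 30), 142),
    (datetime(2024, 3, 1, 20, 15), 155),
    (datetime(2024, 3, 2, 9, 5), 138),
]

def to_polar(ts: datetime, value: float) -> tuple[float, float]:
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    angle = 2 * math.pi * seconds / 86400  # wrap one day onto the circle
    return angle, value                    # radius encodes the BP value

for ts, bp in readings:
    theta, r = to_polar(ts, bp)
    print(f"{ts:%Y-%m-%d %H:%M} -> theta={theta:.2f} rad, r={r}")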
{"title":"PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets.","authors":"Jaeyoung Kim, Sihyeon Lee, Hyeon Jeon, Keon-Joo Lee, Hee-Joon Bae, Bohyoung Kim, Jinwook Seo","doi":"10.1109/TVCG.2024.3456215","DOIUrl":"https://doi.org/10.1109/TVCG.2024.3456215","url":null,"abstract":"<p><p>Acute stroke demands prompt diagnosis and treatment to achieve optimal patient outcomes. However, the intricate and irregular nature of clinical data associated with acute stroke, particularly blood pressure (BP) measurements, presents substantial obstacles to effective visual analytics and decision-making. Through a year-long collaboration with experienced neurologists, we developed PhenoFlow, a visual analytics system that leverages the collaboration between human and Large Language Models (LLMs) to analyze the extensive and complex data of acute ischemic stroke patients. PhenoFlow pioneers an innovative workflow, where the LLM serves as a data wrangler while neurologists explore and supervise the output using visualizations and natural language interactions. This approach enables neurologists to focus more on decision-making with reduced cognitive load. To protect sensitive patient information, PhenoFlow only utilizes metadata to make inferences and synthesize executable codes, without accessing raw patient data. This ensures that the results are both reproducible and interpretable while maintaining patient privacy. The system incorporates a slice-and-wrap design that employs temporal folding to create an overlaid circular visualization. Combined with a linear bar graph, this design aids in exploring meaningful patterns within irregularly measured BP data. Through case studies, PhenoFlow has demonstrated its capability to support iterative analysis of extensive clinical datasets, reducing cognitive load and enabling neurologists to make well-informed decisions. Grounded in long-term collaboration with domain experts, our research demonstrates the potential of utilizing LLMs to tackle current challenges in data-driven clinical decision-making for acute ischemic stroke patients.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142335188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SLInterpreter: An Exploratory and Iterative Human-AI Collaborative System for GNN-based Synthetic Lethal Prediction
Pub Date: 2024-09-24 | DOI: 10.1109/TVCG.2024.3456325
Haoran Jiang, Shaohan Shi, Shuhao Zhang, Jie Zheng, Quan Li
Synthetic Lethal (SL) relationships, though rare among the vast array of gene combinations, hold substantial promise for targeted cancer therapy. Despite advancements in AI model accuracy, there is still a significant need among domain experts for interpretive paths and mechanism explorations that align better with domain-specific knowledge, particularly due to the high costs of experimentation. To address this gap, we propose an iterative Human-AI collaborative framework with two key components: 1) Human-Engaged Knowledge Graph Refinement based on Metapath Strategies, which leverages insights from interpretive paths and domain expertise to refine the knowledge graph through metapath strategies with appropriate granularity; 2) Cross-Granularity SL Interpretation Enhancement and Mechanism Analysis, which aids experts in organizing and comparing predictions and interpretive paths across different granularities, uncovering new SL relationships, enhancing result interpretation, and elucidating potential mechanisms inferred by Graph Neural Network (GNN) models. These components cyclically optimize model predictions and mechanism explorations, enhancing expert involvement and intervention to build trust. Facilitated by SLInterpreter, this framework ensures that newly generated interpretive paths increasingly align with domain knowledge and adhere more closely to real-world biological principles through iterative Human-AI collaboration. We evaluate the framework's efficacy through a case study and expert interviews.
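Metapath-based refinement can be grounded with a small example: enumerating instances of a typed path (e.g., gene-pathway-gene) in a knowledge graph. A minimal sketch, assuming a networkx graph with a 'type' attribute per node; the graph content and the chosen metapath are hypothetical, not the paper's.

# Minimal sketch of metapath-guided traversal on a typed knowledge graph:
# enumerate gene-pathway-gene instances, a common metapath for SL analysis.
import networkx as nx

G = nx.Graph()
G.add_nodes_from(["BRCA1", "PARP1", "TP53"], type="gene")
G.add_nodes_from(["DNA_repair"], type="pathway")
G.add_edges_from([("BRCA1", "DNA_repair"), ("PARP1", "DNA_repair")])

def metapath_instances(graph, metapath):
    """Yield node tuples whose types follow `metapath`, e.g. gene-pathway-gene."""
    starts = [n for n, d in graph.nodes(data=True) if d["type"] == metapath[0]]
    def extend(path):
        if len(path) == len(metapath):
            yield tuple(path)
            return
        for nbr in graph.neighbors(path[-1]):
            if graph.nodes[nbr]["type"] == metapath[len(path)] and nbr not in path:
                yield from extend(path + [nbr])
    for s in starts:
        yield from extend([s])

for inst in metapath_instances(G, ["gene", "pathway", "gene"]):
    print(inst)  # e.g. ('BRCA1', 'DNA_repair', 'PARP1')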
{"title":"SLInterpreter: An Exploratory and Iterative Human-AI Collaborative System for GNN-based Synthetic Lethal Prediction.","authors":"Haoran Jiang, Shaohan Shi, Shuhao Zhang, Jie Zheng, Quan Li","doi":"10.1109/TVCG.2024.3456325","DOIUrl":"https://doi.org/10.1109/TVCG.2024.3456325","url":null,"abstract":"<p><p>Synthetic Lethal (SL) relationships, though rare among the vast array of gene combinations, hold substantial promise for targeted cancer therapy. Despite advancements in AI model accuracy, there is still a significant need among domain experts for interpretive paths and mechanism explorations that align better with domain-specific knowledge, particularly due to the high costs of experimentation. To address this gap, we propose an iterative Human-AI collaborative framework with two key components: 1) HumanEngaged Knowledge Graph Refinement based on Metapath Strategies, which leverages insights from interpretive paths and domain expertise to refine the knowledge graph through metapath strategies with appropriate granularity. 2) Cross-Granularity SL Interpretation Enhancement and Mechanism Analysis, which aids experts in organizing and comparing predictions and interpretive paths across different granularities, uncovering new SL relationships, enhancing result interpretation, and elucidating potential mechanisms inferred by Graph Neural Network (GNN) models. These components cyclically optimize model predictions and mechanism explorations, enhancing expert involvement and intervention to build trust. Facilitated by SLInterpreter, this framework ensures that newly generated interpretive paths increasingly align with domain knowledge and adhere more closely to real-world biological principles through iterative Human-AI collaboration. We evaluate the framework's efficacy through a case study and expert interviews.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142335189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Loops: Leveraging Provenance and Visualization to Support Exploratory Data Analysis in Notebooks
Pub Date: 2024-09-23 | DOI: 10.1109/TVCG.2024.3456186
Klaus Eckelt, Kiran Gadhave, Alexander Lex, Marc Streit
Exploratory data science is an iterative process of obtaining, cleaning, profiling, analyzing, and interpreting data. This cyclical way of working creates challenges within the linear structure of computational notebooks, leading to issues with code quality, recall, and reproducibility. To remedy this, we present Loops, a set of visual support techniques for iterative and exploratory data analysis in computational notebooks. Loops leverages provenance information to visualize the impact of changes made within a notebook. In visualizations of the notebook provenance, we trace the evolution of the notebook over time and highlight differences between versions. Loops visualizes the provenance of code, markdown, tables, visualizations, and images, and their respective differences. Analysts can explore these differences in detail in a separate view. Loops not only makes the analysis process transparent but also supports analysts in their data science work by showing the effects of changes and facilitating comparison of multiple versions. We demonstrate our approach's utility and potential impact through two use cases and through feedback from notebook users of various backgrounds. This paper and all supplemental materials are available at https://osf.io/79eyn.
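A version-to-version diff of notebook cells, the raw material for the kind of provenance comparison described above, can be sketched with the standard library. The file names are hypothetical, and this only diffs cells present in both versions.

# Minimal sketch: cell-level diff between two versions of a Jupyter notebook
# (.ipynb files are JSON with a top-level "cells" list).
import difflib
import json

def cell_sources(path):
    with open(path) as f:
        nb = json.load(f)
    return ["".join(c["source"]) for c in nb["cells"]]

old_cells = cell_sources("analysis_v1.ipynb")  # hypothetical file names
new_cells = cell_sources("analysis_v2.ipynb")

for i, (old, new) in enumerate(zip(old_cells, new_cells)):
    if old != new:
        print(f"--- cell {i} changed ---")
        for line in difflib.unified_diff(old.splitlines(), new.splitlines(),
                                         lineterm=""):
            print(line)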
{"title":"Loops: Leveraging Provenance and Visualization to Support Exploratory Data Analysis in Notebooks.","authors":"Klaus Eckelt, Kiran Gadhave, Alexander Lex, Marc Streit","doi":"10.1109/TVCG.2024.3456186","DOIUrl":"https://doi.org/10.1109/TVCG.2024.3456186","url":null,"abstract":"<p><p>Exploratory data science is an iterative process of obtaining, cleaning, profiling, analyzing, and interpreting data. This cyclical way of working creates challenges within the linear structure of computational notebooks, leading to issues with code quality, recall, and reproducibility. To remedy this, we present Loops, a set of visual support techniques for iterative and exploratory data analysis in computational notebooks. Loops leverages provenance information to visualize the impact of changes made within a notebook. In visualizations of the notebook provenance, we trace the evolution of the notebook over time and highlight differences between versions. Loops visualizes the provenance of code, markdown, tables, visualizations, and images and their respective differences. Analysts can explore these differences in detail in a separate view. Loops not only makes the analysis process transparent but also supports analysts in their data science work by showing the effects of changes and facilitating comparison of multiple versions. We demonstrate our approach's utility and potential impact in two use cases and feedback from notebook users from various backgrounds. This paper and all supplemental materials are available at https://osf.io/79eyn.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142309489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
StuGPTViz: A Visual Analytics Approach to Understand Student-ChatGPT Interactions
Pub Date: 2024-09-23 | DOI: 10.1109/TVCG.2024.3456363
Zixin Chen, Jiachen Wang, Meng Xia, Kento Shigyo, Dingdong Liu, Rong Zhang, Huamin Qu
The integration of Large Language Models (LLMs), especially ChatGPT, into education is poised to revolutionize students' learning experiences by introducing innovative conversational learning methodologies. To empower students to fully leverage the capabilities of ChatGPT in educational scenarios, understanding students' interaction patterns with ChatGPT is crucial for instructors. However, this endeavor is challenging due to the absence of datasets focused on student-ChatGPT conversations and the complexities of identifying and analyzing the evolving interaction patterns within conversations. To address these challenges, we collected conversational data from 48 students interacting with ChatGPT in a master's-level data visualization course over one semester. We then developed a coding scheme, grounded in the literature on cognitive levels and thematic analysis, to categorize students' interaction patterns with ChatGPT. Furthermore, we present a visual analytics system, StuGPTViz, that tracks and compares temporal patterns in student prompts and the quality of ChatGPT's responses at multiple scales, revealing significant pedagogical insights for instructors. We validated the system's effectiveness through expert interviews with six data visualization instructors and three case studies. The results confirmed StuGPTViz's capacity to enhance educators' insights into the pedagogical value of ChatGPT. We also discuss potential research opportunities for applying visual analytics in education and developing AI-driven personalized learning solutions.
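One building block of such multi-scale temporal analysis is aggregating coded interactions over time. A minimal sketch, assuming pandas; the timestamps and category labels are hypothetical and do not reflect the paper's actual coding scheme.

# Minimal sketch: tally coded student-ChatGPT interactions per week, the kind
# of temporal aggregate a multi-scale view could build on.
import pandas as pd

log = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-02-05", "2024-02-06", "2024-02-14"]),
    "category": ["clarify_concept", "debug_code", "clarify_concept"],
})

weekly = (log.groupby([pd.Grouper(key="timestamp", freq="W"), "category"])
             .size()
             .unstack(fill_value=0))
print(weekly)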
{"title":"StuGPTViz: A Visual Analytics Approach to Understand Student-ChatGPT Interactions.","authors":"Zixin Chen, Jiachen Wang, Meng Xia, Kento Shigyo, Dingdong Liu, Rong Zhang, Huamin Qu","doi":"10.1109/TVCG.2024.3456363","DOIUrl":"10.1109/TVCG.2024.3456363","url":null,"abstract":"<p><p>The integration of Large Language Models (LLMs), especially ChatGPT, into education is poised to revolutionize students' learning experiences by introducing innovative conversational learning methodologies. To empower students to fully leverage the capabilities of ChatGPT in educational scenarios, understanding students' interaction patterns with ChatGPT is crucial for instructors. However, this endeavor is challenging due to the absence of datasets focused on student-ChatGPT conversations and the complexities in identifying and analyzing the evolutional interaction patterns within conversations. To address these challenges, we collected conversational data from 48 students interacting with ChatGPT in a master's level data visualization course over one semester. We then developed a coding scheme, grounded in the literature on cognitive levels and thematic analysis, to categorize students' interaction patterns with ChatGPT. Furthermore, we present a visual analytics system, StuGPTViz, that tracks and compares temporal patterns in student prompts and the quality of ChatGPT's responses at multiple scales, revealing significant pedagogical insights for instructors. We validated the system's effectiveness through expert interviews with six data visualization instructors and three case studies. The results confirmed StuGPTViz's capacity to enhance educators' insights into the pedagogical value of ChatGPT. We also discussed the potential research opportunities of applying visual analytics in education and developing AI-driven personalized learning solutions.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142309492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BEMTrace: Visualization-driven approach for deriving Building Energy Models from BIM
Pub Date: 2024-09-23 | DOI: 10.1109/TVCG.2024.3456315
Andreas Walch, Attila Szabo, Harald Steinlechner, Thomas Ortner, Eduard Groller, Johanna Schmidt
Building Information Modeling (BIM) describes a central data pool covering the entire life cycle of a construction project. Similarly, Building Energy Modeling (BEM) describes the process of using a 3D representation of a building as a basis for thermal simulations to assess the building's energy performance. This paper explores the intersection of BIM and BEM, focusing on the challenges and methodologies in converting BIM data into BEM representations for energy performance analysis. BEMTrace integrates 3D data wrangling techniques with visualization methodologies to enhance the accuracy and traceability of the BIM-to-BEM conversion process. Through parsing, error detection, and algorithmic correction of BIM data, our methods generate valid BEM models suitable for energy simulation. Visualization techniques provide transparent insights into the conversion process, aiding error identification, validation, and user comprehension. We introduce context-adaptive selections to facilitate user interaction and show that the BEMTrace workflow helps users understand complex 3D data wrangling processes.
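One representative BIM-to-BEM error check is testing whether a boundary surface's vertices are near-planar, since thermal simulation tools typically require planar surfaces. A minimal sketch, assuming a plane fit via SVD; the sample polygon and tolerance are hypothetical.

# Minimal sketch: flag a boundary surface whose vertex loop deviates from its
# best-fit plane (the singular vector with the smallest singular value of the
# centered vertices is the plane normal).
import numpy as np

def planarity_error(vertices: np.ndarray) -> float:
    """Max distance of the vertices from their best-fit plane."""
    centered = vertices - vertices.mean(axis=0)
    _, _, vt = np.linalg.svd(centered)
    normal = vt[-1]
    return float(np.abs(centered @ normal).max())

# Hypothetical wall polygon with one vertex pushed slightly out of plane.
wall = np.array([[0, 0, 0], [4, 0, 0], [4, 0, 3], [0, 0.01, 3]], dtype=float)
err = planarity_error(wall)
print(f"planarity error = {err:.4f} m -> "
      f"{'ok' if err < 0.005 else 'needs correction'}")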
DimBridge: Interactive Explanation of Visual Patterns in Dimensionality Reductions with Predicate Logic
Pub Date: 2024-09-23 | DOI: 10.1109/TVCG.2024.3456391
Brian Montambault, Gabriel Appleby, Jen Rogers, Camelia D Brumar, Mingwei Li, Remco Chang
Dimensionality reduction techniques are widely used for visualizing high-dimensional data. However, support for interpreting patterns of dimension reduction results in the context of the original data space is often insufficient. Consequently, users may struggle to extract insights from the projections. In this paper, we introduce DimBridge, a visual analytics tool that allows users to interact with visual patterns in a projection and retrieve corresponding data patterns. DimBridge supports several interactions, allowing users to perform various analyses, from contrasting multiple clusters to explaining complex latent structures. Leveraging first-order predicate logic, DimBridge identifies subspaces in the original dimensions relevant to a queried pattern and provides an interface for users to visualize and interact with them. We demonstrate how DimBridge can help users overcome the challenges associated with interpreting visual patterns in projections.
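The predicate idea can be grounded with a toy example: given points brushed in a projection, derive an interval predicate per original dimension and score how well each matches the selection. This is a simplified stand-in for DimBridge's predicate search, with synthetic data.

# Minimal sketch: for a brushed subset of projected points, build a predicate
# "dim in [lo, hi]" per original dimension and rank dimensions by how well
# the predicate reproduces the selection (Jaccard score). Toy data only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # original high-dimensional data
selected = X[:, 2] > 1.0             # pretend this cluster was brushed in 2D

for dim in range(X.shape[1]):
    lo, hi = X[selected, dim].min(), X[selected, dim].max()
    inside = (X[:, dim] >= lo) & (X[:, dim] <= hi)
    score = (inside & selected).sum() / (inside | selected).sum()
    print(f"x{dim} in [{lo:+.2f}, {hi:+.2f}]  score={score:.2f}")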
{"title":"DimBridge: Interactive Explanation of Visual Patterns in Dimensionality Reductions with Predicate Logic.","authors":"Brian Montambault, Gabriel Appleby, Jen Rogers, Camelia D Brumar, Mingwei Li, Remco Chang","doi":"10.1109/TVCG.2024.3456391","DOIUrl":"https://doi.org/10.1109/TVCG.2024.3456391","url":null,"abstract":"<p><p>Dimensionality reduction techniques are widely used for visualizing high-dimensional data. However, support for interpreting patterns of dimension reduction results in the context of the original data space is often insufficient. Consequently, users may struggle to extract insights from the projections. In this paper, we introduce DimBridge, a visual analytics tool that allows users to interact with visual patterns in a projection and retrieve corresponding data patterns. DimBridge supports several interactions, allowing users to perform various analyses, from contrasting multiple clusters to explaining complex latent structures. Leveraging first-order predicate logic, DimBridge identifies subspaces in the original dimensions relevant to a queried pattern and provides an interface for users to visualize and interact with them. We demonstrate how DimBridge can help users overcome the challenges associated with interpreting visual patterns in projections.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142309487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GraspDiff: Grasping Generation for Hand-Object Interaction With Multimodal Guided Diffusion
Pub Date: 2024-09-23 | DOI: 10.1109/TVCG.2024.3466190
Binghui Zuo, Zimeng Zhao, Wenqian Sun, Xiaohan Yuan, Zhipeng Yu, Yangang Wang
Grasping generation holds significant importance in both robotics and AI-generated content. While pure network paradigms based on VAEs or GANs ensure diversity in outcomes, they often fall short of achieving plausibility. Two-step paradigms that first predict contact and then optimize distance yield plausible results but are known to be time-consuming. This paper introduces a novel paradigm powered by DDPM, accommodating diverse modalities with varying interaction granularities as its generating conditions, including 3D object, contact affordance, and image content. Our key idea is that the iterative steps inherent to diffusion models can supplant the iterative optimization routines in existing optimization methods, thereby endowing the generated results with both diversity and plausibility. Using the same training data, our paradigm achieves superior generation performance and competitive generation speed compared to optimization-based paradigms. Extensive experiments on both in-domain and out-of-domain objects demonstrate that our method significantly improves over the SOTA method. We will release the code for research purposes.
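The DDPM reverse process that replaces the optimization loop can be sketched generically. This is a textbook sampling loop, not GraspDiff's model; the noise predictor is a placeholder that a real system would condition on the object, contact affordance, or image content, and the pose dimensionality is hypothetical.

# Minimal sketch of DDPM reverse-process sampling: start from Gaussian noise
# and iteratively denoise, with sigma_t^2 = beta_t.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x, t):
    # Placeholder for the learned, condition-aware noise predictor.
    return np.zeros_like(x)

rng = np.random.default_rng(0)
x = rng.normal(size=61)  # e.g., a flattened hand-pose vector (dim hypothetical)
for t in reversed(range(T)):
    eps = predict_noise(x, t)
    mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    x = mean + (np.sqrt(betas[t]) * rng.normal(size=x.shape) if t > 0 else 0.0)
print(x[:5])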
Real-and-Present: Investigating the Use of Life-Size 2D Video Avatars in HMD-Based AR Teleconferencing
Pub Date: 2024-09-23 | DOI: 10.1109/TVCG.2024.3466554
Xuanyu Wang, Weizhan Zhang, Christian Sandor, Hongbo Fu
Augmented Reality (AR) teleconferencing allows spatially distributed users to interact with each other in 3D through agents in their own physical environments. Existing methods leveraging volumetric capturing and reconstruction can provide a high-fidelity experience but are often too complex and expensive for everyday use. Other solutions target mobile, effortless-to-set-up teleconferencing on AR Head-Mounted Displays (HMDs); they directly transplant conventional video conferencing onto an AR-HMD platform or use avatars to represent remote participants. However, they can support either high fidelity or a high level of co-presence, but not both. Moreover, the limited Field of View (FoV) of HMDs can further degrade users' immersive experience. To achieve a balance between fidelity and co-presence, we explore using life-size 2D video-based avatars (video avatars for short) in AR teleconferencing. Specifically, given the potential effect of FoV on users' perception of proximity, we first conduct a pilot study to explore the local-user-centered optimal placement of video avatars in small-group AR conversations. With the placement results, we then implement a proof-of-concept prototype of video-avatar-based teleconferencing and conduct user evaluations to verify its effectiveness in balancing fidelity and co-presence. Following the indications of the pilot study, we further quantitatively explore the effect of FoV size on the video avatar's optimal placement through a user study involving more FoV conditions in a VR-simulated environment. We regress placement models to serve as references for computationally determining video avatar placements in such teleconferencing applications on various existing AR HMDs and future ones with bigger FoVs.
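The final regression step can be illustrated with a simple model fit: predicted avatar distance as a function of FoV. The (FoV, distance) pairs below are hypothetical, purely to show the fitting step, and a quadratic fit is our arbitrary choice, not the paper's model form.

# Minimal sketch: regress a placement model mapping headset FoV to a
# preferred avatar distance. Data points are hypothetical.
import numpy as np

fov_deg = np.array([40.0, 52.0, 66.0, 90.0, 110.0])
preferred_dist_m = np.array([2.4, 2.0, 1.7, 1.4, 1.2])

coeffs = np.polyfit(fov_deg, preferred_dist_m, deg=2)
model = np.poly1d(coeffs)
print(f"predicted distance at 70 deg FoV: {model(70.0):.2f} m")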
Reducing Search Regions for Fast Detection of Exact Point-to-Point Geodesic Paths on Meshes
Pub Date: 2024-09-23 | DOI: 10.1109/TVCG.2024.3466242
Shuai Ma, Wencheng Wang, Fei Hou
Fast detection of exact point-to-point geodesic paths on meshes remains challenging for existing methods. We present a method that reduces the region of the mesh to be investigated, improving efficiency. Our observation is that a mesh and its simplified version are very alike, so the geodesic path between two points on the mesh and the geodesic path between their corresponding points on the simplified mesh lie close to each other in 3D Euclidean space. Thus, from the geodesic path on the simplified mesh, we can generate a region on the original mesh that contains the mesh's geodesic path, called the search region, within which existing methods can restrict their search when detecting geodesic paths and so gain acceleration. We demonstrate the rationale behind our proposed method. Experimental results show that it accelerates existing methods well; e.g., the global exact method VTP (vertex-oriented triangle propagation) can be sped up by over 200 times when handling large meshes. Our search region can also speed up path initialization using the Dijkstra algorithm to accelerate local methods, achieving at least a two-times speedup in our tests.
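The path-initialization step mentioned above can be sketched as a plain Dijkstra search over a mesh's edge graph; in the paper's pipeline, the resulting path on the simplified mesh would seed the search region on the original mesh. The tiny graph below is a hypothetical stand-in for a mesh's vertex adjacency.

# Minimal sketch: Dijkstra shortest path over a vertex adjacency graph,
# the standard initialization that a search region can restrict.
import heapq

def dijkstra_path(adj, src, dst):
    """adj: {vertex: [(neighbor, edge_length), ...]}"""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

adj = {0: [(1, 1.0), (2, 2.5)], 1: [(0, 1.0), (2, 1.0)], 2: [(0, 2.5), (1, 1.0)]}
print(dijkstra_path(adj, 0, 2))  # [0, 1, 2]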
{"title":"Reducing Search Regions for Fast Detection of Exact Point-to-Point Geodesic Paths on Meshes.","authors":"Shuai Ma, Wencheng Wang, Fei Hou","doi":"10.1109/TVCG.2024.3466242","DOIUrl":"10.1109/TVCG.2024.3466242","url":null,"abstract":"<p><p>Fast detection of exact point-to-point geodesic paths on meshes is still challenging with existing methods. For this, we present a method to reduce the region to be investigated on the mesh for efficiency. It is by our observation that a mesh and its simplified one are very alike so that the geodesic path between two defined points on the mesh and the geodesic path between their corresponding two points on the simplified mesh are very near to each other in the 3D Euclidean space. Thus, with the geodesic path on the simplified mesh, we can generate a region on the original mesh that contains the geodesic path on the mesh, called the search region, by which existing methods can reduce the search scope in detecting geodesic paths, and so obtaining acceleration. We demonstrate the rationale behind our proposed method. Experimental results show that we can promote existing methods well, e.g., the global exact method VTP (vertex-oriented triangle propagation) can be sped up by even over 200 times when handling large meshes. Our search region can also speed up path initialization using the Dijkstra algorithm to promote local methods, e.g., obtaining an acceleration of at least two times in our tests.</p>","PeriodicalId":94035,"journal":{"name":"IEEE transactions on visualization and computer graphics","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142309491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}