GraspDiff: Grasping Generation for Hand-Object Interaction With Multimodal Guided Diffusion
Pub Date: 2024-09-23. DOI: 10.1109/TVCG.2024.3466190
Binghui Zuo, Zimeng Zhao, Wenqian Sun, Xiaohan Yuan, Zhipeng Yu, Yangang Wang
Grasping generation holds significant importance in both robotics and AI-generated content. While pure network paradigms based on VAEs or GANs ensure diversity in outcomes, they often fall short of achieving plausibility. Meanwhile, two-step paradigms that first predict contact and then optimize distance yield plausible results but are known to be time-consuming. This paper introduces a novel paradigm powered by DDPM that accommodates diverse modalities with varying interaction granularities as its generating conditions, including 3D objects, contact affordances, and image content. Our key idea is that the iterative steps inherent to diffusion models can supplant the iterative optimization routines of existing optimization-based methods, thereby endowing the generated results with both diversity and plausibility. Using the same training data, our paradigm achieves superior generation performance and competitive generation speed compared to optimization-based paradigms. Extensive experiments on both in-domain and out-of-domain objects demonstrate that our method achieves significant improvement over the state-of-the-art method. We will release the code for research purposes.
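The central claim, that the diffusion model's own denoising iterations take over the role of a post-hoc optimization loop, can be illustrated with a minimal conditional DDPM sampling sketch. This is not the authors' released code: the noise-prediction network `eps_model`, the conditioning embedding `cond`, and the grasp-parameter dimensionality are assumptions for illustration only.

```python
import torch

def ddpm_sample_grasp(eps_model, cond, T=1000, dim=61, device="cpu"):
    """Minimal conditional DDPM ancestral sampling loop (sketch).

    eps_model(x_t, t, cond) -> predicted noise; cond is an embedding of the
    guidance modality (3D object points, a contact map, or image features).
    dim is an assumed hand-parameter dimensionality (e.g., MANO pose plus
    global translation), not the paper's exact parameterization.
    """
    betas = torch.linspace(1e-4, 0.02, T, device=device)     # linear noise schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    x = torch.randn(1, dim, device=device)                   # start from pure noise
    for t in reversed(range(T)):
        t_batch = torch.full((1,), t, device=device, dtype=torch.long)
        eps = eps_model(x, t_batch, cond)                    # predict the injected noise
        # DDPM posterior mean of x_{t-1} given x_t
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise              # no noise at the final step
    return x                                                 # denoised grasp parameters
```

Each reverse step plays the role of one iteration in a contact-then-fit optimization pipeline, which is why sampling stays competitive in speed while keeping the diversity of a generative model.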
DaedalusData: Exploration, Knowledge Externalization and Labeling of Particles in Medical Manufacturing - A Design Study
Pub Date: 2024-09-23. DOI: 10.1109/TVCG.2024.3456329
Alexander Wyss, Gabriela Morgenshtern, Amanda Hirsch-Husler, Jurgen Bernard
In medical diagnostics of both early disease detection and routine patient care, particle-based contamination of in-vitro diagnostics consumables poses a significant threat to patients. Objective data-driven decision-making on the severity of contamination is key for reducing patient risk, while saving time and cost in quality assessment. Our collaborators introduced us to their quality control process, including particle data acquisition through image recognition, feature extraction, and attributes reflecting the production context of particles. Shortcomings of the current process are limited support for exploring thousands of images, for data-driven decision making, and for effective knowledge externalization. Following the design study methodology, our contributions are a characterization of the problem space and requirements, the development and validation of DaedalusData, a comprehensive discussion of our study's learnings, and a generalizable framework for knowledge externalization. DaedalusData is a visual analytics system that enables domain experts to explore particle contamination patterns, label particles in label alphabets, and externalize knowledge through semi-supervised label-informed data projections. The results of our case study and user study show high usability of DaedalusData and its efficient support of experts in generating comprehensive overviews of thousands of particles, labeling large quantities of particles, and externalizing knowledge to augment the dataset further. Reflecting on our approach, we discuss insights on dataset augmentation via human knowledge externalization, and on the scalability and trade-offs that come with the adoption of this approach in practice.
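The "semi-supervised label-informed data projection" can be approximated with a small stand-in: partial expert labels are appended as weighted features so that same-labeled particles are pulled together in the 2D layout. This is an illustrative sketch, not DaedalusData's implementation; the one-hot weighting and the use of PCA are assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def label_informed_projection(features, labels, n_labels, label_weight=2.0):
    """Project particle features to 2D while letting partial labels shape the layout.

    features: (N, D) particle image/production-context features.
    labels:   (N,) integer expert labels, -1 for still-unlabeled particles.
    Labeled particles receive a weighted one-hot block appended to their
    features, nudging same-labeled particles closer in the projection.
    """
    X = StandardScaler().fit_transform(features)
    onehot = np.zeros((len(labels), n_labels))
    mask = labels >= 0
    onehot[mask, labels[mask]] = label_weight
    return PCA(n_components=2).fit_transform(np.hstack([X, onehot]))

# Example with synthetic data: 200 particles, 16 features, 3 classes, mostly unlabeled
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 16))
labs = np.full(200, -1)
labs[:30] = rng.integers(0, 3, 30)
xy = label_informed_projection(feats, labs, n_labels=3)
```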
Real-and-Present: Investigating the Use of Life-Size 2D Video Avatars in HMD-Based AR Teleconferencing
Pub Date: 2024-09-23. DOI: 10.1109/TVCG.2024.3466554
Xuanyu Wang, Weizhan Zhang, Christian Sandor, Hongbo Fu
Augmented Reality (AR) teleconferencing allows spatially distributed users to interact with each other in 3D through agents in their own physical environments. Existing methods leveraging volumetric capturing and reconstruction can provide a high-fidelity experience but are often too complex and expensive for everyday use. Other solutions target mobile, effortless-to-set-up teleconferencing on AR Head-Mounted Displays (HMDs); they directly transplant conventional video conferencing onto an AR-HMD platform or use avatars to represent remote participants. However, they can only support either high fidelity or a high level of co-presence. Moreover, the limited Field of View (FoV) of HMDs could further degrade users' immersive experience. To achieve a balance between fidelity and co-presence, we explore using life-size 2D video-based avatars (video avatars for short) in AR teleconferencing. Specifically, given the potential effect of FoV on users' perception of proximity, we first conducted a pilot study to explore the local-user-centered optimal placement of video avatars in small-group AR conversations. With the placement results, we then implemented a proof-of-concept prototype of video-avatar-based teleconferencing and conducted user evaluations with it to verify its effectiveness in balancing fidelity and co-presence. Following the indication from the pilot study, we further quantitatively explored the effect of FoV size on the video avatar's optimal placement through a user study involving more FoV conditions in a VR-simulated environment. Finally, we regress placement models to serve as references for computationally determining video avatar placements in such teleconferencing applications on various existing AR HMDs and future ones with bigger FoVs.
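The last step, regressing placement models so that an application can compute a suitable avatar placement for a given headset, can be sketched as a simple fit of preferred avatar distance against FoV. The sample values, the linear form, and the variable names below are hypothetical stand-ins, not the paper's data or model.

```python
import numpy as np

# Hypothetical (FoV in degrees, preferred avatar distance in meters) pairs,
# standing in for the per-condition optima measured in the user studies.
fov_deg = np.array([40.0, 52.0, 66.0, 80.0, 95.0, 110.0])
pref_dist = np.array([2.3, 2.0, 1.8, 1.6, 1.5, 1.4])

# Fit a simple placement model: distance as a linear function of FoV.
slope, intercept = np.polyfit(fov_deg, pref_dist, deg=1)

def placement_distance(fov):
    """Suggested video-avatar distance (m) for a headset with the given FoV (deg)."""
    return slope * fov + intercept

print(round(placement_distance(70.0), 2))  # e.g., placement for a 70-degree-FoV HMD
```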
Reducing Search Regions for Fast Detection of Exact Point-to-Point Geodesic Paths on Meshes
Pub Date: 2024-09-23. DOI: 10.1109/TVCG.2024.3466242
Shuai Ma, Wencheng Wang, Fei Hou
Fast detection of exact point-to-point geodesic paths on meshes remains challenging for existing methods. We therefore present a method that reduces the region of the mesh to be investigated for efficiency. It builds on our observation that a mesh and its simplified version are very alike, so the geodesic path between two points on the mesh and the geodesic path between their corresponding points on the simplified mesh lie very near each other in 3D Euclidean space. Thus, from the geodesic path on the simplified mesh, we can generate a region on the original mesh that contains the geodesic path on that mesh, called the search region, with which existing methods can reduce their search scope in detecting geodesic paths and so obtain acceleration. We demonstrate the rationale behind our proposed method. Experimental results show that we can accelerate existing methods well; e.g., the global exact method VTP (vertex-oriented triangle propagation) can be sped up by over 200 times when handling large meshes. Our search region can also speed up path initialization using the Dijkstra algorithm to accelerate local methods, e.g., obtaining an acceleration of at least two times in our tests.
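The reduction idea can be sketched in a few lines: compute a shortest path between the two endpoints on the simplified mesh, then keep only the original-mesh vertices that lie within a tolerance of that coarse path as the search region. This is a sketch of the general idea, not the authors' algorithm; the graph shortest path stands in for the simplified-mesh geodesic, and the decimation step and `radius` tolerance are assumptions.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import dijkstra
from scipy.spatial import cKDTree

def edge_graph(vertices, faces):
    """Sparse vertex-adjacency graph weighted by Euclidean edge length."""
    e = np.vstack([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    w = np.linalg.norm(vertices[e[:, 0]] - vertices[e[:, 1]], axis=1)
    n = len(vertices)
    rows, cols = np.r_[e[:, 0], e[:, 1]], np.r_[e[:, 1], e[:, 0]]
    return coo_matrix((np.r_[w, w], (rows, cols)), shape=(n, n)).tocsr()

def search_region(orig_v, simp_v, simp_f, src, dst, radius):
    """Indices of original-mesh vertices within `radius` of the coarse path.

    src, dst: vertex indices on the original mesh; they are mapped onto the
    simplified mesh (produced by any decimation tool) by nearest vertex.
    """
    g = edge_graph(simp_v, simp_f)
    tree_s = cKDTree(simp_v)
    s = int(tree_s.query(orig_v[src])[1])
    t = int(tree_s.query(orig_v[dst])[1])
    _, pred = dijkstra(g, indices=s, return_predecessors=True)
    path = [t]
    while path[-1] != s:                      # walk predecessors back to the source
        path.append(int(pred[path[-1]]))
    path_pts = simp_v[np.array(path)]
    # Original vertices close to the coarse path form the reduced search region
    d, _ = cKDTree(path_pts).query(orig_v)
    return np.where(d <= radius)[0]
```

An exact solver such as VTP, or a Dijkstra-based path initialization for local methods, is then run only on the submesh induced by these vertices.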
Towards Enhancing Low Vision Usability of Data Charts on Smartphones
Pub Date: 2024-09-20. DOI: 10.1109/TVCG.2024.3456348
Yash Prakash, Pathan Aseef Khan, Akshay Kolgar Nayak, Sampath Jayarathna, Hae-Na Lee, Vikas Ashok
The importance of data charts is self-evident, given their ability to express complex data in a simple format that facilitates quick and easy comparisons, analysis, and consumption. However, the inherent visual nature of the charts creates barriers for people with visual impairments to reap the associated benefits to the same extent as their sighted peers. While extant research has predominantly focused on understanding and addressing these barriers for blind screen reader users, the needs of low-vision screen magnifier users have been largely overlooked. In an interview study, almost all low-vision participants stated that it was challenging to interact with data charts on small screen devices such as smartphones and tablets, even though they could technically "see" the chart content. They ascribed these challenges mainly to the magnification-induced loss of visual context that connected data points with each other and also with chart annotations, e.g., axis values. In this paper, we present a method that addresses this problem by automatically transforming charts that are typically non-interactive images into personalizable interactive charts which allow selective viewing of desired data points and preserve visual context as much as possible under screen enlargement. We evaluated our method in a usability study with 26 low-vision participants, who all performed a set of representative chart-related tasks under different study conditions. In the study, we observed that our method significantly improved the usability of charts over both the status quo screen magnifier and a state-of-the-art space compaction-based solution.
Visualization Atlases: Explaining and Exploring Complex Topics through Data, Visualization, and Narration
Pub Date: 2024-09-20. DOI: 10.1109/TVCG.2024.3456311
Jinrui Wang, Xinhuan Shu, Benjamin Bach, Uta Hinrichs
This paper defines, analyzes, and discusses the emerging genre of visualization atlases. We currently witness an increase in web-based, data-driven initiatives that call themselves "atlases" while explaining complex, contemporary issues through data and visualizations: climate change, sustainability, AI, or cultural discoveries. To understand this emerging genre and inform their design, study, and authoring support, we conducted a systematic analysis of 33 visualization atlases and semi-structured interviews with eight visualization atlas creators. Based on our results, we contribute (1) a definition of a visualization atlas as a compendium of (web) pages aimed at explaining and supporting exploration of data about a dedicated topic through data, visualizations, and narration, (2) a set of design patterns across 8 design dimensions, (3) insights into atlas creation drawn from the interviews, and (4) the definition of 5 visualization atlas genres. We found that visualization atlases are unique in the way they combine (i) exploratory visualization, (ii) narrative elements from data-driven storytelling, and (iii) structured navigation mechanisms. They target a wide range of audiences with different levels of domain knowledge, acting as tools for study, communication, and discovery. We conclude with a discussion of current design practices and emerging questions around the ethics and potential real-world impact of visualization atlases, aimed at informing the design and study of visualization atlases.
Generalization of CNNs on Relational Reasoning With Bar Charts
Pub Date: 2024-09-19. DOI: 10.1109/TVCG.2024.3463800
Zhenxing Cui, Lu Chen, Yunhai Wang, Daniel Haehn, Yong Wang, Hanspeter Pfister
This paper presents a systematic study of the generalization of convolutional neural networks (CNNs) and humans on relational reasoning tasks with bar charts. We first revisit previous experiments on graphical perception and update the benchmark performance of CNNs. We then test the generalization performance of CNNs on a classic relational reasoning task: estimating bar length ratios in a bar chart, by progressively perturbing the standard visualizations. We further conduct a user study to compare the performance of CNNs and humans. Our results show that CNNs outperform humans only when the training and test data have the same visual encodings. Otherwise, they may perform worse. We also find that CNNs are sensitive to perturbations in various visual encodings, regardless of their relevance to the target bars. Yet, humans are mainly influenced by bar lengths. Our study suggests that robust relational reasoning with visualizations is challenging for CNNs. Improving CNNs' generalization performance may require training them to better recognize task-related visual properties.
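The core task can be made concrete with a small stimulus generator: render a bar chart, mark two bars, and use the length ratio between them as the regression label for a CNN; a perturbation knob such as bar width is then varied to probe generalization. The chart layout, marker style, and image size below are assumptions for illustration, not the paper's exact stimuli.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

def make_bar_ratio_stimulus(rng, n_bars=5, bar_width=0.8, out="stimulus.png"):
    """Render a bar chart and return the ground-truth ratio between two marked bars."""
    heights = rng.uniform(0.1, 1.0, size=n_bars)
    i, j = rng.choice(n_bars, size=2, replace=False)
    ratio = min(heights[i], heights[j]) / max(heights[i], heights[j])

    fig, ax = plt.subplots(figsize=(1.5, 1.5), dpi=100)  # roughly 150 x 150 px image
    ax.bar(np.arange(n_bars), heights, width=bar_width,
           color="lightgray", edgecolor="black")
    # Dots mark the two bars whose lengths must be compared
    ax.plot([i, j], [heights[i] + 0.03, heights[j] + 0.03], "k.", markersize=3)
    ax.set_ylim(0, 1.1)
    ax.axis("off")
    fig.savefig(out)
    plt.close(fig)
    return ratio

rng = np.random.default_rng(0)
label = make_bar_ratio_stimulus(rng, bar_width=0.5)  # e.g., a bar-width perturbation
```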
Adaptive Complementary Filter for Hybrid Inside-Out Outside-In HMD Tracking With Smooth Transitions
Pub Date: 2024-09-19. DOI: 10.1109/TVCG.2024.3464738
Riccardo Monica, Dario Lodi Rizzini, Jacopo Aleotti
Head-mounted displays (HMDs) in room-scale virtual reality are usually tracked using inside-out visual SLAM algorithms. Alternatively, to track the motion of the HMD with respect to a fixed real-world reference frame, an outside-in instrumentation such as a motion capture system can be adopted. However, outside-in tracking systems may temporarily lose tracking as they suffer from occlusion and blind spots. A possible solution is to adopt a hybrid approach where the inside-out tracker of the HMD is augmented with an outside-in sensing system. On the other hand, when the tracking signal of the outside-in system is recovered after a loss of tracking, the transition from inside-out tracking to hybrid tracking may generate a discontinuity, i.e., a sudden change of the virtual viewpoint, which can be uncomfortable for the user. Therefore, hybrid tracking solutions for HMDs require advanced sensor fusion algorithms to obtain a smooth transition. This work proposes a method for hybrid tracking of an HMD with smooth transitions based on an adaptive complementary filter. The proposed approach can be configured with several parameters that determine a trade-off between user experience and tracking error. A user study was carried out in a room-scale virtual reality environment, where users carried out two different tasks while multiple tracking losses of the outside-in sensor system occurred. The results show that the proposed approach improves user experience compared to a standard Extended Kalman Filter, and that tracking error is lower compared to a state-of-the-art complementary filter when configured for the same quality of user experience.
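The smooth-transition behavior can be illustrated with a positional complementary-filter sketch: the outside-in correction is blended in with a gain that ramps up gradually after the signal is reacquired, so the viewpoint drifts toward the corrected pose instead of jumping. The gain schedule, parameter values, and class name are assumptions; orientation handling and the paper's adaptive rules are omitted.

```python
import numpy as np

class SmoothHybridFilter:
    """Complementary-style blend of inside-out and outside-in position estimates."""

    def __init__(self, alpha_max=0.05, ramp_steps=120):
        self.alpha_max = alpha_max           # maximum per-frame correction gain
        self.ramp_steps = ramp_steps         # frames to reach the maximum gain
        self.frames_since_reacquire = None   # None -> outside-in currently lost
        self.offset = np.zeros(3)            # inside-out to world correction

    def update(self, p_inside_out, p_outside_in=None):
        """Return the fused head position for this frame."""
        if p_outside_in is None:             # outside-in lost: keep the last correction
            self.frames_since_reacquire = None
            return p_inside_out + self.offset

        if self.frames_since_reacquire is None:
            self.frames_since_reacquire = 0  # signal just reacquired
        ramp = min(1.0, self.frames_since_reacquire / self.ramp_steps)
        alpha = self.alpha_max * ramp        # small, slowly growing gain
        self.frames_since_reacquire += 1

        target_offset = p_outside_in - p_inside_out
        self.offset = (1.0 - alpha) * self.offset + alpha * target_offset
        return p_inside_out + self.offset
```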
Rapid and Precise Topological Comparison with Merge Tree Neural Networks
Pub Date: 2024-09-19. DOI: 10.1109/TVCG.2024.3456395
Yu Qin, Brittany Terese Fasy, Carola Wenk, Brian Summa
Merge trees are a valuable tool in the scientific visualization of scalar fields; however, current methods for merge tree comparisons are computationally expensive, primarily due to the exhaustive matching between tree nodes. To address this challenge, we introduce the Merge Tree Neural Network (MTNN), a learned neural network model designed for merge tree comparison. The MTNN enables rapid and high-quality similarity computation. We first demonstrate how to train graph neural networks, which emerged as effective encoders for graphs, in order to produce embeddings of merge trees in vector spaces for efficient similarity comparison. Next, we formulate the novel MTNN model that further improves the similarity comparisons by integrating the tree and node embeddings with a new topological attention mechanism. We demonstrate the effectiveness of our model on real-world data in different domains and examine our model's generalizability across various datasets. Our experimental analysis demonstrates our approach's superiority in accuracy and efficiency. In particular, we speed up the prior state-of-the-art by more than 100× on the benchmark datasets while maintaining an error rate below 0.1%.
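The embedding step can be sketched with an off-the-shelf graph neural network: each merge tree becomes a graph whose nodes carry scalar-field features, and a pooled node embedding gives one vector per tree, so tree similarity reduces to a vector comparison. The feature choice, layer sizes, and plain GCN layers are assumptions; the paper's MTNN additionally integrates node and tree embeddings through a topological attention mechanism.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class MergeTreeEncoder(torch.nn.Module):
    """Embed a merge tree (graph with per-node features such as birth/death
    scalar values) into a fixed-size vector for fast similarity comparison."""

    def __init__(self, in_dim=2, hidden=64, out_dim=32):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, out_dim)

    def forward(self, x, edge_index, batch):
        h = F.relu(self.conv1(x, edge_index))
        h = self.conv2(h, edge_index)
        return global_mean_pool(h, batch)   # one embedding per tree in the batch

def tree_similarity(z_a, z_b):
    """Cosine similarity between two merge tree embeddings."""
    return F.cosine_similarity(z_a, z_b, dim=-1)
```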
Gesture2Text: A Generalizable Decoder for Word-Gesture Keyboards in XR Through Trajectory Coarse Discretization and Pre-Training
Pub Date: 2024-09-16. DOI: 10.1109/TVCG.2024.3456198. IEEE Transactions on Visualization and Computer Graphics, 30(11): 7118-7128
Junxiao Shen, Khadija Khaldi, Enmin Zhou, Hemant Bhaskar Surale, Amy Karlson
Text entry with word-gesture keyboards (WGK) is emerging as a popular method and becoming a key interaction for Extended Reality (XR). However, the diversity of interaction modes, keyboard sizes, and visual feedback in these environments introduces divergent word-gesture trajectory data patterns, thus leading to complexity in decoding trajectories into text. Template-matching decoding methods, such as SHARK2 [32], are commonly used for these WGK systems because they are easy to implement and configure. However, these methods are susceptible to decoding inaccuracies for noisy trajectories. While conventional neural-network-based decoders (neural decoders) trained on word-gesture trajectory data have been proposed to improve accuracy, they have their own limitations: they require extensive data for training and deep-learning expertise for implementation. To address these challenges, we propose a novel solution that combines ease of implementation with high decoding accuracy: a generalizable neural decoder enabled by pre-training on large-scale coarsely discretized word-gesture trajectories. This approach produces a ready-to-use WGK decoder that is generalizable across mid-air and on-surface WGK systems in augmented reality (AR) and virtual reality (VR), which is evident by a robust average Top-4 accuracy of 90.4% on four diverse datasets. It significantly outperforms SHARK2 with a 37.2% enhancement and surpasses the conventional neural decoder by 7.4%. Moreover, the Pre-trained Neural Decoder's size is only 4 MB after quantization, without sacrificing accuracy, and it can operate in real-time, executing in just 97 milliseconds on Quest 3.
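The "coarse discretization" idea, snapping a continuous word-gesture trajectory onto a coarse grid of tokens before large-scale pre-training, can be sketched as follows; the grid resolution and token scheme are assumptions, not the paper's exact pipeline.

```python
import numpy as np

def coarse_discretize(trajectory, grid=(8, 3)):
    """Convert a continuous word-gesture trajectory into a coarse token sequence.

    trajectory: (N, 2) array of (x, y) points normalized to the keyboard's
    bounding box, i.e., values in [0, 1] x [0, 1].
    grid: number of cells along x and y; coarse cells make the tokens robust
    to noise and to differences between mid-air and on-surface input.
    """
    cols = np.clip((trajectory[:, 0] * grid[0]).astype(int), 0, grid[0] - 1)
    rows = np.clip((trajectory[:, 1] * grid[1]).astype(int), 0, grid[1] - 1)
    tokens = rows * grid[0] + cols                 # one integer token per sample
    # Collapse consecutive duplicates so length reflects cell changes, not sampling rate
    keep = np.r_[True, tokens[1:] != tokens[:-1]]
    return tokens[keep]

# Example: a slightly noisy horizontal swipe across the middle of the keyboard
t = np.linspace(0, 1, 50)
traj = np.c_[t, 0.5 + 0.02 * np.sin(10 * t)]
print(coarse_discretize(traj))
```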