Pub Date: 2024-08-14. DOI: 10.1016/j.cag.2024.104045
Xi Ren, Nan Guo, Zichen Zhu, Xinbei Jiang
Category-level pose estimation generalizes to novel objects unseen during training and has attracted increasing attention in recent years. Despite this advantage, annotating real-world data with pose labels is intricate and laborious. Although using synthetic data with free annotations can greatly reduce training costs, the Synthetic-to-Real (Sim2Real) domain gap can cause a sharp performance decline on real-world test data. In this paper, we propose Dual-COPE, a novel prior-based category-level object pose estimation method with dual Sim2Real domain adaptation that avoids expensive real pose annotations. First, we propose an estimation network featuring conjoined prior deformation and transformer-based matching to realize high-precision pose prediction. On top of that, an efficient dual Sim2Real domain adaptation module is designed to reduce the feature distribution discrepancy between synthetic and real-world data both semantically and geometrically, thus maintaining superior performance on real-world test data. Moreover, the adaptation module is loosely coupled with the estimation network, allowing easy integration with other methods without any additional inference overhead. Comprehensive experiments show that Dual-COPE outperforms existing unsupervised methods and achieves state-of-the-art precision under supervised settings.
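The abstract does not spell out the adaptation objective; as a rough, hypothetical sketch of reducing feature-distribution discrepancy both semantically and geometrically, the snippet below aligns synthetic and real feature batches with an MMD-style criterion (the names rbf_mmd and dual_adaptation_loss and all weights are illustrative assumptions, not the paper's actual loss).

```python
# Assumed sketch: an MMD-style alignment loss applied separately to semantic and
# geometric features of synthetic and real batches. It only illustrates the idea
# of reducing feature-distribution discrepancy in two feature spaces.
import numpy as np

def rbf_mmd(x, y, sigma=1.0):
    """Biased MMD^2 estimate between two feature batches of shape (N, D) and (M, D)."""
    def gram(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()

def dual_adaptation_loss(syn_sem, real_sem, syn_geo, real_geo, w_sem=1.0, w_geo=1.0):
    """Combine semantic and geometric discrepancy terms into one adaptation loss."""
    return w_sem * rbf_mmd(syn_sem, real_sem) + w_geo * rbf_mmd(syn_geo, real_geo)

# Toy usage with random stand-in features (32 samples per domain).
rng = np.random.default_rng(0)
loss = dual_adaptation_loss(rng.normal(size=(32, 64)), rng.normal(0.5, 1.0, size=(32, 64)),
                            rng.normal(size=(32, 128)), rng.normal(0.3, 1.0, size=(32, 128)))
print(f"adaptation loss: {loss:.4f}")
```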
{"title":"Dual-COPE: A novel prior-based category-level object pose estimation network with dual Sim2Real unsupervised domain adaptation module","authors":"Xi Ren , Nan Guo , Zichen Zhu , Xinbei Jiang","doi":"10.1016/j.cag.2024.104045","DOIUrl":"10.1016/j.cag.2024.104045","url":null,"abstract":"<div><p>Category-level pose estimation offers the generalization ability to novel objects unseen during training, which has attracted increasing attention in recent years. Despite the advantage, annotating real-world data with pose label is intricate and laborious. Although using synthetic data with free annotations can greatly reduce training costs, the Synthetic-to-Real (Sim2Real) domain gap could result in a sharp performance decline on real-world test. In this paper, we propose Dual-COPE, a novel prior-based category-level object pose estimation method with dual Sim2Real domain adaptation to avoid expensive real pose annotations. First, we propose an estimation network featured with conjoined prior deformation and transformer-based matching to realize high-precision pose prediction. Upon that, an efficient dual Sim2Real domain adaptation module is further designed to reduce the feature distribution discrepancy between synthetic and real-world data both semantically and geometrically, thus maintaining superior performance on real-world test. Moreover, the adaptation module is loosely coupled with estimation network, allowing for easy integration with other methods without any additional inference overhead. Comprehensive experiments show that Dual-COPE outperforms existing unsupervised methods and achieves state-of-the-art precision under supervised settings.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"124 ","pages":"Article 104045"},"PeriodicalIF":2.5,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142048643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-10. DOI: 10.1016/j.cag.2024.104039
Dongjin Huang, Nan Wang, Xinghan Huang, Jiantao Qu, Shiyu Zhang
Text-to-3D generation is a challenging but significant task and has gained widespread attention. Its capability to rapidly generate 3D digital assets holds significant application potential in fields such as film, video games, and virtual reality. However, current methods often face several drawbacks, including long generation times, difficulties with the multi-face Janus problem, and issues like chaotic topology and redundant structures during mesh extraction. Additionally, the lack of control over the generated results limits their utility in downstream applications. To address these problems, we propose a novel text-to-3D framework capable of generating meshes with high fidelity and controllability. By specifying input text and LOD preferences, our approach can efficiently produce meshes and textures that match the text description and the desired level of detail (LOD). This framework consists of two stages. In the coarse stage, 3D Gaussians are employed to accelerate generation, and weighted positive and negative prompts from various observation perspectives are used to address the multi-face Janus problem in the generated results. In the refinement stage, mesh vertices and faces are iteratively refined to enhance surface quality and to output meshes and textures that meet the specified LOD requirements. Extensive experiments demonstrate that, compared to state-of-the-art text-to-3D methods, the proposed method is better at resolving the multi-face Janus problem and enables rapid generation of 3D meshes with enhanced prompt adherence. Furthermore, the proposed framework can generate meshes with enhanced topology, offering controllable vertices and faces with UV-adapted textures to achieve multi-level-of-detail (LOD) outputs. Specifically, the proposed method can preserve the output’s relevance to the input text during simplification, making it better suited for mesh editing and more efficient rendering. User studies also indicate that our framework receives higher evaluations compared to other methods.
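The weighting scheme for the view-dependent prompts is not given in the abstract; the snippet below is a hedged illustration of the general idea behind such Janus mitigation: the positive prompt is tagged with a view phrase derived from the camera azimuth, and a negative "frontal face" prompt is weighted more strongly for rear-facing views (all names, phrases, and thresholds are assumptions).

```python
# Hypothetical sketch of view-dependent prompt weighting to mitigate the
# multi-face Janus problem; not the paper's actual scheme.
import math

def view_prompts(base_prompt: str, azimuth_deg: float):
    """Return (positive_prompt, negative_prompt, negative_weight) for one camera."""
    az = azimuth_deg % 360.0
    if az < 45 or az >= 315:
        view = "front view"
    elif az < 135 or az >= 225:
        view = "side view"
    else:
        view = "back view"
    positive = f"{base_prompt}, {view}"
    # Penalize frontal features more strongly the further the camera is from the front:
    # weight is 0 at azimuth 0 (front) and 1 at azimuth 180 (back).
    negative_weight = 0.5 * (1.0 - math.cos(math.radians(az)))
    negative = "frontal face, duplicated face"
    return positive, negative, negative_weight

for az in (0, 90, 180, 270):
    print(az, view_prompts("a stone statue of a lion", az))
```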
{"title":"Mesh-controllable multi-level-of-detail text-to-3D generation","authors":"Dongjin Huang , Nan Wang , Xinghan Huang , Jiantao Qu , Shiyu Zhang","doi":"10.1016/j.cag.2024.104039","DOIUrl":"10.1016/j.cag.2024.104039","url":null,"abstract":"<div><p>Text-to-3D generation is a challenging but significant task and has gained widespread attention. Its capability to rapidly generate 3D digital assets holds huge potential application value in fields such as film, video games, and virtual reality. However, current methods often face several drawbacks, including long generation times, difficulties with the multi-face Janus problem, and issues like chaotic topology and redundant structures during mesh extraction. Additionally, the lack of control over the generated results limits their utility in downstream applications. To address these problems, we propose a novel text-to-3D framework capable of generating meshes with high fidelity and controllability. Our approach can efficiently produce meshes and textures that match the text description and the desired level of detail (LOD) by specifying input text and LOD preferences. This framework consists of two stages. In the coarse stage, 3D Gaussians are employed to accelerate generation speed, and weighted positive and negative prompts from various observation perspectives are used to address the multi-face Janus problem in the generated results. In the refinement stage, mesh vertices and faces are iteratively refined to enhance surface quality and output meshes and textures that meet specified LOD requirements. Compared to the state-of-the-art text-to-3D methods, extensive experiments demonstrate that the proposed method performs better in solving the multi-face Janus problem, enabling the rapid generation of 3D meshes with enhanced prompt adherence. Furthermore, the proposed framework can generate meshes with enhanced topology, offering controllable vertices and faces with textures featuring UV adaptation to achieve multi-level-of-detail(LODs) outputs. Specifically, the proposed method can preserve the output’s relevance to input texts during simplification, making it better suited for mesh editing and rendering efficiency. User studies also indicate that our framework receives higher evaluations compared to other methods.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"123 ","pages":"Article 104039"},"PeriodicalIF":2.5,"publicationDate":"2024-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141998269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-08. DOI: 10.1016/j.cag.2024.104037
ChangAn Zhu, Chris Joslin
3D face animation has been a critical component of character animation in a wide range of media since the early 1990s. The conventional process for animating a 3D face is usually keyframe-based, which is labor-intensive. Therefore, the film and game industries have started using live-action actors’ performances to animate the faces of 3D characters, a process also known as performance-driven facial animation. At the core of performance-driven facial animation is facial motion retargeting, which transfers the source facial motions to a target 3D face. However, facial motion retargeting still has many limitations that restrict its ability to further assist the facial animation process. Existing motion retargeting frameworks cannot accurately transfer the source motion’s semantic information (i.e., the meaning and intensity of the motion), especially when applying the motion to non-human-like or stylized target characters. The retargeting quality relies on the parameterization of the target face, which is time-consuming to build and usually not generalizable across proportionally different faces. In this survey paper, we review the literature on 3D facial motion retargeting methods and the relevant topics within this area. We provide a systematic understanding of the essential modules of the retargeting pipeline, a taxonomy of the available approaches under these modules, and a thorough analysis of their advantages and limitations, along with research directions that could contribute to this area. We also contribute a 3D character categorization matrix, which is used in this survey and may help future research evaluate the character compatibility of retargeting or face parameterization methods.
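As a purely illustrative companion to the survey's description of retargeting (not taken from any surveyed method), the sketch below shows the simplest blendshape-based transfer, where source expression weights are mapped onto a target rig through a hand-made correspondence table; the systems reviewed in the paper learn or solve for such mappings and handle proportionally different faces.

```python
# Illustrative only: naive blendshape-weight retargeting via a fixed
# source-to-target correspondence table with optional per-target gains.
from typing import Dict, Optional

def retarget_blendshape_weights(source_weights: Dict[str, float],
                                correspondence: Dict[str, str],
                                gain: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Map source blendshape weights onto target blendshape names, optionally rescaled."""
    gain = gain or {}
    target_weights: Dict[str, float] = {}
    for src_name, value in source_weights.items():
        tgt_name = correspondence.get(src_name)
        if tgt_name is None:
            continue  # no semantic counterpart on the target rig
        # Clamp to the valid blendshape range after applying a per-target gain.
        target_weights[tgt_name] = min(1.0, max(0.0, value * gain.get(tgt_name, 1.0)))
    return target_weights

# Toy usage: ARKit-style source names mapped to a hypothetical stylized character rig.
source = {"jawOpen": 0.6, "mouthSmile_L": 0.8, "browInnerUp": 0.3}
table = {"jawOpen": "JawDrop", "mouthSmile_L": "SmileLeft", "browInnerUp": "BrowRaise"}
print(retarget_blendshape_weights(source, table, gain={"SmileLeft": 1.2}))
```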
{"title":"A review of motion retargeting techniques for 3D character facial animation","authors":"ChangAn Zhu, Chris Joslin","doi":"10.1016/j.cag.2024.104037","DOIUrl":"10.1016/j.cag.2024.104037","url":null,"abstract":"<div><p>3D face animation has been a critical component of character animation in a wide range of media since the early 90’s. The conventional process for animating a 3D face is usually keyframe-based, which is labor-intensive. Therefore, the film and game industries have started using live-action actors’ performances to animate the faces of 3D characters, the process is also known as performance-driven facial animation. At the core of performance-driven facial animation is facial motion retargeting, which transfers the source facial motions to a target 3D face. However, facial motion retargeting still has many limitations that influence its capability to further assist the facial animation process. Existing motion retargeting frameworks cannot accurately transfer the source motion’s semantic information (i.e., meaning and intensity of the motion), especially when applying the motion to non-human-like or stylized target characters. The retargeting quality relies on the parameterization of the target face, which is time-consuming to build and usually not generalizable across proportionally different faces. In this survey paper, we review the literature relating to 3D facial motion retargeting methods and the relevant topics within this area. We provide a systematic understanding of the essential modules of the retargeting pipeline, a taxonomy of the available approaches under these modules, and a thorough analysis of their advantages and limitations with research directions that could potentially contribute to this area. We also contributed a 3D character categorization matrix, which has been used in this survey and might be useful for future research to evaluate the character compatibility of their retargeting or face parameterization methods.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"123 ","pages":"Article 104037"},"PeriodicalIF":2.5,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0097849324001729/pdfft?md5=887467d22bf59df3534253c1761b0e20&pid=1-s2.0-S0097849324001729-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141990816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Special Section contains extended and revised versions of the best papers presented at the 10th Conference on Smart Tools and Applications in Graphics (STAG 2023), held in Matera on November 16–17, 2023. Four papers were selected by appointed members from the Program Committee; extended versions were submitted and further reviewed by external experts. The result is a rich collection of papers spanning diverse domains: from shape analysis and computational geometry to advanced applications in machine learning, virtual interaction, and digital fabrication. Topics include shape modeling, functional maps, and point clouds, highlighting cutting-edge research in user experience and interaction design.
{"title":"Foreword to the Special Section on Smart Tools and Applications in Graphics (STAG 2023)","authors":"Nicola Capece , Katia Lupinetti , Ugo Erra , Francesco Banterle","doi":"10.1016/j.cag.2024.104036","DOIUrl":"10.1016/j.cag.2024.104036","url":null,"abstract":"<div><p>The Special Section contains extended and revised versions of the best papers presented at the 10th Conference on Smart Tools and Applications in Graphics (STAG 2023), held in Matera on November 16–17, 2023. Four papers were selected by appointed members from the Program Committee; extended versions were submitted and further reviewed by external experts. The result is a rich collection of papers spanning diverse domains: from shape analysis and computational geometry to advanced applications in machine learning, virtual interaction, and digital fabrication. Topics include shape modeling, functional maps, and point clouds, highlighting cutting-edge research in user experience and interaction design.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"123 ","pages":"Article 104036"},"PeriodicalIF":2.5,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142007100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-06. DOI: 10.1016/j.cag.2024.104019
Haya Almaree, Roland Fischer, René Weller, Verena Uslar, Dirk Weyhe, Gabriel Zachmann
Common techniques for anatomy education in medicine include lectures and cadaver dissection, as well as the use of replicas. However, recent advances in virtual reality (VR) technology have led to the development of specialized VR tools for teaching, training, and other purposes. VR technology has the potential to greatly enhance the learning experience for students: these tools offer highly interactive and engaging learning environments that allow students to inspect and interact with virtual 3D anatomical structures repeatedly, intuitively, and immersively. Additionally, multi-user VR environments can facilitate collaborative learning, which has the potential to enhance the learning experience even further. However, the effectiveness of collaborative learning in VR has not been adequately explored. Therefore, we conducted two user studies, each with 33 participants, to evaluate the effectiveness of virtual collaboration in the context of anatomy learning and compared it to individual learning. For both studies, we developed a multi-user VR anatomy learning application using UE4. Our results demonstrate that our VR Anatomy Atlas offers an engaging and effective learning experience for anatomy, both individually and collaboratively. However, we did not find any significant advantages of collaborative learning in terms of learning effectiveness or motivation, despite the multi-user group spending more time in the learning environment. In fact, motivation tended to be slightly lower. Although usability was rather high for the single-user condition, it tended to be lower for the multi-user group in one of the two studies, which may have had a slightly negative effect. However, in the second study, the usability scores were similarly high for both groups. The absence of advantages for collaborative learning may be due to the more complex environment and higher cognitive load. Consequently, more research into collaborative VR learning is needed to determine the factors that promote collaborative learning in VR and the settings in which individual or collaborative learning in VR, respectively, is more effective.
{"title":"Enhancing anatomy learning through collaborative VR? An advanced investigation","authors":"Haya Almaree , Roland Fischer , René Weller , Verena Uslar , Dirk Weyhe , Gabriel Zachmann","doi":"10.1016/j.cag.2024.104019","DOIUrl":"10.1016/j.cag.2024.104019","url":null,"abstract":"<div><p>Common techniques for anatomy education in medicine include lectures and cadaver dissection, as well as the use of replicas. However, recent advances in virtual reality (VR) technology have led to the development of specialized VR tools for teaching, training, and other purposes. The use of VR technology has the potential to greatly enhance the learning experience for students. These tools offer highly interactive and engaging learning environments that allow students to inspect and interact with virtual 3D anatomical structures repeatedly, intuitively, and immersively. Additionally, multi-user VR environments can facilitate collaborative learning, which has the potential to enhance the learning experience even further. However, the effectiveness of collaborative learning in VR has not been adequately explored. Therefore, we conducted two user studies, each with <span><math><mrow><msub><mrow><mi>n</mi></mrow><mrow><mn>1</mn><mo>,</mo><mn>2</mn></mrow></msub><mo>=</mo><mn>33</mn></mrow></math></span> participants, to evaluate the effectiveness of virtual collaboration in the context of anatomy learning, and compared it to individual learning. For our two studies, we developed a multi-user VR anatomy learning application using UE4. Our results demonstrate that our VR Anatomy Atlas offers an engaging and effective learning experience for anatomy, both individually and collaboratively. However, we did not find any significant advantages of collaborative learning in terms of learning effectiveness or motivation, despite the multi-user group spending more time in the learning environment. In fact, motivation tended to be slightly lower. Although the usability was rather high for the single-user condition, it tended to be lower for the multi-user group in one of the two studies, which may have had a slightly negative effect. However, in the second study, the usability scores were similarly high for both groups. The absence of advantages for collaborative learning may be due to the more complex environment and higher cognitive load. In consequence, more research into collaborative VR learning is needed to determine the relevant factors promoting collaborative learning in VR and the settings in which individual or collaborative learning in VR is more effective, respectively.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"123 ","pages":"Article 104019"},"PeriodicalIF":2.5,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142040307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-06. DOI: 10.1016/j.cag.2024.104033
Rafael Romeiro, Elmar Eisemann, Ricardo Marroquim
The display coefficients that produce the signal emitted by a light field display are usually calculated to approximate the radiance over a set of sampled rays in the light field space. However, not all information contained in the light field signal is of equal importance to an observer. We propose a retinal pre-filtering of the light field samples that takes into account the image formation process of the observer to determine display coefficients that will ultimately produce better retinal images for a range of focus distances. We demonstrate a significant increase in image definition without changing the display resolution.
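As a hedged sketch of the underlying idea (the actual display and eye models are not reproduced here), the snippet below contrasts fitting display coefficients to raw light-field ray samples with fitting them to retinal samples produced by an observer image-formation operator for several focus distances; all matrices are random stand-ins.

```python
# Assumed illustration: instead of solving L c ~= r over light-field rays,
# solve [P_d L] c ~= [P_d r] over retinal samples for several focus distances d,
# where each P_d is an (assumed) observer image-formation operator.
import numpy as np

rng = np.random.default_rng(1)
n_rays, n_coeffs, n_retina = 200, 50, 120

L = rng.normal(size=(n_rays, n_coeffs))   # display coefficients -> light-field ray samples
r = rng.normal(size=n_rays)               # target light-field radiance samples

# Observer image formation for three focus distances (retinal PSF + integration), stand-ins.
P = [rng.normal(size=(n_retina, n_rays)) / n_rays for _ in range(3)]

# Conventional fit: match the sampled rays directly.
c_rays, *_ = np.linalg.lstsq(L, r, rcond=None)

# Retinal pre-filtered fit: match what the eye would actually see at each focus distance.
A = np.vstack([Pd @ L for Pd in P])
b = np.concatenate([Pd @ r for Pd in P])
c_retina, *_ = np.linalg.lstsq(A, b, rcond=None)

for name, c in (("ray fit", c_rays), ("retinal fit", c_retina)):
    err = np.mean([np.linalg.norm(Pd @ (L @ c) - Pd @ r) for Pd in P])
    print(f"{name}: mean retinal residual = {err:.4f}")
```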
{"title":"Retinal pre-filtering for light field displays","authors":"Rafael Romeiro , Elmar Eisemann , Ricardo Marroquim","doi":"10.1016/j.cag.2024.104033","DOIUrl":"10.1016/j.cag.2024.104033","url":null,"abstract":"<div><p>The display coefficients that produce the signal emitted by a light field display are usually calculated to approximate the radiance over a set of sampled rays in the light field space. However, not all information contained in the light field signal is of equal importance to an observer. We propose a retinal pre-filtering of the light field samples that takes into account the image formation process of the observer to determine display coefficients that will ultimately produce better retinal images for a range of focus distances. We demonstrate a significant increase in image definition without changing the display resolution.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"123 ","pages":"Article 104033"},"PeriodicalIF":2.5,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0097849324001687/pdfft?md5=a3ada2f14da0a4ee885b3020bef4c154&pid=1-s2.0-S0097849324001687-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142040308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-06. DOI: 10.1016/j.cag.2024.104034
Paulo Knob, Greice Pinho, Gabriel Fonseca Silva, Rubens Montanha, Vitor Peres, Victor Araujo, Soraia Raupp Musse
Virtual Humans (VHs) emerged over 50 years ago and have since experienced notable advancements. Initially, developing and animating VHs posed significant challenges. However, modern technology, both commercially available and freely accessible, has democratized the creation and animation processes, making them more accessible to users, programmers, and designers. These advancements have led to the replication of authentic traits and behaviors of real actors in VHs, resulting in visually convincing and behaviorally lifelike characters. As a consequence, many research areas have arisen around functional VH technologies. This paper explores the evolution of four such areas and emerging trends related to VHs, while examining some of the implications and challenges posed by highly realistic characters within these domains.
{"title":"Surveying the evolution of virtual humans expressiveness toward real humans","authors":"Paulo Knob, Greice Pinho, Gabriel Fonseca Silva, Rubens Montanha, Vitor Peres, Victor Araujo, Soraia Raupp Musse","doi":"10.1016/j.cag.2024.104034","DOIUrl":"10.1016/j.cag.2024.104034","url":null,"abstract":"<div><p>Virtual Humans (VHs) emerged over 50 years ago and have since experienced notable advancements. Initially, developing and animating VHs posed significant challenges. However, modern technology, both commercially available and freely accessible, has democratized the creation and animation processes, making them more accessible to users, programmers, and designers. These advancements have led to the replication of authentic traits and behaviors of real actors in VHs, resulting in visually convincing and behaviorally lifelike characters. As a consequence, many research areas arise as functional VH technologies. This paper explored the evolution of four areas and emerging trends related to VHs while examining some of the implications and challenges posed by highly realistic characters within these domains.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"123 ","pages":"Article 104034"},"PeriodicalIF":2.5,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141942921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-06. DOI: 10.1016/j.cag.2024.104025
Lifang Chen, Yuchen Xiong, Yanjie Zhang, Ruiyin Yu, Lian Fang, Defeng Liu
Water and light interactions cause color shifts and blurring in underwater images, while dynamic underwater illumination further disrupts scene consistency, resulting in poor performance of optical image-based reconstruction methods underwater. Although Neural Radiance Fields (NeRF) can describe an aqueous medium through volume rendering, applying them directly underwater may induce artifacts and floaters. We propose SP-SeaNeRF, which uses a micro MLP to predict water column parameters and models the degradation process as a combination of real colors and scattered colors in underwater images, enhancing the model’s perception of scattering. We use illumination embedding vectors to learn the illumination bias within the images, in order to prevent dynamic illumination from disrupting scene consistency. We also introduce a novel sampling module that focuses on maximum-weight points, effectively improving training and inference speed. We evaluated our proposed method on the SeaThru-NeRF and Neuralsea underwater datasets. The experimental results show that our method exhibits superior underwater color restoration ability, outperforming existing underwater NeRF methods in terms of reconstruction quality and speed.
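The degradation model described here resembles the standard underwater image-formation equation, in which the observed color blends an attenuated scene color with a backscatter term that grows with distance; the snippet below illustrates that blend with made-up water-column parameters (it is not the paper's network, which predicts such parameters with a micro MLP).

```python
# Hedged sketch of the underwater image-formation model that methods like this build on:
# observed = scene_color * exp(-sigma_a * d) + water_color * (1 - exp(-sigma_b * d)).
import numpy as np

def underwater_color(scene_rgb, depth, attenuation, backscatter_sigma, water_rgb):
    """Per-channel blend of direct (attenuated) color and scattered water color.

    scene_rgb: (..., 3) clear scene colors; depth: (...,) distance along the ray;
    attenuation, backscatter_sigma, water_rgb: (3,) water-column parameters.
    """
    d = depth[..., None]
    direct = scene_rgb * np.exp(-attenuation * d)
    scatter = water_rgb * (1.0 - np.exp(-backscatter_sigma * d))
    return direct + scatter

J = np.array([[0.8, 0.4, 0.2], [0.2, 0.7, 0.3]])   # clear colors of two surface points
d = np.array([2.0, 8.0])                            # distances from the camera (meters)
sigma_a = np.array([0.35, 0.10, 0.05])              # red attenuates fastest (assumed values)
sigma_b = np.array([0.05, 0.08, 0.12])
B = np.array([0.05, 0.35, 0.45])                    # bluish-green water color (assumed)
print(underwater_color(J, d, sigma_a, sigma_b, B))
```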
{"title":"SP-SeaNeRF: Underwater Neural Radiance Fields with strong scattering perception","authors":"Lifang Chen , Yuchen Xiong , Yanjie Zhang , Ruiyin Yu , Lian Fang , Defeng Liu","doi":"10.1016/j.cag.2024.104025","DOIUrl":"10.1016/j.cag.2024.104025","url":null,"abstract":"<div><p>Water and light interactions cause color shifts and blurring in underwater images, while dynamic underwater illumination further disrupts scene consistency, resulting in poor performance of optical image-based reconstruction methods underwater. Although Neural Radiance Fields (NeRF) can describe aqueous medium through volume rendering, applying it directly underwater may induce artifacts and floaters. We propose SP-SeaNeRF, which uses micro MLP to predict water column parameters and simulates the degradation process as a combination of real colors and scattered colors in underwater images, enhancing the model’s perception of scattering. We use illumination embedding vectors to learn the illumination bias within the images, in order to prevent dynamic illumination from disrupting scene consistency. We have introduced a novel sampling module, which focuses on maximum weight points, effectively improves training and inference speed. We evaluated our proposed method on SeaThru-NeRF and Neuralsea underwater datasets. The experimental results show that our method exhibits superior underwater color restoration ability, outperforming existing underwater NeRF in terms of reconstruction quality and speed.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"123 ","pages":"Article 104025"},"PeriodicalIF":2.5,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142007101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-06. DOI: 10.1016/j.cag.2024.104035
Matt Gottsacker, Hiroshi Furuya, Zubin Datta Choudhary, Austin Erickson, Ryan Schubert, Gerd Bruder, Michael P. Browne, Gregory F. Welch
This research paper explores the impact of augmented reality (AR) tracking characteristics, specifically an AR head-worn display’s tracking registration accuracy and precision, on users’ spatial abilities and subjective perceptions of trust in and reliance on the technology. Our study aims to clarify the relationships between user performance and the different behaviors users may employ based on varying degrees of trust in and reliance on AR. Our controlled experimental setup used a 360° field-of-regard search-and-selection task and combined the immersive aspects of a CAVE-like environment with AR overlays viewed through a head-worn display.
We investigated three levels of simulated AR tracking errors in terms of both accuracy and precision (+0°, +1°, +2°). We controlled for four user task behaviors that correspond to different levels of trust in and reliance on an AR system: AR-Only (relying only on AR), AR-First (prioritizing AR over the real world), Real-Only (relying only on the real world), and Real-First (prioritizing the real world over AR). Controlling for these behaviors, we found that even small AR tracking errors had noticeable effects on users’ task performance, especially when users relied completely on the AR cues (AR-Only). Our results link AR tracking characteristics with user behavior, highlighting the importance of understanding these elements to improve AR technology and user satisfaction.
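As an illustrative assumption of how such tracking errors could be simulated (the study's CAVE plus head-worn display apparatus is not reproduced), the sketch below applies a constant angular offset for accuracy error and zero-mean per-frame jitter for precision error to the direction in which an AR cue is rendered.

```python
# Assumed sketch: perturb an AR cue's direction by a systematic accuracy offset
# plus random per-frame precision jitter, both expressed in degrees.
import numpy as np

def rotate_about_axis(v, axis, angle_deg):
    """Rodrigues rotation of vector v about the given axis by angle_deg."""
    a = np.radians(angle_deg)
    axis = axis / np.linalg.norm(axis)
    return (v * np.cos(a)
            + np.cross(axis, v) * np.sin(a)
            + axis * np.dot(axis, v) * (1.0 - np.cos(a)))

def perturbed_cue_direction(true_dir, accuracy_deg, precision_deg, rng):
    """Apply a constant accuracy offset plus random precision jitter to a view direction."""
    true_dir = true_dir / np.linalg.norm(true_dir)
    up = np.array([0.0, 1.0, 0.0])           # assumes true_dir is not parallel to world up
    offset_axis = np.cross(true_dir, up)
    d = rotate_about_axis(true_dir, offset_axis, accuracy_deg)                 # systematic error
    d = rotate_about_axis(d, rng.normal(size=3), rng.normal(0.0, precision_deg))  # per-frame noise
    return d / np.linalg.norm(d)

rng = np.random.default_rng(42)
target = np.array([0.0, 0.0, -1.0])          # cue straight ahead of the user
for err in (0.0, 1.0, 2.0):                  # the +0 deg, +1 deg, +2 deg conditions
    d = perturbed_cue_direction(target, err, err, rng)
    angle = np.degrees(np.arccos(np.clip(d @ target, -1.0, 1.0)))
    print(f"+{err:.0f} deg condition: angular error = {angle:.2f} deg")
```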
{"title":"Investigating the relationships between user behaviors and tracking factors on task performance and trust in augmented reality","authors":"Matt Gottsacker , Hiroshi Furuya , Zubin Datta Choudhary , Austin Erickson , Ryan Schubert , Gerd Bruder , Michael P. Browne , Gregory F. Welch","doi":"10.1016/j.cag.2024.104035","DOIUrl":"10.1016/j.cag.2024.104035","url":null,"abstract":"<div><p>This research paper explores the impact of augmented reality (AR) tracking characteristics, specifically an AR head-worn display’s tracking registration accuracy and precision, on users’ spatial abilities and subjective perceptions of trust in and reliance on the technology. Our study aims to clarify the relationships between user performance and the different behaviors users may employ based on varying degrees of trust in and reliance on AR. Our controlled experimental setup used a 360° field-of-regard search-and-selection task and combines the immersive aspects of a CAVE-like environment with AR overlays viewed with a head-worn display.</p><p>We investigated three levels of simulated AR tracking errors in terms of both accuracy and precision (+0°, +1°, +2°). We controlled for four user task behaviors that correspond to different levels of trust in and reliance on an AR system: <em>AR-Only</em> (only relying on AR), <em>AR-First</em> (prioritizing AR over real world), <em>Real-Only</em> (only relying on real world), and <em>Real-First</em> (prioritizing real world over AR). By controlling for these behaviors, our results showed that even small amounts of AR tracking errors had noticeable effects on users’ task performance, especially if they relied completely on the AR cues (AR-Only). Our results link AR tracking characteristics with user behavior, highlighting the importance of understanding these elements to improve AR technology and user satisfaction.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"123 ","pages":"Article 104035"},"PeriodicalIF":2.5,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141964253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-08-05. DOI: 10.1016/j.cag.2024.104023
Sarah Mittenentzwei, Sophie Mlitzke, Darija Grisanova, Kai Lawonn, Bernhard Preim, Monique Meuschke
In this paper, we investigate the suitability of different visual representations of pathological growth and shrinkage using surface models of intracranial aneurysms and liver tumors. Presenting complex medical information in a visually accessible manner helps audiences better understand the progression of pathological structures. Previous work in medical visualization provides an extensive design space for visualizing medical image data. However, which visualization techniques are appropriate for a general audience has not been thoroughly investigated.
We conducted a user study (n = 40) to evaluate different visual representations in terms of their suitability for solving tasks and their aesthetics. We created surface models representing the evolution of pathological structures over multiple discrete time steps and visualized them using illumination-based and illustrative techniques. Our results indicate that users’ aesthetic preferences largely coincide with their preferred visualization technique for task-solving purposes. In general, the illumination-based technique was preferred over the illustrative technique, but the latter offers great potential for making visualizations more accessible to users with color vision deficiencies.
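To make the two compared styles concrete, the snippet below sketches Phong-style illumination next to a simple silhouette-darkening "outline" heuristic; it is a generic illustration with assumed parameters, not the study's actual shaders.

```python
# Illustrative contrast of the two shading styles: Phong illumination versus a
# flat look that darkens fragments whose normals are nearly perpendicular to the
# view direction (a common silhouette heuristic). Real implementations would run
# per fragment in a shader.
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def phong(normal, light_dir, view_dir, base_color,
          ka=0.15, kd=0.7, ks=0.4, shininess=32.0):
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    diffuse = max(np.dot(n, l), 0.0)
    r = normalize(2.0 * np.dot(n, l) * n - l)      # reflected light direction
    specular = max(np.dot(r, v), 0.0) ** shininess if diffuse > 0.0 else 0.0
    return np.clip(base_color * (ka + kd * diffuse) + ks * specular, 0.0, 1.0)

def outline(normal, view_dir, base_color, threshold=0.3):
    n, v = normalize(normal), normalize(view_dir)
    return np.zeros(3) if abs(np.dot(n, v)) < threshold else base_color

color = np.array([0.85, 0.45, 0.40])               # assumed reddish surface tint
n = np.array([0.2, 0.9, 0.4])
l = np.array([0.5, 1.0, 0.3])
v = np.array([0.0, 0.0, 1.0])
print("phong  :", phong(n, l, v, color))
print("outline:", outline(n, v, color))
```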
{"title":"Visually communicating pathological changes: A case study on the effectiveness of phong versus outline shading","authors":"Sarah Mittenentzwei , Sophie Mlitzke , Darija Grisanova , Kai Lawonn , Bernhard Preim , Monique Meuschke","doi":"10.1016/j.cag.2024.104023","DOIUrl":"10.1016/j.cag.2024.104023","url":null,"abstract":"<div><p>In this paper, we investigate the suitability of different visual representations of pathological growth and shrinkage using surface models of intracranial aneurysms and liver tumors. By presenting complex medical information in a visually accessible manner, audiences can better understand and comprehend the progression of pathological structures. Previous work in medical visualization provides an extensive design space for visualizing medical image data. However, determining which visualization techniques are appropriate for a general audience has not been thoroughly investigated.</p><p>We conducted a user study (n = 40) to evaluate different visual representations in terms of their suitability for solving tasks and their aesthetics. We created surface models representing the evolution of pathological structures over multiple discrete time steps and visualized them using illumination-based and illustrative techniques. Our results indicate that users’ aesthetic preferences largely coincide with their preferred visualization technique for task-solving purposes. In general, the illumination-based technique has been preferred to the illustrative technique, but the latter offers great potential for increasing the accessibility of visualizations to users with color vision deficiencies.</p></div>","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"123 ","pages":"Article 104023"},"PeriodicalIF":2.5,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0097849324001584/pdfft?md5=290698cd5eeb6b5b6aca798a4452f2fb&pid=1-s2.0-S0097849324001584-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142002321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}