SingVisio: Visual analytics of diffusion model for singing voice conversion
Pub Date: 2024-08-30 DOI: 10.1016/j.cag.2024.104058
Liumeng Xue, Chaoren Wang, Mingxuan Wang, Xueyao Zhang, Jun Han, Zhizheng Wu
In this study, we present SingVisio, an interactive visual analysis system that aims to explain the diffusion model used in singing voice conversion. SingVisio provides a visual display of the generation process in diffusion models, showcasing the step-by-step denoising of the noisy spectrum and its transformation into a clean spectrum that captures the desired singer’s timbre. The system also facilitates side-by-side comparisons of different conditions, such as source content, melody, and target timbre, highlighting the impact of these conditions on the diffusion generation process and resulting conversions. Through comparative and comprehensive evaluations, SingVisio demonstrates its effectiveness in terms of system design, functionality, explainability, and user-friendliness. It offers users of various backgrounds valuable learning experiences and insights into the diffusion model for singing voice conversion.
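A minimal sketch of the step-by-step denoising such a visualization exposes, using a generic DDPM-style reverse process on a spectrogram tensor; the model callable, noise schedule, and conditioning argument here are illustrative assumptions, not SingVisio's actual implementation:

```python
import numpy as np

def ddpm_denoise_trajectory(model, cond, T=1000, shape=(80, 256), seed=0):
    """Run a generic DDPM reverse process and record every intermediate
    spectrogram so each denoising step can be visualized.

    `model(x, t, cond)` is assumed to predict the noise component;
    `cond` stands in for content/melody/timbre conditions."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)            # linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)                # start from pure noise
    trajectory = [x.copy()]
    for t in reversed(range(T)):
        eps = model(x, t, cond)                   # predicted noise at step t
        # posterior mean of x_{t-1} given x_t (standard DDPM update)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                                 # add noise except at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
        trajectory.append(x.copy())
    return trajectory                             # one frame per denoising step
```

Recording the full trajectory rather than only the final sample is what enables the per-step visual comparison the abstract describes.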
Virtual reality inspection of chromatin 3D and 2D data
Pub Date: 2024-08-30 DOI: 10.1016/j.cag.2024.104059
Elena Molina, David Kouřil, Tobias Isenberg, Barbora Kozlíková, Pere-Pau Vázquez
Understanding the packing of long DNA strands into chromatin is one of the ultimate challenges in genomic research. An intrinsic part of this complex problem is studying the chromatin’s spatial structure. Biologists reconstruct 3D models of chromatin from experimental data, yet the exploration and analysis of such 3D structures is limited in existing genomic data visualization tools. To improve this situation, we investigated current immersive methods and designed a prototypical VR visualization tool for 3D chromatin models that leverages virtual reality to handle the spatial data. We showcase the tool in three primary use cases. First, we provide an overall 3D shape overview of the chromatin to facilitate the identification of regions of interest and their selection for further investigation. Second, we include the option to export the selected regions and elements in the BED format, which can be loaded into common analytical tools. Third, we integrate epigenetic modification data that influence gene expression along the sequence, either as in-world 2D charts or overlaid on the 3D structure itself. We developed our application in collaboration with two domain experts and gathered insights from two informal studies with five other experts.
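BED is a plain tab-separated genomic interval format (0-based, half-open coordinates), so exporting selected regions for downstream tools is straightforward. A minimal sketch; the tuple layout and file name are illustrative, not the tool's actual export code:

```python
def export_bed(regions, path):
    """Write selected chromatin regions as a BED file.

    `regions` is an iterable of (chrom, start, end, name) tuples;
    BED coordinates are 0-based, half-open [start, end)."""
    with open(path, "w") as f:
        for chrom, start, end, name in regions:
            f.write(f"{chrom}\t{start}\t{end}\t{name}\n")

# e.g. a region selected in the VR view, ready for common analytical tools
export_bed([("chr1", 1_000_000, 1_250_000, "selection_01")], "selection.bed")
```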
Advanced visualization of aortic dissection anatomy and hemodynamics
Pub Date: 2024-08-30 DOI: 10.1016/j.cag.2024.104060
Aaron Schroeder, Kai Ostendorf, Kathrin Bäumler, Domenico Mastrodicasa, Veit Sandfort, Dominik Fleischmann, Bernhard Preim, Gabriel Mistelbauer
Aortic dissection is a life-threatening cardiovascular disease characterized by delamination of the aortic wall. Due to the weakened structure of the false lumen, the aorta often dilates over time, which can increase the risk of fatal aortic rupture once certain diameter thresholds are reached. Identifying patients at high risk of late adverse events is an ongoing clinical challenge, further complicated by the complex dissection anatomy and the wide variety among patients. Moreover, patient-specific risk stratification depends not only on morphological but also on hemodynamic factors, which can be derived from computer simulations or 4D flow magnetic resonance imaging (MRI). However, comprehensible visualizations that depict the complex anatomical and functional information in a single view have yet to be developed. Such visualization tools would assist clinical research and decision-making by facilitating a comprehensive understanding of the aortic state. For that purpose, we identified several visualization tasks and requirements in close collaboration with cardiovascular imaging scientists and radiologists. We display true and false lumen hemodynamics using pathlines as well as surface hemodynamics on the dissection flap and the inner vessel wall. Pathlines indicate antegrade and retrograde flow, blood flow through fenestrations, and branch vessel supply. Dissection-specific hemodynamic measures, such as interluminal pressure difference and flap compliance, provide further insight into the blood flow throughout the cardiac cycle. Finally, we evaluated our visualization techniques with cardiothoracic and vascular surgeons in two separate virtual sessions.
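Pathlines trace massless particles through the time-varying velocity field of the 4D flow data. A minimal sketch using forward-Euler integration; the sampling callback and step size are illustrative assumptions, and production tools would typically use a higher-order integrator such as RK4:

```python
import numpy as np

def trace_pathline(sample_velocity, seed, t0, t1, dt=0.005):
    """Integrate one pathline through a time-varying flow field.

    `sample_velocity(p, t)` is assumed to return the interpolated
    velocity 3-vector of the flow data at position p and time t."""
    p = np.asarray(seed, dtype=float)
    points, t = [p.copy()], t0
    while t < t1:
        p = p + dt * np.asarray(sample_velocity(p, t))  # forward Euler step
        t += dt
        points.append(p.copy())
    return np.array(points)  # polyline ready for rendering
```

Seeding such traces separately in the true and false lumen is one way to obtain the per-lumen pathline sets the abstract mentions.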
Interactive data comics for communicating medical data to the general public: A study of engagement and ease of understanding
Pub Date: 2024-08-29 DOI: 10.1016/j.cag.2024.104055
Melissa Fogwill, Areti Manataki
We are experiencing a health literacy crisis worldwide, which has alarming effects on individuals’ medical outcomes. This poses the challenge of communicating key information about health conditions and their management in a way that is easily understood by a general audience. In this paper, we propose the use of data-driven storytelling to address this challenge, in particular through interactive data comics. We developed an interactive data comic that communicates cancer data. A between-group study with 98 participants was carried out to evaluate the data comic’s ease of understanding and engagement, compared to a text medium that captures the same information. The study reveals that the data comic is perceived to be more engaging, and participants have greater recall and understanding of the data within the story, compared with the text medium.
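As a sketch of how such a between-group comparison of ordinal ratings might be analyzed; the test choice and the ratings below are illustrative assumptions, not the study's actual data or analysis:

```python
from scipy.stats import mannwhitneyu

# Hypothetical Likert-scale engagement ratings for the two conditions
comic_group = [5, 4, 5, 4, 5, 3, 5, 4]
text_group = [3, 4, 2, 3, 4, 3, 2, 3]

# Two-sided test: do the rating distributions differ between the groups?
stat, p = mannwhitneyu(comic_group, text_group, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")
```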
EasySkinning: Target-oriented skinning by mesh contraction and curve editing
Pub Date: 2024-08-29 DOI: 10.1016/j.cag.2024.104049
Jing Ma, Jituo Li, Dongliang Zhang
Skinning, a critical process in animation that defines how bones influence the vertices of a 3D character model, significantly impacts the visual quality of animation production. Traditional methods are time-intensive and skill-dependent, whereas automatic techniques lack flexibility and quality. Our research introduces EasySkinning, a user-friendly system applicable to complex meshes. The method comprises three key components: rigid weight initialization through Voronoi contraction, precise weight editing via curve tools, and smooth weight solving for reconstructing target deformations. EasySkinning begins by contracting the input mesh inwards to the skeletal bones, which improves vertex-to-bone mappings, particularly in intricate mesh areas. We also design intuitive curve-editing tools that allow users to define more precise bone influence regions. The final stage employs advanced deformation algorithms for smooth weight solving, crucial for achieving the desired animations. Through extensive experiments, we demonstrate that EasySkinning not only simplifies the creation of high-quality skinning weights but also consistently outperforms existing automatic and interactive skinning methods.
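Once per-vertex weights are solved, deformation typically follows standard linear blend skinning, which is what makes the weight quality visible. A minimal sketch of that downstream step (EasySkinning's own initialization, editing, and solving stages are not shown):

```python
import numpy as np

def linear_blend_skinning(vertices, weights, bone_transforms):
    """Deform vertices by blending bone transforms with skinning weights.

    vertices:        (V, 3) rest-pose positions
    weights:         (V, B) per-vertex bone weights, rows summing to 1
    bone_transforms: (B, 4, 4) bone transformation matrices"""
    V = vertices.shape[0]
    homo = np.hstack([vertices, np.ones((V, 1))])             # (V, 4) homogeneous
    # position of every vertex under every bone's transform: (B, V, 4)
    per_bone = np.einsum("bij,vj->bvi", bone_transforms, homo)
    # weighted blend across bones: (V, 4)
    blended = np.einsum("vb,bvi->vi", weights, per_bone)
    return blended[:, :3]
```

Poor weights show up here directly as candy-wrapper artifacts and collapsed joints, which is why precise weight editing pays off.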
HiSEG: Human assisted instance segmentation
Pub Date: 2024-08-26 DOI: 10.1016/j.cag.2024.104061
Muhammed Korkmaz, T. Metin Sezgin
Instance segmentation is a form of image detection with a range of applications, such as object refinement, medical image analysis, and image/video editing, all of which demand a high degree of accuracy. However, this precision is often beyond the reach of even state-of-the-art, fully automated instance segmentation algorithms. The performance gap becomes particularly prohibitive for small and complex objects, so practitioners typically resort to fully manual annotation, which can be a laborious process. To overcome this problem, we propose a novel approach that enables more precise predictions and generates higher-quality segmentation masks for high-curvature, complex, and small-scale objects. Our human-assisted segmentation method, HiSEG, augments the existing Strong Mask R-CNN network to incorporate human-specified partial boundaries. We also present the Partial Sketch Object Boundaries (PSOB) dataset of hand-drawn partial object boundaries, which we refer to as “human attention maps”; each sketch traces the curvature of an object’s ground-truth mask with a few pixels. Through extensive evaluation on the PSOB dataset, we show that HiSEG outperforms state-of-the-art methods such as Mask R-CNN, Strong Mask R-CNN, Mask2Former, and Segment Anything, achieving respective increases of +42.0, +34.9, +29.9, and +13.4 points in AP_Mask for these four models. We hope that our novel approach will set a baseline for future human-aided deep learning models by combining fully automated and interactive instance segmentation architectures.
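Mask AP is computed by matching predicted masks to ground truth via IoU, then integrating precision over recall. A simplified single-class, single-IoU-threshold sketch (COCO-style AP_Mask additionally averages over IoU thresholds 0.50:0.95 and uses all-points interpolation rather than trapezoidal integration):

```python
import numpy as np

def mask_iou(a, b):
    """IoU of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def average_precision(preds, gts, iou_thr=0.5):
    """preds: list of (score, mask); gts: list of ground-truth masks."""
    preds = sorted(preds, key=lambda p: -p[0])   # highest confidence first
    matched = [False] * len(gts)
    tps = []
    for score, mask in preds:
        ious = [mask_iou(mask, g) for g in gts]
        best = int(np.argmax(ious)) if ious else -1
        if best >= 0 and ious[best] >= iou_thr and not matched[best]:
            matched[best] = True                 # each GT matches at most once
            tps.append(1)
        else:
            tps.append(0)                        # duplicate or poor match -> FP
    tps = np.array(tps)
    cum_tp = np.cumsum(tps)
    precision = cum_tp / np.arange(1, len(tps) + 1)
    recall = cum_tp / max(len(gts), 1)
    return float(np.trapz(precision, recall))    # area under the PR curve
```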
Fast direct multi-person radiance fields from sparse input with dense pose priors
Pub Date: 2024-08-26 DOI: 10.1016/j.cag.2024.104063
João Paulo Lima, Hideaki Uchiyama, Diego Thomas, Veronica Teichrieb
Volumetric radiance fields have been popular for reconstructing small-scale 3D scenes from multi-view images. With additional constraints such as person correspondences, reconstructing a large 3D scene with multiple persons becomes possible. However, existing methods fail for sparse input views or when person correspondences are unavailable. In such cases, conventional depth image supervision may be insufficient because it only captures the relative position of each person with respect to the camera center. In this paper, we investigate an alternative approach by supervising the optimization framework with a dense pose prior that represents correspondences between the SMPL model and the input images. The core ideas of our approach are to exploit dense pose priors estimated from the input images to perform person segmentation and to incorporate these priors into the learning of the radiance field. Our proposed dense pose supervision is view-independent, significantly reducing computation time and improving 3D reconstruction accuracy, with fewer floaters and less noise. We confirm the advantages of our proposed method with extensive evaluation on a subset of the publicly available CMU Panoptic dataset. When training with only five input views, our proposed method achieves an average improvement of 6.1% in PSNR, 3.5% in SSIM, 17.2% in LPIPS (VGG), 19.3% in LPIPS (AlexNet), and 39.4% in training time.
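A radiance field is rendered by alpha-compositing density and color samples along each camera ray; a minimal sketch of that standard NeRF-style quadrature (the paper's dense-pose supervision and person segmentation are not shown):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite density/color samples along one ray.

    sigmas: (N,) volume densities
    colors: (N, 3) per-sample RGB
    deltas: (N,) lengths of the ray segments between samples"""
    alphas = 1.0 - np.exp(-sigmas * deltas)                   # per-sample opacity
    # transmittance: how much light survives to reach each sample
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)            # final pixel RGB
```

Floaters correspond to spurious high-density samples that grab compositing weight; stronger priors suppress them, consistent with the improvements reported above.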
Choreographing multi-degree of freedom behaviors in large-scale crowd simulations
Kexiang Huang, Gangyi Ding, Dapeng Yan, Ruida Tang, Tianyu Huang, Nuria Pelechano
Pub Date: 2024-08-23 DOI: 10.1016/j.cag.2024.104051
This study introduces a novel framework for choreographing multi-degree of freedom (MDoF) behaviors in large-scale crowd simulations. The framework integrates multi-objective optimization with spatio-temporal ordering to effectively generate and control diverse MDoF crowd behavior states. We propose a set of evaluation criteria for assessing the aesthetic quality of crowd states and employ multi-objective optimization to produce crowd states that meet these criteria. Additionally, we introduce time offset functions and interpolation progress functions to perform complex and diversified behavior state interpolations. Furthermore, we design a user-centric interaction module that allows for intuitive and flexible adjustments of crowd behavior states through sketching, spline curves, and other interactive means. Qualitative tests and quantitative experiments on the evaluation criteria demonstrate the effectiveness of this method in generating and controlling MDoF behaviors in crowds. Finally, case studies, including real-world applications in the Opening Ceremony of the 2022 Beijing Winter Olympics, validate the practicality and adaptability of this approach.
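A minimal sketch of blending two crowd behavior states with per-agent time offsets and a shared progress function; the smoothstep easing and function forms are illustrative assumptions, not the paper's exact definitions:

```python
import numpy as np

def interpolate_states(state_a, state_b, t, offsets,
                       ease=lambda s: 3 * s**2 - 2 * s**3):
    """Blend each agent's degree-of-freedom values from state_a to state_b.

    state_a, state_b: (N, D) per-agent DoF values for the two states
    t:                global animation time
    offsets:          (N,) per-agent start delays (the spatio-temporal ordering)
    ease:             interpolation progress function (smoothstep by default)"""
    s = np.clip(t - offsets, 0.0, 1.0)          # local progress per agent
    w = ease(s)[:, None]                        # eased blend weight, shape (N, 1)
    return (1.0 - w) * state_a + w * state_b
```

Varying `offsets` spatially (e.g., by distance from a sketched curve) is one way such a framework can make transitions sweep across the formation rather than fire simultaneously.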
US-Net: U-shaped network with Convolutional Attention Mechanism for ultrasound medical images
Pub Date: 2024-08-23 DOI: 10.1016/j.cag.2024.104054
Xiaoyu Xie, Pingping Liu, Yijun Lang, Zhenjie Guo, Zhongxi Yang, Yuhao Zhao
Ultrasound imaging, characterized by low contrast, high noise, and interference from surrounding tissues, poses significant challenges for lesion segmentation. To tackle these issues, we introduce an enhanced U-shaped network that incorporates several novel features for precise, automated segmentation. First, our model uses a convolution-based self-attention mechanism to establish long-range dependencies in feature maps, which is crucial for small-dataset applications, together with a soft thresholding method for noise reduction. Second, we employ multi-sized convolutional kernels to enrich feature processing, coupled with curvature calculations that accentuate edge details via a soft-attention approach. Third, an advanced skip-connection strategy is implemented in the UNet architecture, integrating information entropy to assess and exploit texture-rich channels, thereby improving semantic detail in the encoder before merging with decoder outputs. We validated our approach on a newly curated dataset, VPUSI (Vascular Plaques Ultrasound Images), alongside the established BUSI, TN3K, and DDTI datasets. Comparative experiments on these datasets show that our model outperforms existing state-of-the-art techniques in segmentation accuracy.
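Soft thresholding is the classic shrinkage operator for noise reduction: it pulls every activation toward zero by a threshold and zeroes anything smaller. A minimal sketch (how US-Net derives its threshold per channel is not shown):

```python
import numpy as np

def soft_threshold(x, tau):
    """Shrink values toward zero by tau, zeroing everything with |x| <= tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

features = np.array([-1.5, -0.2, 0.05, 0.4, 2.0])
print(soft_threshold(features, tau=0.3))   # [-1.2  0.   0.   0.1  1.7]
```

Small-magnitude responses, which in ultrasound features are dominated by speckle noise, are suppressed while strong responses pass through with a constant offset.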
ShapeBench: A new approach to benchmarking local 3D shape descriptors
Pub Date: 2024-08-22 DOI: 10.1016/j.cag.2024.104052
Bart Iver van Blokland
The ShapeBench evaluation methodology is proposed as an extension to the popular Area Under Precision-Recall Curve (PRC/AUC) for measuring the matching performance of local 3D shape descriptors. It is observed that the PRC inadequately accounts for other similar surfaces in the same or different objects when determining whether a candidate match is a true positive. The novel Descriptor Distance Index (DDI) metric is introduced to address this limitation. In contrast to previous evaluation methodologies, which identify entire objects in a given scene, the DDI metric measures descriptor performance by analysing point-to-point distances. The ShapeBench methodology is also more scalable than previous approaches because it uses procedural generation. The benchmark is used to evaluate both old and new descriptors, and the results produced by the benchmark implementation are fully replicable and publicly available.
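As a rough sketch of the idea behind point-to-point distance analysis (the precise DDI definition is in the paper): a descriptor's distance to its true corresponding point is ranked against its distances to many distractor descriptors, so similar surfaces elsewhere count against it. All names and the synthetic data below are illustrative:

```python
import numpy as np

def distance_rank(query, counterpart, distractors):
    """Rank the query->counterpart distance among query->distractor distances.

    A rank of 0 means the true corresponding descriptor is the nearest one."""
    d_true = np.linalg.norm(query - counterpart)
    d_all = np.linalg.norm(distractors - query, axis=1)
    return int((d_all < d_true).sum())

rng = np.random.default_rng(1)
q = rng.standard_normal(32)                     # descriptor at a query point
c = q + 0.05 * rng.standard_normal(32)          # noisy true counterpart
noise = rng.standard_normal((1000, 32))         # distractor descriptors
print(distance_rank(q, c, noise))               # small rank => discriminative descriptor
```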