
Computers & Graphics-UK: latest publications

Graph Transformer for 3D point clouds classification and semantic segmentation
IF 2.5 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-08-22 | DOI: 10.1016/j.cag.2024.104050
Wei Zhou , Qian Wang , Weiwei Jin , Xinzhe Shi , Ying He

Recently, graph-based and Transformer-based deep learning have demonstrated excellent performance on various point cloud tasks. Most existing graph-based methods rely on a static graph, taking a fixed input to establish graph relations. Moreover, many graph-based methods aggregate neighboring features by max- or average-pooling, so that either only a single neighboring point affects the centroid's feature or all neighboring points exert the same influence on it, ignoring the correlations and differences between points. Most Transformer-based approaches extract point cloud features with global attention and lack feature learning on local neighbors. To address these issues of graph-based and Transformer-based models, we propose a new feature extraction block named Graph Transformer and construct a 3D point cloud learning network called GTNet to learn features of point clouds on local and global patterns. Graph Transformer integrates the advantages of graph-based and Transformer-based methods, and consists of a Local Transformer that uses intra-domain cross-attention and a Global Transformer that uses global self-attention. Finally, we use GTNet for shape classification, part segmentation and semantic segmentation tasks in this paper. The experimental results show that our model achieves good learning and prediction ability on most tasks. The source code and pre-trained model of GTNet will be released at https://github.com/NWUzhouwei/GTNet.
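The abstract contrasts two attention patterns: a Local Transformer that applies cross-attention inside each point's neighborhood, and a Global Transformer that applies self-attention over all points. The sketch below is a minimal, generic PyTorch illustration of those two patterns, assuming a simple kNN neighborhood and single-head local attention; it is not the authors' GTNet implementation.

```python
import torch
import torch.nn as nn

def knn_indices(xyz, k):
    """Indices of the k nearest neighbors of every point (itself included)."""
    dist = torch.cdist(xyz, xyz)                        # (B, N, N) pairwise distances
    return dist.topk(k, dim=-1, largest=False).indices  # (B, N, k)

class LocalCrossAttention(nn.Module):
    """Each centroid attends only to the features of its k nearest neighbors."""
    def __init__(self, dim, k=16):
        super().__init__()
        self.k = k
        self.to_q = nn.Linear(dim, dim)
        self.to_kv = nn.Linear(dim, 2 * dim)

    def forward(self, feats, xyz):
        # feats: (B, N, C) point features, xyz: (B, N, 3) point coordinates
        B, N, C = feats.shape
        idx = knn_indices(xyz, self.k)                                    # (B, N, k)
        batch = torch.arange(B, device=feats.device).view(B, 1, 1).expand(B, N, self.k)
        neigh = feats[batch, idx]                                         # (B, N, k, C)
        q = self.to_q(feats).unsqueeze(2)                                 # (B, N, 1, C)
        key, val = self.to_kv(neigh).chunk(2, dim=-1)                     # (B, N, k, C) each
        attn = torch.softmax((q * key).sum(-1) / C ** 0.5, dim=-1)        # (B, N, k)
        return (attn.unsqueeze(-1) * val).sum(dim=2)                      # (B, N, C)

class GlobalSelfAttention(nn.Module):
    """Every point attends to every other point."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feats):
        out, _ = self.attn(feats, feats, feats)
        return out

if __name__ == "__main__":
    xyz = torch.rand(2, 1024, 3)
    feats = torch.rand(2, 1024, 64)
    local = LocalCrossAttention(64)(feats, xyz)   # (2, 1024, 64)
    global_ = GlobalSelfAttention(64)(local)      # (2, 1024, 64)
    print(local.shape, global_.shape)
```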

Citations: 0
Analyzing the effect of undermining on suture forces during simulated skin flap surgeries with a three-dimensional finite element method
IF 2.5 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-08-22 | DOI: 10.1016/j.cag.2024.104057
Wenzhangzhi Guo , Allison Tsz Kwan Lau , Joel C. Davies , Vito Forte , Eitan Grinspun , Lueder Alexander Kahrs

Skin flaps are common procedures used by surgeons to cover an excised area during the reconstruction of a defect. It is often challenging for a surgeon to come up with the optimal design for a patient. In this paper, we set up a simulation system based on the finite element method for one of the most common flap types, the rhomboid flap. Instead of using the standard 2D planar patch, we constructed a 3D patch with multiple layers, which allowed us to investigate the impact of different undermining areas and depths. We compared the suture forces for each case and identified the vertices with the largest suture force. The shape of the final suture line is also visualized for each case, an important clue when deciding on the optimal skin flap orientation according to medical textbooks. We found that under the optimal undermining setup, the maximum suture force is around 0.7 N at the top of the undermined layer and 1.0 N at the bottom of the undermined layer. When measuring the difference in final suture line shape, the maximum normalized Hausdorff distance is 0.099, which suggests that different undermining regions can have a significant impact on the shape of the suture line, especially in the tail region. After analyzing the suture force plots, we provide recommendations on the optimal undermining region for rhomboid flaps.
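The reported shape comparison relies on a normalized Hausdorff distance between final suture lines. A minimal SciPy sketch of such a measure is shown below; the normalization by the reference curve's bounding-box diagonal is an assumption for illustration, since the abstract does not state the paper's exact convention.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def normalized_hausdorff(curve_a, curve_b):
    # curve_a: (N, 3) and curve_b: (M, 3) points sampled along two suture lines
    d_ab = directed_hausdorff(curve_a, curve_b)[0]
    d_ba = directed_hausdorff(curve_b, curve_a)[0]
    hausdorff = max(d_ab, d_ba)                              # symmetric Hausdorff distance
    diag = np.linalg.norm(curve_a.max(0) - curve_a.min(0))   # reference scale (assumed)
    return hausdorff / diag

if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 100)
    line_a = np.stack([t, np.zeros_like(t), np.zeros_like(t)], axis=1)
    line_b = np.stack([t, 0.05 * np.sin(4 * np.pi * t), np.zeros_like(t)], axis=1)
    print(normalized_hausdorff(line_a, line_b))  # small value for similar curves
```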

Citations: 0
Foreword to the special section on Shape Modeling International 2024 (SMI2024)
IF 2.5 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-08-22 | DOI: 10.1016/j.cag.2024.104047
Georges-Pierre Bonneau, Tao Ju, Zichun Zhong
{"title":"Foreword to the special section on Shape Modeling International 2024 (SMI2024)","authors":"Georges-Pierre Bonneau,&nbsp;Tao Ju,&nbsp;Zichun Zhong","doi":"10.1016/j.cag.2024.104047","DOIUrl":"10.1016/j.cag.2024.104047","url":null,"abstract":"","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"123 ","pages":"Article 104047"},"PeriodicalIF":2.5,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142050396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
OpenECAD: An efficient visual language model for editable 3D-CAD design
IF 2.5 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-08-22 | DOI: 10.1016/j.cag.2024.104048
Zhe Yuan , Jianqi Shi , Yanhong Huang

Computer-aided design (CAD) tools are utilized in the manufacturing industry for modeling everything from cups to spacecraft. These programs are complex to use and typically require years of training and experience to master. Structured and well-constrained 2D sketches and 3D constructions are crucial components of CAD modeling. A well-executed CAD model can be seamlessly integrated into the manufacturing process, thereby enhancing production efficiency. Deep generative models of 3D shapes and 3D object reconstruction models have garnered significant research interest. However, most of these models produce discrete forms of 3D objects that are not editable. Moreover, the few models based on CAD operations often have substantial input restrictions. In this work, we fine-tuned pre-trained models to create OpenECAD models (0.55B, 0.89B, 2.4B and 3.1B), leveraging the visual, logical, coding, and general capabilities of visual language models. OpenECAD models can process images of 3D designs as input and generate highly structured 2D sketches and 3D construction commands, ensuring that the designs are editable. These outputs can be directly used with existing CAD tools’ APIs to generate project files. To train our network, we created a series of OpenECAD datasets. These datasets are derived from existing public CAD datasets, adjusted and augmented to meet the specific requirements of vision language model (VLM) training. Additionally, we have introduced an approach that utilizes dependency relationships to define and generate sketches, further enriching the content and functionality of the datasets.
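As an illustration of what "highly structured 2D sketches and 3D construction commands" can look like as editable data, the sketch below defines a hypothetical sketch-and-extrude command sequence in plain Python. The class names and the to_script() helper are invented for this example; they do not represent the OpenECAD output format or any CAD tool's API.

```python
from dataclasses import dataclass, field

@dataclass
class Circle:
    cx: float
    cy: float
    r: float

@dataclass
class Sketch:
    plane: str                       # e.g. "XY"
    curves: list = field(default_factory=list)

@dataclass
class Extrude:
    sketch: Sketch
    distance: float
    operation: str = "new_body"      # or "join", "cut"

def to_script(ops):
    """Render the command list as readable pseudo-commands (hypothetical format)."""
    lines = []
    for op in ops:
        if isinstance(op, Extrude):
            curves = ", ".join(
                f"circle({c.cx}, {c.cy}, r={c.r})" for c in op.sketch.curves)
            lines.append(f"sketch on {op.sketch.plane}: {curves}")
            lines.append(f"extrude {op.distance} ({op.operation})")
    return "\n".join(lines)

if __name__ == "__main__":
    cup_body = Extrude(Sketch("XY", [Circle(0.0, 0.0, 40.0)]), distance=90.0)
    print(to_script([cup_body]))
```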

Citations: 0
Foreword to the Special Section on XR Technologies for Healthcare and Wellbeing
IF 2.5 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-08-20 | DOI: 10.1016/j.cag.2024.104046
Anderson Maciel, Matias Volonte, Helena Mentis
{"title":"Foreword to the Special Section on XR Technologies for Healthcare and Wellbeing","authors":"Anderson Maciel,&nbsp;Matias Volonte,&nbsp;Helena Mentis","doi":"10.1016/j.cag.2024.104046","DOIUrl":"10.1016/j.cag.2024.104046","url":null,"abstract":"","PeriodicalId":50628,"journal":{"name":"Computers & Graphics-Uk","volume":"124 ","pages":"Article 104046"},"PeriodicalIF":2.5,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142129160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
LSGRNet: Local Spatial Latent Geometric Relation Learning Network for 3D point cloud semantic segmentation
IF 2.5 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-08-20 | DOI: 10.1016/j.cag.2024.104053
Liguo Luo, Jian Lu, Xiaogai Chen, Kaibing Zhang, Jian Zhou

In recent years, the Transformer model has demonstrated a remarkable ability to capture long-range dependencies and improve point cloud segmentation performance. However, localized regions separated by conventional sampling architectures destroy the structural information of instances, and the potential geometric relationships between localized regions remain under-explored. To address this issue, a Local Spatial Latent Geometric Relation Learning Network (LSGRNet) is proposed in this paper, with the geometric properties of point clouds serving as a reference. Specifically, spatial transformation and gradient computation are performed on the local point cloud to uncover potential geometric relationships within the local neighborhood. Furthermore, a local relationship aggregator based on semantic and geometric relationships is constructed to enable the interaction of spatial geometric structure and information within the local neighborhood. Simultaneously, a boundary interaction feature learning module is employed to learn the boundary information of the point cloud, aiming to better describe the local structure. The experimental results indicate that the proposed LSGRNet exhibits excellent segmentation performance in benchmark tests on the indoor datasets S3DIS and ScanNetV2, as well as the outdoor datasets SemanticKITTI and Semantic3D.
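The local geometric relation learning described above builds on relative geometry inside each point's neighborhood. The following is a generic sketch, assuming a simple kNN neighborhood, of the kind of relation encoding (center, neighbor, offset, distance) such an aggregator could consume; it is not the authors' LSGRNet module.

```python
import torch

def local_geometric_relations(xyz, k=16):
    """Per-point relation encoding over a kNN neighborhood. xyz: (B, N, 3)."""
    B, N, _ = xyz.shape
    idx = torch.cdist(xyz, xyz).topk(k, dim=-1, largest=False).indices   # (B, N, k)
    batch = torch.arange(B, device=xyz.device).view(B, 1, 1).expand(B, N, k)
    neigh = xyz[batch, idx]                           # (B, N, k, 3) neighbor coordinates
    center = xyz.unsqueeze(2)                         # (B, N, 1, 3) centroid coordinates
    offset = neigh - center                           # relative positions
    dist = offset.norm(dim=-1, keepdim=True)          # distances to the centroid
    # (B, N, k, 10): centroid, neighbor, offset, distance concatenated per pair
    return torch.cat([center.expand_as(neigh), neigh, offset, dist], dim=-1)

if __name__ == "__main__":
    rel = local_geometric_relations(torch.rand(2, 2048, 3))
    print(rel.shape)   # torch.Size([2, 2048, 16, 10])
```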

Citations: 0
An impartial framework to investigate demosaicking input embedding options
IF 2.5 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-08-16 | DOI: 10.1016/j.cag.2024.104044
Yan Niu , Xuanchen Li , Yang Tao , Bo Zhao

Convolutional Neural Networks (CNNs) have proven highly effective for demosaicking, transforming raw Color Filter Array (CFA) sensor samples into standard RGB images. Directly applying convolution to the CFA tensor can lead to misinterpretation of the color context, so existing demosaicking networks typically embed the CFA tensor into the Euclidean space before convolution. The most prevalent embedding options are Reordering and Pre-interpolation. However, it remains unclear which option is more advantageous for demosaicking. Moreover, no existing demosaicking network is suitable for conducting a fair comparison. As a result, in practice, the selection of these two embedding options is often based on intuition and heuristic approaches. This paper addresses the non-comparability between the two options and investigates whether pre-interpolation contributes additional knowledge to the demosaicking network. Based on rigorous mathematical derivation, we design pairs of end-to-end fully convolutional evaluation networks, ensuring that the performance difference between each pair of networks can be solely attributed to their differing CFA embedding strategies. Under strictly fair comparison conditions, we measure the performance contrast between the two embedding options across various scenarios. Our comprehensive evaluation reveals that the prior knowledge introduced by pre-interpolation benefits lightweight models. Additionally, pre-interpolation enhances the robustness to imaging artifacts for larger models. Our findings offer practical guidelines for designing imaging software or Image Signal Processors (ISPs) for RGB cameras.
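For concreteness, the snippet below sketches the two embedding options compared in the paper for an RGGB Bayer mosaic: Reordering packs the CFA into a half-resolution 4-channel tensor, while Pre-interpolation bilinearly fills each color plane to full resolution before convolution. The kernels are the standard bilinear demosaicking filters; the code is illustrative and not taken from the paper.

```python
import numpy as np
from scipy.ndimage import convolve

def reorder_rggb(cfa):
    """Pack an RGGB mosaic (H, W) into a half-resolution 4-channel tensor (4, H/2, W/2)."""
    return np.stack([cfa[0::2, 0::2],   # R
                     cfa[0::2, 1::2],   # G on red rows
                     cfa[1::2, 0::2],   # G on blue rows
                     cfa[1::2, 1::2]])  # B

def preinterpolate_rggb(cfa):
    """Bilinearly fill each sparse color plane to full resolution -> (3, H, W)."""
    H, W = cfa.shape
    r_mask = np.zeros((H, W))
    r_mask[0::2, 0::2] = 1
    b_mask = np.zeros((H, W))
    b_mask[1::2, 1::2] = 1
    g_mask = 1 - r_mask - b_mask
    k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0    # bilinear kernel for green
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0   # bilinear kernel for red/blue
    r = convolve(cfa * r_mask, k_rb, mode="mirror")
    g = convolve(cfa * g_mask, k_g, mode="mirror")
    b = convolve(cfa * b_mask, k_rb, mode="mirror")
    return np.stack([r, g, b])

if __name__ == "__main__":
    cfa = np.random.rand(8, 8)
    print(reorder_rggb(cfa).shape)         # (4, 4, 4)
    print(preinterpolate_rggb(cfa).shape)  # (3, 8, 8)
```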

Citations: 0
Dual-COPE: A novel prior-based category-level object pose estimation network with dual Sim2Real unsupervised domain adaptation module
IF 2.5 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-08-14 | DOI: 10.1016/j.cag.2024.104045
Xi Ren , Nan Guo , Zichen Zhu , Xinbei Jiang

Category-level pose estimation offers generalization to novel objects unseen during training, which has attracted increasing attention in recent years. Despite this advantage, annotating real-world data with pose labels is intricate and laborious. Although using synthetic data with free annotations can greatly reduce training costs, the Synthetic-to-Real (Sim2Real) domain gap can cause a sharp performance decline on real-world tests. In this paper, we propose Dual-COPE, a novel prior-based category-level object pose estimation method with dual Sim2Real domain adaptation that avoids expensive real pose annotations. First, we propose an estimation network featuring conjoined prior deformation and Transformer-based matching to realize high-precision pose prediction. Upon that, an efficient dual Sim2Real domain adaptation module is further designed to reduce the feature distribution discrepancy between synthetic and real-world data both semantically and geometrically, thus maintaining superior performance on real-world tests. Moreover, the adaptation module is loosely coupled with the estimation network, allowing easy integration with other methods without any additional inference overhead. Comprehensive experiments show that Dual-COPE outperforms existing unsupervised methods and achieves state-of-the-art precision under supervised settings.
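To make the idea of reducing the Sim2Real feature distribution discrepancy concrete, the sketch below shows one generic statistics-alignment loss between pooled synthetic and real features. It only illustrates the general principle of feature alignment; Dual-COPE's dual semantic and geometric adaptation module is more elaborate than this.

```python
import torch

def feature_alignment_loss(f_syn, f_real):
    """Penalize mismatched feature statistics between domains.
    f_syn, f_real: (B, C) pooled features from synthetic and real batches."""
    mean_gap = (f_syn.mean(0) - f_real.mean(0)).pow(2).sum()
    var_gap = (f_syn.var(0) - f_real.var(0)).pow(2).sum()
    return mean_gap + var_gap

if __name__ == "__main__":
    loss = feature_alignment_loss(torch.randn(32, 256), torch.randn(32, 256))
    print(loss.item())
```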

Citations: 0
Mesh-controllable multi-level-of-detail text-to-3D generation
IF 2.5 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-08-10 | DOI: 10.1016/j.cag.2024.104039
Dongjin Huang , Nan Wang , Xinghan Huang , Jiantao Qu , Shiyu Zhang

Text-to-3D generation is a challenging but significant task that has gained widespread attention. Its capability to rapidly generate 3D digital assets holds great potential in fields such as film, video games, and virtual reality. However, current methods often face several drawbacks, including long generation times, the multi-face Janus problem, and issues such as chaotic topology and redundant structures during mesh extraction. Additionally, the lack of control over the generated results limits their utility in downstream applications. To address these problems, we propose a novel text-to-3D framework capable of generating meshes with high fidelity and controllability. Our approach can efficiently produce meshes and textures that match the text description and the desired level of detail (LOD) by specifying input text and LOD preferences. The framework consists of two stages. In the coarse stage, 3D Gaussians are employed to accelerate generation, and weighted positive and negative prompts from various observation perspectives are used to address the multi-face Janus problem in the generated results. In the refinement stage, mesh vertices and faces are iteratively refined to enhance surface quality and to output meshes and textures that meet the specified LOD requirements. Extensive experiments demonstrate that, compared to state-of-the-art text-to-3D methods, the proposed method performs better in solving the multi-face Janus problem and enables the rapid generation of 3D meshes with enhanced prompt adherence. Furthermore, the proposed framework can generate meshes with improved topology, offering controllable vertices and faces with UV-adapted textures to achieve multiple levels of detail (LODs). Specifically, the proposed method preserves the output's relevance to the input text during simplification, making it better suited for mesh editing and efficient rendering. User studies also indicate that our framework receives higher evaluations than other methods.
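The coarse stage's weighted positive and negative prompts from various observation perspectives echo a common view-dependent prompting trick in text-to-3D pipelines. The function below is a hypothetical sketch of such a scheme, with view labels, a negative prompt, and azimuth thresholds chosen only for illustration, not taken from the paper.

```python
def view_dependent_prompt(base_prompt, azimuth_deg,
                          negative="low quality, extra faces"):
    """Augment a text prompt with a view label and a guidance weight based on camera azimuth."""
    a = azimuth_deg % 360
    if a < 45 or a >= 315:
        view, weight = "front view", 1.0
    elif a < 135:
        view, weight = "side view", 0.9
    elif a < 225:
        view, weight = "back view", 0.7
    else:
        view, weight = "side view", 0.9
    return f"{base_prompt}, {view}", negative, weight

if __name__ == "__main__":
    for az in (0, 90, 180, 270):
        print(az, view_dependent_prompt("a ceramic mug", az))
```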

Citations: 0
A review of motion retargeting techniques for 3D character facial animation
IF 2.5 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING | Pub Date: 2024-08-08 | DOI: 10.1016/j.cag.2024.104037
ChangAn Zhu, Chris Joslin

3D face animation has been a critical component of character animation in a wide range of media since the early 1990s. The conventional process for animating a 3D face is usually keyframe-based, which is labor-intensive. Therefore, the film and game industries have started using live-action actors' performances to animate the faces of 3D characters, a process also known as performance-driven facial animation. At the core of performance-driven facial animation is facial motion retargeting, which transfers the source facial motions to a target 3D face. However, facial motion retargeting still has many limitations that restrict its ability to further assist the facial animation process. Existing motion retargeting frameworks cannot accurately transfer the source motion's semantic information (i.e., the meaning and intensity of the motion), especially when applying the motion to non-human-like or stylized target characters. The retargeting quality relies on the parameterization of the target face, which is time-consuming to build and usually not generalizable across proportionally different faces. In this survey paper, we review the literature on 3D facial motion retargeting methods and the relevant topics within this area. We provide a systematic understanding of the essential modules of the retargeting pipeline, a taxonomy of the available approaches under these modules, and a thorough analysis of their advantages and limitations, along with research directions that could contribute to this area. We also contribute a 3D character categorization matrix, which has been used in this survey and might be useful for future research to evaluate the character compatibility of retargeting or face parameterization methods.

Citations: 0