EGCT: enhanced graph convolutional transformer for 3D point cloud representation learning

The Visual Computer, Pub Date: 2024-08-23, DOI: 10.1007/s00371-024-03600-2

Gang Chen, Wenju Wang, Haoran Zhou, Xiaolin Wang



Abstract

Representation learning on point cloud data that captures local and global feature information simultaneously is an urgent problem in high-precision 3D environment perception. However, current representation learning methods either focus only on learning local features efficiently, or capture long-range dependencies at the cost of fine-grained features. We therefore explore transformers on the topological structure of point cloud graphs and propose an enhanced graph convolutional transformer (EGCT) method. EGCT first constructs a graph topology for the unordered and unstructured point cloud. It then uses an enhanced point feature representation method to further aggregate the feature information of all neighborhood points, which yields a compact representation of the features of each local neighborhood graph. Next, the graph convolutional transformer performs self-attention calculations and convolution operations simultaneously on the point coordinates and features of the neighborhood graph, making efficient use of the spatial geometric information of point cloud objects. As a result, EGCT learns comprehensive geometric information about point cloud objects, which helps improve segmentation and classification accuracy. On the ShapeNetPart and ModelNet40 datasets, respectively, our EGCT method achieves an mIoU of 86.8%, and an OA and AA of 93.5% and 91.2%. On the large-scale indoor scene point cloud dataset (S3DIS), EGCT achieves an OA of 90.1% and an mIoU of 67.8%. Experimental results demonstrate that EGCT achieves point cloud segmentation and classification performance comparable to state-of-the-art methods while maintaining low model complexity. Our source code is available at https://github.com/shepherds001/EGCT.
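The abstract reports OA, AA, and mIoU without defining them. The sketch below shows how these metrics are commonly computed from a confusion matrix, assuming OA means overall accuracy, AA means the mean of per-class accuracies, and mIoU means the mean of per-class intersection-over-union. It is not taken from the EGCT code. The function name `segmentation_metrics` and the toy matrix are hypothetical, and the paper's exact evaluation protocol may differ (for example, ShapeNetPart results are often averaged per shape instance rather than per class).

```python
def segmentation_metrics(confusion):
    """Return (OA, AA, mIoU) from a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j. Definitions are the commonly used
    ones and are an assumption about what the abstract reports.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(n))
    oa = correct / total  # overall accuracy: correct / all samples

    per_class_acc = []
    per_class_iou = []
    for c in range(n):
        tp = confusion[c][c]
        true_count = sum(confusion[c])                 # row sum: TP + FN
        pred_count = sum(row[c] for row in confusion)  # column sum: TP + FP
        union = true_count + pred_count - tp           # TP + FN + FP
        if true_count > 0:   # skip classes absent from the ground truth
            per_class_acc.append(tp / true_count)
        if union > 0:        # skip classes absent from both truth and prediction
            per_class_iou.append(tp / union)

    aa = sum(per_class_acc) / len(per_class_acc)
    miou = sum(per_class_iou) / len(per_class_iou)
    return oa, aa, miou


# Made-up 3-class confusion matrix, for illustration only.
toy = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 0, 10],
]
oa, aa, miou = segmentation_metrics(toy)
print(f"OA={oa:.3f}  AA={aa:.3f}  mIoU={miou:.3f}")
```

On this toy matrix OA and AA coincide because every class has the same number of samples. On imbalanced data such as S3DIS the two generally diverge, which is one reason both are reported alongside mIoU.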


