Shichao Zhang;Yibo Ding;Tianxiang Huo;Shukai Duan;Lidan Wang
{"title":"PointAttention: Rethinking Feature Representation and Propagation in Point Cloud","authors":"Shichao Zhang;Yibo Ding;Tianxiang Huo;Shukai Duan;Lidan Wang","doi":"10.1109/TMM.2024.3521745","DOIUrl":null,"url":null,"abstract":"Self-attention mechanisms have revolutionized natural language processing and computer vision. However, in point cloud analysis, most existing methods focus on point convolution operators for feature extraction, but fail to model long-range and hierarchical dependencies. To overcome above issues, in this paper, we present PointAttention, a novel network for point cloud feature representation and propagation. Specifically, this architecture uses a two-stage Learnable Self-attention for long-range attention weights learning, which is more effective than conventional triple attention. Furthermore, it employs a Hierarchical Learnable Attention Mechanism to formulate momentous global prior representation and perform fine-grained context understanding, which enables our framework to break through the limitation of the receptive field and reduce the loss of contexts. Interestingly, we show that the proposed Learnable Self-attention is equivalent to the coupling of two Softmax attention operations while having lower complexity. 
Extensive experiments demonstrate that our network achieves highly competitive performance on several challenging publicly available benchmarks, including point cloud classification on ScanObjectNN and ModelNet40, and part segmentation on ShapeNet-Part.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"327-339"},"PeriodicalIF":8.4000,"publicationDate":"2024-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10814668/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Self-attention mechanisms have revolutionized natural language processing and computer vision. In point cloud analysis, however, most existing methods rely on point convolution operators for feature extraction and fail to model long-range and hierarchical dependencies. To overcome these issues, we present PointAttention, a novel network for point cloud feature representation and propagation. Specifically, the architecture uses a two-stage Learnable Self-attention for long-range attention-weight learning, which is more effective than conventional triple attention. Furthermore, it employs a Hierarchical Learnable Attention Mechanism to build a strong global prior representation and perform fine-grained context understanding, enabling the framework to overcome receptive-field limitations and reduce the loss of context. Interestingly, we show that the proposed Learnable Self-attention is equivalent to the coupling of two Softmax attention operations while having lower complexity. Extensive experiments demonstrate that our network achieves highly competitive performance on several challenging publicly available benchmarks, including point cloud classification on ScanObjectNN and ModelNet40, and part segmentation on ShapeNet-Part.
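The abstract notes that the proposed Learnable Self-attention is equivalent to coupling two Softmax attention operations. The paper's exact formulation is not given here, so the following is only a generic NumPy sketch of that coupling idea: two standard scaled dot-product softmax attentions applied in cascade over a set of point features. All names, shapes, and projection matrices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def softmax_attention(q, k, v):
    # Standard scaled dot-product attention over N points.
    scale = 1.0 / np.sqrt(q.shape[-1])
    weights = softmax(q @ k.T * scale)   # (N, N) attention weights
    return weights @ v                   # (N, d) attended features

def two_stage_attention(x, w_q1, w_k1, w_v1, w_q2, w_k2, w_v2):
    # Hypothetical two-stage cascade: the output of one softmax
    # attention feeds a second one, mimicking the "coupling of two
    # Softmax attention operations" the abstract mentions. The
    # projections and staging here are illustrative only.
    h = softmax_attention(x @ w_q1, x @ w_k1, x @ w_v1)
    return softmax_attention(h @ w_q2, h @ w_k2, h @ w_v2)

rng = np.random.default_rng(0)
n, d = 128, 32   # 128 points with 32-dim features (arbitrary sizes)
x = rng.standard_normal((n, d))
params = [rng.standard_normal((d, d)) * 0.1 for _ in range(6)]
out = two_stage_attention(x, *params)
print(out.shape)  # (128, 32)
```

Note that each stage still forms an (N, N) weight matrix; the paper claims its learnable variant attains the effect of this coupling at lower complexity, which this plain sketch does not attempt to reproduce.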
Journal Introduction:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.